This is the start of the stable review cycle for the 5.3.6 release. There are 148 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat 12 Oct 2019 08:29:51 AM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.3.6-rc1.g...
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.3.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.3.6-rc1
Dave Jiang dave.jiang@intel.com libnvdimm: prevent nvdimm from requesting key when security is disabled
Gao Xiang gaoxiang25@huawei.com staging: erofs: detect potential multiref due to corrupted images
Gao Xiang gaoxiang25@huawei.com staging: erofs: avoid endless loop of invalid lookback distance 0
Gao Xiang gaoxiang25@huawei.com staging: erofs: add two missing erofs_workgroup_put for corrupted images
Gao Xiang gaoxiang25@huawei.com staging: erofs: some compressed cluster should be submitted for corrupted images
Gao Xiang gaoxiang25@huawei.com staging: erofs: fix an error handling in erofs_readdir()
Andrew Murray andrew.murray@arm.com coresight: etm4x: Use explicit barriers on enable/disable
Eric Sandeen sandeen@redhat.com vfs: Fix EOVERFLOW testing in put_compat_statfs64
Vincent Chen vincent.chen@sifive.com riscv: Avoid interrupts being erroneously enabled in handle_exception()
Srikar Dronamraju srikar@linux.vnet.ibm.com perf stat: Reset previous counts on repeat with interval
Balasubramani Vivekanandan balasubramani_vivekanandan@mentor.com tick: broadcast-hrtimer: Fix a race in bc_set_next
Sean Christopherson sean.j.christopherson@intel.com KVM: nVMX: Fix consistency check on injected exception error code
Filipe Manana fdmanana@suse.com Btrfs: fix selftests failure due to uninitialized i_mode in test inodes
Hans de Goede hdegoede@redhat.com drm/radeon: Bail earlier when radeon.cik_/si_support=0 is passed
Navid Emamdoost navid.emamdoost@gmail.com nfp: abm: fix memory leak in nfp_abm_u32_knode_replace
Danielle Ratson danieller@mellanox.com mlxsw: spectrum_flower: Fail in case user specifies multiple mirror actions
Arnaldo Carvalho de Melo acme@redhat.com perf unwind: Fix libunwind build failure on i386 systems
Lee Jones lee.jones@linaro.org i2c: qcom-geni: Disable DMA processing on the Lenovo Yoga C630
Marek Vasut marex@denx.de net: dsa: microchip: Always set regmap stride to 1
Allan Zhang allanzhang@google.com bpf: Fix bpf_event_output re-entry issue
Ming Lei ming.lei@redhat.com blk-mq: move lockdep_assert_held() into elevator_exit
Andrii Nakryiko andriin@fb.com libbpf: fix false uninitialized variable warning
Valdis Kletnieks valdis.kletnieks@vt.edu kernel/elfcore.c: include proper prototypes
Andrii Nakryiko andriin@fb.com selftests/bpf: adjust strobemeta loop to satisfy latest clang
Qian Cai cai@lca.pw include/trace/events/writeback.h: fix -Wstringop-truncation warnings
Thomas Richter tmricht@linux.ibm.com perf build: Add detection of java-11-openjdk-devel package
KeMeng Shi shikemeng@huawei.com sched/core: Fix migration to invalid CPU in __set_cpus_allowed_ptr()
Mathieu Desnoyers mathieu.desnoyers@efficios.com sched/membarrier: Fix private expedited registration check
Mathieu Desnoyers mathieu.desnoyers@efficios.com sched/membarrier: Call sync_core only before usermode for same mm
Nathan Chancellor natechancellor@gmail.com libnvdimm/nfit_test: Fix acpi_handle redefinition
zhengbin zhengbin13@huawei.com fuse: fix memleak in cuse_channel_open
Aneesh Kumar K.V aneesh.kumar@linux.ibm.com libnvdimm: Fix endian conversion issues
Aneesh Kumar K.V aneesh.kumar@linux.ibm.com libnvdimm/region: Initialize bad block for volatile namespaces
Andrei Dulea adulea@amazon.de iommu/amd: Fix downgrading default page-sizes in alloc_pte()
Stefan Mavrodiev stefan@olimex.com thermal_hwmon: Sanitize thermal_zone type
Ido Schimmel idosch@mellanox.com thermal: Fix use-after-free when unregistering thermal zone device
Sanjay R Mehta sanju.mehta@amd.com ntb: point to right memory window index
Arvind Sankar nivedita@alum.mit.edu x86/purgatory: Disable the stackleak GCC plugin for the purgatory
Tycho Andersen tycho@tycho.ws selftests/seccomp: fix build on older kernels
Fabrice Gasnier fabrice.gasnier@st.com pwm: stm32-lp: Add check in case requested period cannot be achieved
Trond Myklebust trondmy@gmail.com SUNRPC: Don't try to parse incomplete RPC messages
Trond Myklebust trondmy@gmail.com pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors
Masami Hiramatsu mhiramat@kernel.org perf probe: Fix to clear tev->nargs in clear_probe_trace_event()
Trek trek00@inbox.ru drm/amdgpu: Check for valid number of registers to read
Felix Kuehling Felix.Kuehling@amd.com drm/amdgpu: Fix KFD-related kernel oops on Hawaii
Florian Westphal fw@strlen.de netfilter: nf_tables: allow lookups in dynamic sets
Ryan Chen ryan_chen@aspeedtech.com watchdog: aspeed: Add support for AST2600
Trond Myklebust trondmy@gmail.com SUNRPC: RPC level errors should always set task->tk_rpc_status
Erqi Chen chenerqi@gmail.com ceph: reconnect connection if session hang in opening state
Jeff Layton jlayton@kernel.org ceph: fetch cap_gen under spinlock in ceph_add_cap
Luis Henriques lhenriques@suse.com ceph: fix directories inode i_blkbits initialization
Miklos Szeredi mszeredi@redhat.com fuse: fix request limit
Igor Druzhinin igor.druzhinin@citrix.com xen/pci: reserve MCFG areas earlier
Chengguang Xu cgxu519@zoho.com.cn 9p: avoid attaching writeback_fid on mmap with type PRIVATE
Lu Shuaibing shuaibinglu@126.com 9p: Transport error uninitialized
Chuck Lever chuck.lever@oracle.com xprtrdma: Send Queue size grows after a reconnect
Chuck Lever chuck.lever@oracle.com xprtrdma: Toggle XPRT_CONGESTED in xprtrdma's slot methods
Jia-Ju Bai baijiaju1990@gmail.com fs: nfs: Fix possible null-pointer dereferences in encode_attrs()
Sascha Hauer s.hauer@pengutronix.de ima: fix freeing ongoing ahash_request
Sascha Hauer s.hauer@pengutronix.de ima: always return negative code for error
Srinivas Kandagatla srinivas.kandagatla@linaro.org drivers: thermal: qcom: tsens: Fix memory leak from qfprom read
Johannes Berg johannes.berg@intel.com cfg80211: initialize on-stack chandefs
Johannes Berg johannes.berg@intel.com cfg80211: validate SSID/MBSSID element ordering assumption
Johannes Berg johannes.berg@intel.com nl80211: validate beacon head
Johan Hovold johan@kernel.org ieee802154: atusb: fix use-after-free at disconnect
Juergen Gross jgross@suse.com xen/xenbus: fix self-deadlock after killing user process
David Hildenbrand david@redhat.com xen/balloon: Set pages PageOffline() in balloon_add_region()
H. Nikolaus Schaller hns@goldelico.com DTS: ARM: gta04: introduce legacy spi-cs-high to make display work again
Seth Forshee seth.forshee@canonical.com sched: Add __ASSEMBLY__ guards around struct clone_args
Aneesh Kumar K.V aneesh.kumar@linux.ibm.com libnvdimm/altmap: Track namespace boundaries in altmap
Wanpeng Li wanpengli@tencent.com Revert "locking/pvqspinlock: Don't wait if vCPU is preempted"
Adrian Hunter adrian.hunter@intel.com mmc: sdhci: Let drivers define their DMA mask
Russell King rmk+kernel@armlinux.org.uk mmc: sdhci-of-esdhc: set DMA snooping based on DMA coherence
Russell King rmk+kernel@armlinux.org.uk mmc: sdhci: improve ADMA error reporting
Nicolin Chen nicoleotsuka@gmail.com mmc: tegra: Implement ->set_dma_mask()
Johannes Berg johannes.berg@intel.com mac80211: keep BHs disabled while calling drv_tx_wake_queue()
Xiaolin Zhang xiaolin.zhang@intel.com drm/i915: to make vgpu ppgtt notificaiton as atomic operation
Chris Wilson chris@chris-wilson.co.uk drm/i915/userptr: Acquire the page lock around set_page_dirty()
Xiaolin Zhang xiaolin.zhang@intel.com drm/i915/gvt: update vgpu workload head pointer correctly
Kevin Wang kevin1.wang@amd.com drm/amd/powerplay: change metrics update period from 1ms to 100ms
Lyude Paul lyude@redhat.com drm/nouveau/kms/nv50-: Don't create MSTMs for eDP connectors
Sean Paul seanpaul@chromium.org drm/msm/dsi: Fix return value check for clk_get_parent
Tomi Valkeinen tomi.valkeinen@ti.com drm/omap: fix max fclk divider for omap36xx
Anders Roxell anders.roxell@linaro.org drm: mali-dp: Mark expected switch fall-through
Daniel Vetter daniel.vetter@ffwll.ch drm/atomic: Take the atomic toys away from X
Daniel Vetter daniel.vetter@ffwll.ch drm/atomic: Reject FLIP_ASYNC unconditionally
Maarten Lankhorst maarten.lankhorst@linux.intel.com drm/i915/dp: Fix dsc bpp calculations, v5.
Srikar Dronamraju srikar@linux.vnet.ibm.com perf stat: Fix a segmentation fault when using repeat forever
Jiri Olsa jolsa@kernel.org perf tools: Fix segfault in cpu_cache_level__read()
Rasmus Villemoes linux@rasmusvillemoes.dk watchdog: imx2_wdt: fix min() calculation in imx2_wdt_set_timeout
Shuah Khan skhan@linuxfoundation.org selftests: pidfd: Fix undefined reference to pthread_create()
Jarkko Sakkinen jarkko.sakkinen@linux.intel.com selftests/tpm2: Add the missing TEST_FILES assignment
Sumit Saxena sumit.saxena@broadcom.com PCI: Restore Resizable BAR size bits correctly for 1MB BARs
Jon Derrick jonathan.derrick@intel.com PCI: vmd: Fix shadow offsets to reflect spec changes
Dexuan Cui decui@microsoft.com PCI: hv: Avoid use of hv_pci_dev->pci_slot after freeing it
Jon Derrick jonathan.derrick@intel.com PCI: vmd: Fix config addressing when using bus offsets
Li RongQing lirongqing@baidu.com timer: Read jiffies once when forwarding base clk
Kees Cook keescook@chromium.org usercopy: Avoid HIGHMEM pfn warning
Tom Zanussi zanussi@kernel.org tracing: Make sure variable reference alias has correct var_ref_idx
Michael Nosthoff committed@heine.so power: supply: sbs-battery: only return health when battery present
Michael Nosthoff committed@heine.so power: supply: sbs-battery: use correct flags field
Jiaxun Yang jiaxun.yang@flygoat.com MIPS: Treat Loongson Extensions as ASEs
Gilad Ben-Yossef gilad@benyossef.com crypto: ccree - use the full crypt length value
Gilad Ben-Yossef gilad@benyossef.com crypto: ccree - account for TEE not ready to report
Horia Geantă horia.geanta@nxp.com crypto: caam - fix concurrency issue in givencrypt descriptor
Horia Geantă horia.geanta@nxp.com crypto: caam/qi - fix error handling in ERN handler
Wei Yongjun weiyongjun1@huawei.com crypto: cavium/zip - Add missing single_release()
Herbert Xu herbert@gondor.apana.org.au crypto: skcipher - Unmap pages after an external error
Alexander Sverdlin alexander.sverdlin@nokia.com crypto: qat - Silence smp_processor_id() warning
Steven Rostedt (VMware) rostedt@goodmis.org tools lib traceevent: Do not free tep->cmdlines in add_new_comm() on failure
Steven Rostedt (VMware) rostedt@goodmis.org tools lib traceevent: Fix "robust" test of do_generate_dynamic_list_file
Marc Kleine-Budde mkl@pengutronix.de can: mcp251x: mcp251x_hw_reset(): allow more time after a reset
Aneesh Kumar K.V aneesh.kumar@linux.ibm.com powerpc/mm: Fixup tlbie vs mtpidr/mtlpidr ordering issue on POWER9
Christophe Leroy christophe.leroy@c-s.fr powerpc/mm: Fix an Oops in kasan_mmu_init()
Christophe Leroy christophe.leroy@c-s.fr powerpc/mm: Add a helper to select PAGE_KERNEL_RO or PAGE_READONLY
Aneesh Kumar K.V aneesh.kumar@linux.ibm.com powerpc/book3s64/radix: Rename CPU_FTR_P9_TLBIE_BUG feature flag
Aneesh Kumar K.V aneesh.kumar@linux.ibm.com powerpc/book3s64/mm: Don't do tlbie fixup for some hardware revisions
Christophe Leroy christophe.leroy@c-s.fr powerpc/kasan: Fix shadow area set up for modules.
Christophe Leroy christophe.leroy@c-s.fr powerpc/kasan: Fix parallel loading of modules.
Alexey Kardashevskiy aik@ozlabs.ru powerpc/powernv/ioda: Fix race in TCE level allocation
Gautham R. Shenoy ego@linux.vnet.ibm.com powerpc/pseries: Fix cpu_hotplug_lock acquisition in resize_hpt()
Andrew Donnellan ajd@linux.ibm.com powerpc/powernv: Restrict OPAL symbol map to only be readable by root
Christophe Leroy christophe.leroy@c-s.fr powerpc/ptdump: Fix addresses display on PPC32
Christophe Leroy christophe.leroy@c-s.fr powerpc/32s: Fix boot failure with DEBUG_PAGEALLOC without KASAN.
Christophe Leroy christophe.leroy@c-s.fr powerpc/603: Fix handling of the DIRTY flag
Santosh Sivaraj santosh@fossix.org powerpc/mce: Schedule work from irq_work
Balbir Singh bsingharora@gmail.com powerpc/mce: Fix MCE handling for huge pages
Paul Mackerras paulus@ozlabs.org powerpc/xive: Implement get_irqchip_state method for XIVE to fix shutdown race
Oleksandr Suvorov oleksandr.suvorov@toradex.com ASoC: sgtl5000: Improve VAG power and mute control
Oleksandr Suvorov oleksandr.suvorov@toradex.com ASoC: Define a set of DAPM pre/post-up events
Dmitry Osipenko digetx@gmail.com PM / devfreq: tegra: Fix kHz to Hz conversion
Mike Christie mchristi@redhat.com nbd: fix max number of supported devs
Wanpeng Li wanpengli@tencent.com KVM: X86: Fix userspace set invalid CR4
Paul Mackerras paulus@ozlabs.org KVM: PPC: Book3S HV: Don't lose pending doorbell request on migration on P9
Paul Mackerras paulus@ozlabs.org KVM: PPC: Book3S HV: Check for MMU ready on piggybacked virtual cores
Paul Mackerras paulus@ozlabs.org KVM: PPC: Book3S HV: Fix race in re-enabling XIVE escalation interrupts
Paul Mackerras paulus@ozlabs.org KVM: PPC: Book3S HV: Don't push XIVE context when not using XIVE device
Cédric Le Goater clg@kaod.org KVM: PPC: Book3S HV: XIVE: Free escalation interrupts before disabling the VP
Paul Mackerras paulus@ozlabs.org KVM: PPC: Book3S: Enable XIVE native capability only if OPAL has required functions
Heiko Carstens heiko.carstens@de.ibm.com KVM: s390: fix __insn32_query() inline assembly
Stefan Haberland sth@linux.ibm.com Revert "s390/dasd: Add discard support for ESE volumes"
Jan Höppner hoeppner@linux.ibm.com s390/dasd: Fix error handling during online processing
Vasily Gorbik gor@linux.ibm.com s390/cio: exclude subchannels with no parent from pseudo check
Vasily Gorbik gor@linux.ibm.com s390/cio: avoid calling strlen on null pointer
Vasily Gorbik gor@linux.ibm.com s390/topology: avoid firing events before kobjs are created
Thomas Huth thuth@redhat.com KVM: s390: Test for bad access register and size at the start of S390_MEM_OP
Philipp Rudo prudo@linux.ibm.com s390/sclp: Fix bit checked for has_sipl
Vasily Gorbik gor@linux.ibm.com s390/process: avoid potential reading of freed stack
-------------
Diffstat:
 Makefile | 4 +-
 arch/arm/boot/dts/omap3-gta04.dtsi | 1 +
 arch/mips/include/asm/cpu-features.h | 16 ++
 arch/mips/include/asm/cpu.h | 4 +
 arch/mips/kernel/cpu-probe.c | 6 +
 arch/mips/kernel/proc.c | 4 +
 arch/powerpc/include/asm/cputable.h | 5 +-
 arch/powerpc/include/asm/kvm_ppc.h | 1 +
 arch/powerpc/include/asm/xive.h | 9 +
 arch/powerpc/kernel/dt_cpu_ftrs.c | 32 ++-
 arch/powerpc/kernel/head_32.S | 6 +-
 arch/powerpc/kernel/mce.c | 11 +-
 arch/powerpc/kernel/mce_power.c | 19 +-
 arch/powerpc/kvm/book3s.c | 8 +-
 arch/powerpc/kvm/book3s_hv.c | 24 ++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 42 +++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 38 ++--
 arch/powerpc/kvm/book3s_xive.c | 60 +++++-
 arch/powerpc/kvm/book3s_xive.h | 2 +
 arch/powerpc/kvm/book3s_xive_native.c | 23 ++-
 arch/powerpc/kvm/powerpc.c | 3 +-
 arch/powerpc/mm/book3s32/mmu.c | 9 +
 arch/powerpc/mm/book3s64/hash_native.c | 31 ++-
 arch/powerpc/mm/book3s64/hash_utils.c | 9 +-
 arch/powerpc/mm/book3s64/radix_tlb.c | 84 +++++++-
 arch/powerpc/mm/init_64.c | 17 +-
 arch/powerpc/mm/kasan/kasan_init_32.c | 57 +++++-
 arch/powerpc/mm/ptdump/ptdump.c | 2 +-
 arch/powerpc/platforms/powernv/opal.c | 11 +-
 arch/powerpc/platforms/powernv/pci-ioda-tce.c | 18 +-
 arch/powerpc/platforms/pseries/lpar.c | 8 +-
 arch/powerpc/sysdev/xive/common.c | 87 +++++---
 arch/powerpc/sysdev/xive/native.c | 7 +
 arch/riscv/kernel/entry.S | 6 +-
 arch/s390/kernel/process.c | 22 +-
 arch/s390/kernel/topology.c | 3 +-
 arch/s390/kvm/kvm-s390.c | 8 +-
 arch/x86/kvm/vmx/nested.c | 2 +-
 arch/x86/kvm/x86.c | 38 ++--
 arch/x86/purgatory/Makefile | 1 +
 block/blk-mq-sched.c | 2 -
 block/blk.h | 2 +
 crypto/skcipher.c | 42 ++--
 drivers/block/nbd.c | 39 ++--
 drivers/crypto/caam/caamalg_desc.c | 9 +
 drivers/crypto/caam/caamalg_desc.h | 2 +-
 drivers/crypto/caam/error.c | 1 +
 drivers/crypto/caam/qi.c | 5 +-
 drivers/crypto/caam/regs.h | 1 +
 drivers/crypto/cavium/zip/zip_main.c | 3 +
 drivers/crypto/ccree/cc_aead.c | 2 +-
 drivers/crypto/ccree/cc_fips.c | 8 +-
 drivers/crypto/qat/qat_common/adf_common_drv.h | 2 +-
 drivers/devfreq/tegra-devfreq.c | 12 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 3 +
 drivers/gpu/drm/amd/powerplay/navi10_ppt.c | 2 +-
 drivers/gpu/drm/arm/malidp_hw.c | 3 +-
 drivers/gpu/drm/drm_atomic_uapi.c | 3 +-
 drivers/gpu/drm/drm_ioctl.c | 7 +-
 drivers/gpu/drm/i915/display/intel_display.c | 12 +-
 drivers/gpu/drm/i915/display/intel_display.h | 2 +-
 drivers/gpu/drm/i915/display/intel_dp.c | 184 +++++++++--------
 drivers/gpu/drm/i915/display/intel_dp.h | 6 +-
 drivers/gpu/drm/i915/display/intel_dp_mst.c | 2 +-
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 10 +-
 drivers/gpu/drm/i915/gvt/scheduler.c | 28 +--
 drivers/gpu/drm/i915/i915_drv.h | 1 +
 drivers/gpu/drm/i915/i915_gem_gtt.c | 12 +-
 drivers/gpu/drm/i915/i915_vgpu.c | 1 +
 drivers/gpu/drm/msm/dsi/dsi_host.c | 8 +-
 drivers/gpu/drm/nouveau/dispnv50/disp.c | 3 +-
 drivers/gpu/drm/omapdrm/dss/dss.c | 2 +-
 drivers/gpu/drm/radeon/radeon_drv.c | 31 +++
 drivers/gpu/drm/radeon/radeon_kms.c | 25 ---
 drivers/hwtracing/coresight/coresight-etm4x.c | 15 +-
 drivers/i2c/busses/i2c-qcom-geni.c | 12 +-
 drivers/iommu/amd_iommu.c | 3 +-
 drivers/mmc/host/sdhci-of-esdhc.c | 7 +-
 drivers/mmc/host/sdhci-tegra.c | 48 +++--
 drivers/mmc/host/sdhci.c | 27 +--
 drivers/mmc/host/sdhci.h | 1 +
 drivers/net/can/spi/mcp251x.c | 19 +-
 drivers/net/dsa/microchip/ksz_common.h | 2 +-
 .../net/ethernet/mellanox/mlxsw/spectrum_flower.c | 6 +
 drivers/net/ethernet/netronome/nfp/abm/cls.c | 14 +-
 drivers/net/ieee802154/atusb.c | 3 +-
 drivers/ntb/test/ntb_perf.c | 2 +-
 drivers/nvdimm/btt.c | 8 +-
 drivers/nvdimm/bus.c | 2 +-
 drivers/nvdimm/namespace_devs.c | 7 +-
 drivers/nvdimm/pfn_devs.c | 2 +
 drivers/nvdimm/region.c | 4 +-
 drivers/nvdimm/region_devs.c | 4 +-
 drivers/nvdimm/security.c | 4 +
 drivers/pci/controller/pci-hyperv.c | 2 +-
 drivers/pci/controller/vmd.c | 25 ++-
 drivers/pci/pci.c | 2 +-
 drivers/power/supply/sbs-battery.c | 27 ++-
 drivers/pwm/pwm-stm32-lp.c | 6 +
 drivers/s390/block/dasd_eckd.c | 81 +-------
 drivers/s390/char/sclp_early.c | 2 +-
 drivers/s390/cio/ccwgroup.c | 2 +-
 drivers/s390/cio/css.c | 2 +
 drivers/staging/erofs/dir.c | 11 +-
 drivers/staging/erofs/unzip_vle.c | 37 +++-
 drivers/staging/erofs/zmap.c | 6 +
 drivers/thermal/qcom/tsens-8960.c | 2 +
 drivers/thermal/qcom/tsens-v0_1.c | 12 +-
 drivers/thermal/qcom/tsens-v1.c | 1 +
 drivers/thermal/qcom/tsens.h | 1 +
 drivers/thermal/thermal_core.c | 2 +-
 drivers/thermal/thermal_hwmon.c | 8 +-
 drivers/watchdog/aspeed_wdt.c | 4 +-
 drivers/watchdog/imx2_wdt.c | 4 +-
 drivers/xen/balloon.c | 1 +
 drivers/xen/pci.c | 21 +-
 drivers/xen/xenbus/xenbus_dev_frontend.c | 20 +-
 fs/9p/vfs_file.c | 3 +
 fs/btrfs/tests/btrfs-tests.c | 8 +-
 fs/ceph/caps.c | 9 +-
 fs/ceph/inode.c | 7 +-
 fs/ceph/mds_client.c | 4 +-
 fs/fuse/cuse.c | 1 +
 fs/fuse/inode.c | 7 +-
 fs/nfs/nfs4xdr.c | 2 +-
 fs/nfs/pnfs.c | 9 +-
 fs/statfs.c | 17 +-
 include/linux/memremap.h | 1 +
 include/linux/sched/mm.h | 2 +
 include/sound/soc-dapm.h | 2 +
 include/trace/events/writeback.h | 38 ++--
 include/uapi/linux/sched.h | 2 +
 kernel/elfcore.c | 1 +
 kernel/locking/qspinlock_paravirt.h | 2 +-
 kernel/sched/core.c | 4 +-
 kernel/sched/membarrier.c | 2 +-
 kernel/time/tick-broadcast-hrtimer.c | 57 +++---
 kernel/time/timer.c | 8 +-
 kernel/trace/bpf_trace.c | 26 ++-
 kernel/trace/trace_events_hist.c | 2 +
 mm/usercopy.c | 8 +-
 net/9p/client.c | 1 +
 net/mac80211/util.c | 13 +-
 net/netfilter/nf_tables_api.c | 7 +-
 net/netfilter/nft_lookup.c | 3 -
 net/sunrpc/clnt.c | 20 +-
 net/sunrpc/sched.c | 5 +-
 net/sunrpc/xprtrdma/transport.c | 4 +-
 net/sunrpc/xprtrdma/verbs.c | 26 +--
 net/wireless/nl80211.c | 41 +++-
 net/wireless/reg.c | 2 +-
 net/wireless/scan.c | 7 +-
 net/wireless/wext-compat.c | 2 +-
 security/integrity/ima/ima_crypto.c | 10 +-
 sound/soc/codecs/sgtl5000.c | 224 ++++++++++++++++++---
 tools/lib/bpf/btf_dump.c | 1 +
 tools/lib/traceevent/Makefile | 4 +-
 tools/lib/traceevent/event-parse.c | 3 +-
 tools/perf/Makefile.config | 2 +-
 tools/perf/arch/x86/util/unwind-libunwind.c | 2 +-
 tools/perf/builtin-stat.c | 5 +-
 tools/perf/util/header.c | 2 +-
 tools/perf/util/probe-event.c | 1 +
 tools/perf/util/stat.c | 17 ++
 tools/perf/util/stat.h | 1 +
 tools/testing/nvdimm/test/nfit_test.h | 4 +-
 tools/testing/selftests/bpf/progs/strobemeta.h | 5 +-
 tools/testing/selftests/pidfd/Makefile | 2 +-
 tools/testing/selftests/seccomp/seccomp_bpf.c | 5 +
 tools/testing/selftests/tpm2/Makefile | 1 +
 171 files changed, 1624 insertions(+), 703 deletions(-)
From: Vasily Gorbik gor@linux.ibm.com
commit 8769f610fe6d473e5e8e221709c3ac402037da6c upstream.
With THREAD_INFO_IN_TASK (which is selected on s390) task's stack usage is refcounted and should always be protected by get/put when touching other task's stack to avoid race conditions with task's destruction code.
Fixes: d5c352cdd022 ("s390: move thread_info into task_struct")
Cc: stable@vger.kernel.org # v4.10+
Acked-by: Ilya Leoshkevich iii@linux.ibm.com
Signed-off-by: Vasily Gorbik gor@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 arch/s390/kernel/process.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -184,20 +184,30 @@ unsigned long get_wchan(struct task_stru
 
 	if (!p || p == current || p->state == TASK_RUNNING || !task_stack_page(p))
 		return 0;
+
+	if (!try_get_task_stack(p))
+		return 0;
+
 	low = task_stack_page(p);
 	high = (struct stack_frame *) task_pt_regs(p);
 	sf = (struct stack_frame *) p->thread.ksp;
-	if (sf <= low || sf > high)
-		return 0;
+	if (sf <= low || sf > high) {
+		return_address = 0;
+		goto out;
+	}
 	for (count = 0; count < 16; count++) {
 		sf = (struct stack_frame *) sf->back_chain;
-		if (sf <= low || sf > high)
-			return 0;
+		if (sf <= low || sf > high) {
+			return_address = 0;
+			goto out;
+		}
 		return_address = sf->gprs[8];
 		if (!in_sched_functions(return_address))
-			return return_address;
+			goto out;
 	}
-	return 0;
+out:
+	put_task_stack(p);
+	return return_address;
 }
 
 unsigned long arch_align_stack(unsigned long sp)
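
For readers less familiar with the get/put pattern this patch applies, here is a minimal userspace sketch of the idea: take a reference before walking another task's stack, and drop it on every exit path through a single `out:` label. All `demo_*` names are hypothetical, the plain `int` counter only stands in for the kernel's atomic reference count, and the `0x1000` threshold is an arbitrary stand-in for `in_sched_functions()`. It illustrates the control flow, not the kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task whose stack is reference counted. */
struct demo_task {
	int stack_refs;			/* 0 means the stack is already gone */
	unsigned long frames[4];	/* pretend return addresses */
	size_t nr_frames;
};

/* Take a reference unless the stack has already been released. */
static bool demo_try_get_stack(struct demo_task *t)
{
	if (t->stack_refs == 0)
		return false;
	t->stack_refs++;
	return true;
}

/* Drop the reference taken by demo_try_get_stack(). */
static void demo_put_stack(struct demo_task *t)
{
	t->stack_refs--;
}

/* Walk the frames; every exit after the "get" goes through "out". */
static unsigned long demo_get_wchan(struct demo_task *t)
{
	unsigned long ret = 0;
	size_t i;

	if (!demo_try_get_stack(t))
		return 0;

	for (i = 0; i < t->nr_frames; i++) {
		if (t->frames[i] == 0) {	/* broken chain */
			ret = 0;
			goto out;
		}
		ret = t->frames[i];
		if (ret >= 0x1000)	/* assumed "not a scheduler function" */
			goto out;
	}
	ret = 0;	/* sketch-only choice: nothing suitable found */
out:
	demo_put_stack(t);
	return ret;
}
```

The single exit label is what keeps the reference count balanced: an early `return` inside the loop would skip the put.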
From: Philipp Rudo prudo@linux.ibm.com
commit 4df9a82549cfed5b52da21e7d007b79b2ea1769a upstream.
Fixes: c9896acc7851 ("s390/ipl: Provide has_secure sysfs attribute")
Cc: stable@vger.kernel.org # 5.2+
Reviewed-by: Heiko Carstens heiko.carstens@de.ibm.com
Signed-off-by: Philipp Rudo prudo@linux.ibm.com
Signed-off-by: Vasily Gorbik gor@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/s390/char/sclp_early.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/s390/char/sclp_early.c
+++ b/drivers/s390/char/sclp_early.c
@@ -40,7 +40,7 @@ static void __init sclp_early_facilities
 	sclp.has_gisaf = !!(sccb->fac118 & 0x08);
 	sclp.has_hvs = !!(sccb->fac119 & 0x80);
 	sclp.has_kss = !!(sccb->fac98 & 0x01);
-	sclp.has_sipl = !!(sccb->cbl & 0x02);
+	sclp.has_sipl = !!(sccb->cbl & 0x4000);
 	if (sccb->fac85 & 0x02)
 		S390_lowcore.machine_flags |= MACHINE_FLAG_ESOP;
 	if (sccb->fac91 & 0x40)
From: Thomas Huth thuth@redhat.com
commit a13b03bbb4575b350b46090af4dfd30e735aaed1 upstream.
If the KVM_S390_MEM_OP ioctl is called with an access register >= 16, then there is certainly a bug in the calling userspace application. We check for wrong access registers, but only if the vCPU was already in the access register mode before (i.e. the SIE block has recorded it). The check is also buried somewhere deep in the calling chain (in the function ar_translation()), so this is somewhat hard to find.
It's better to always report an error to the userspace in case this field is set wrong, and it's safer in the KVM code if we block wrong values here early instead of relying on a check somewhere deep down the calling chain, so let's add another check to kvm_s390_guest_mem_op() directly.
We also should check that the "size" is non-zero here (thanks to Janosch Frank for the hint!). If we do not check the size, we could call vmalloc() with this 0 value, and this will cause a kernel warning.
Signed-off-by: Thomas Huth thuth@redhat.com
Link: https://lkml.kernel.org/r/20190829122517.31042-1-thuth@redhat.com
Reviewed-by: Cornelia Huck cohuck@redhat.com
Reviewed-by: Janosch Frank frankja@linux.ibm.com
Reviewed-by: David Hildenbrand david@redhat.com
Cc: stable@vger.kernel.org
Signed-off-by: Christian Borntraeger borntraeger@de.ibm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 arch/s390/kvm/kvm-s390.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4257,7 +4257,7 @@ static long kvm_s390_guest_mem_op(struct
 	const u64 supported_flags = KVM_S390_MEMOP_F_INJECT_EXCEPTION
 				    | KVM_S390_MEMOP_F_CHECK_ONLY;
 
-	if (mop->flags & ~supported_flags)
+	if (mop->flags & ~supported_flags || mop->ar >= NUM_ACRS || !mop->size)
 		return -EINVAL;
 
 	if (mop->size > MEM_OP_MAX_SIZE)
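
The early-validation idea from this patch can be sketched in userspace as follows: reject unsupported flags, an out-of-range access register, and a zero size in one place before any allocation happens. The struct, the `DEMO_*` constants, and the `-E2BIG` result for oversized requests are assumptions made for illustration; only the shape of the combined check mirrors the hunk above.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_NUM_ACRS		16	/* access registers 0..15 */
#define DEMO_MEM_OP_MAX_SIZE	65536	/* assumed cap, illustration only */
#define DEMO_FLAG_CHECK_ONLY	(1ULL << 0)
#define DEMO_FLAG_INJECT_EXC	(1ULL << 1)

/* Hypothetical, trimmed-down request descriptor. */
struct demo_mem_op {
	uint64_t flags;
	uint32_t size;
	uint8_t ar;
};

/* Validate everything up front, before any buffer is allocated. */
static int demo_validate_mem_op(const struct demo_mem_op *mop)
{
	const uint64_t supported = DEMO_FLAG_CHECK_ONLY | DEMO_FLAG_INJECT_EXC;

	if ((mop->flags & ~supported) || mop->ar >= DEMO_NUM_ACRS || !mop->size)
		return -EINVAL;

	if (mop->size > DEMO_MEM_OP_MAX_SIZE)
		return -E2BIG;	/* assumed error code for this sketch */

	return 0;
}
```

Checking at the entry point means callers get a consistent error regardless of which translation mode the vCPU happens to be in.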
From: Vasily Gorbik gor@linux.ibm.com
commit f3122a79a1b0a113d3aea748e0ec26f2cb2889de upstream.
arch_update_cpu_topology is first called from:
kernel_init_freeable->sched_init_smp->sched_init_domains

even before cpus have been registered in:
kernel_init_freeable->do_one_initcall->s390_smp_init
Do not trigger kobject_uevent change events until cpu devices are actually created. Fixes the following kasan findings:
BUG: KASAN: global-out-of-bounds in kobject_uevent_env+0xb40/0xee0
Read of size 8 at addr 0000000000000020 by task swapper/0/1

BUG: KASAN: global-out-of-bounds in kobject_uevent_env+0xb36/0xee0
Read of size 8 at addr 0000000000000018 by task swapper/0/1

CPU: 0 PID: 1 Comm: swapper/0 Tainted: G B
Hardware name: IBM 3906 M04 704 (LPAR)
Call Trace:
([<0000000143c6db7e>] show_stack+0x14e/0x1a8)
 [<0000000145956498>] dump_stack+0x1d0/0x218
 [<000000014429fb4c>] print_address_description+0x64/0x380
 [<000000014429f630>] __kasan_report+0x138/0x168
 [<0000000145960b96>] kobject_uevent_env+0xb36/0xee0
 [<0000000143c7c47c>] arch_update_cpu_topology+0x104/0x108
 [<0000000143df9e22>] sched_init_domains+0x62/0xe8
 [<000000014644c94a>] sched_init_smp+0x3a/0xc0
 [<0000000146433a20>] kernel_init_freeable+0x558/0x958
 [<000000014599002a>] kernel_init+0x22/0x160
 [<00000001459a71d4>] ret_from_fork+0x28/0x30
 [<00000001459a71dc>] kernel_thread_starter+0x0/0x10
Cc: stable@vger.kernel.org
Reviewed-by: Heiko Carstens heiko.carstens@de.ibm.com
Signed-off-by: Vasily Gorbik gor@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 arch/s390/kernel/topology.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/arch/s390/kernel/topology.c
+++ b/arch/s390/kernel/topology.c
@@ -311,7 +311,8 @@ int arch_update_cpu_topology(void)
 	on_each_cpu(__arch_update_dedicated_flag, NULL, 0);
 	for_each_online_cpu(cpu) {
 		dev = get_cpu_device(cpu);
-		kobject_uevent(&dev->kobj, KOBJ_CHANGE);
+		if (dev)
+			kobject_uevent(&dev->kobj, KOBJ_CHANGE);
 	}
 	return rc;
 }
From: Vasily Gorbik gor@linux.ibm.com
commit ea298e6ee8b34b3ed4366be7eb799d0650ebe555 upstream.
Fix the following kasan finding:
BUG: KASAN: global-out-of-bounds in ccwgroup_create_dev+0x850/0x1140
Read of size 1 at addr 0000000000000000 by task systemd-udevd.r/561

CPU: 30 PID: 561 Comm: systemd-udevd.r Tainted: G B
Hardware name: IBM 3906 M04 704 (LPAR)
Call Trace:
([<0000000231b3db7e>] show_stack+0x14e/0x1a8)
 [<0000000233826410>] dump_stack+0x1d0/0x218
 [<000000023216fac4>] print_address_description+0x64/0x380
 [<000000023216f5a8>] __kasan_report+0x138/0x168
 [<00000002331b8378>] ccwgroup_create_dev+0x850/0x1140
 [<00000002332b618a>] group_store+0x3a/0x50
 [<00000002323ac706>] kernfs_fop_write+0x246/0x3b8
 [<00000002321d409a>] vfs_write+0x132/0x450
 [<00000002321d47da>] ksys_write+0x122/0x208
 [<0000000233877102>] system_call+0x2a6/0x2c8

Triggered by:
openat(AT_FDCWD, "/sys/bus/ccwgroup/drivers/qeth/group",
       O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 16
write(16, "0.0.bd00,0.0.bd01,0.0.bd02", 26) = 26
The problem is that __get_next_id in ccwgroup_create_dev might set "buf" buffer pointer to NULL and explicit check for that is required.
Cc: stable@vger.kernel.org
Reviewed-by: Sebastian Ott sebott@linux.ibm.com
Signed-off-by: Vasily Gorbik gor@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/s390/cio/ccwgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/s390/cio/ccwgroup.c
+++ b/drivers/s390/cio/ccwgroup.c
@@ -372,7 +372,7 @@ int ccwgroup_create_dev(struct device *p
 		goto error;
 	}
 	/* Check for trailing stuff. */
-	if (i == num_devices && strlen(buf) > 0) {
+	if (i == num_devices && buf && strlen(buf) > 0) {
 		rc = -EINVAL;
 		goto error;
 	}
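
To make the failure mode concrete, here is a small userspace sketch of a comma-separated id parser whose cursor becomes NULL once the last id has been consumed, which is the situation the commit message describes for `__get_next_id`. The helpers `demo_next_id` and `demo_parse_ids` are hypothetical and much simpler than the kernel code; the point is only that the trailing-data check must test the pointer before calling `strlen()`.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: step *buf past the next comma-separated id.
 * After the last id (no further comma) the cursor is set to NULL.
 */
static int demo_next_id(const char **buf)
{
	const char *end;

	if (!*buf || **buf == '\0')
		return -1;	/* nothing left to parse */

	end = strchr(*buf, ',');
	*buf = end ? end + 1 : NULL;
	return 0;
}

/* Return 0 if buf holds exactly num ids, -1 otherwise. */
static int demo_parse_ids(const char *buf, int num)
{
	int i;

	for (i = 0; i < num; i++) {
		if (demo_next_id(&buf) != 0)
			break;
	}
	if (i < num)
		return -1;	/* too few ids */

	/* Check for trailing stuff; buf may legitimately be NULL here. */
	if (buf && strlen(buf) > 0)
		return -1;

	return 0;
}
```

With the input from the report, `"0.0.bd00,0.0.bd01,0.0.bd02"` and three expected ids, the cursor is NULL when the final check runs, so an unguarded `strlen(buf)` would dereference a null pointer.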
From: Vasily Gorbik gor@linux.ibm.com
commit ab5758848039de9a4b249d46e4ab591197eebaf2 upstream.
ccw console is created early in start_kernel and used before css is initialized or ccw console subchannel is registered. Until then console subchannel does not have a parent. For that reason assume subchannels with no parent are not pseudo subchannels. This fixes the following kasan finding:
BUG: KASAN: global-out-of-bounds in sch_is_pseudo_sch+0x8e/0x98
Read of size 8 at addr 00000000000005e8 by task swapper/0/0

CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc8-07370-g6ac43dd12538 #2
Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
Call Trace:
([<000000000012cd76>] show_stack+0x14e/0x1e0)
 [<0000000001f7fb44>] dump_stack+0x1a4/0x1f8
 [<00000000007d7afc>] print_address_description+0x64/0x3c8
 [<00000000007d75f6>] __kasan_report+0x14e/0x180
 [<00000000018a2986>] sch_is_pseudo_sch+0x8e/0x98
 [<000000000189b950>] cio_enable_subchannel+0x1d0/0x510
 [<00000000018cac7c>] ccw_device_recognition+0x12c/0x188
 [<0000000002ceb1a8>] ccw_device_enable_console+0x138/0x340
 [<0000000002cf1cbe>] con3215_init+0x25e/0x300
 [<0000000002c8770a>] console_init+0x68a/0x9b8
 [<0000000002c6a3d6>] start_kernel+0x4fe/0x728
 [<0000000000100070>] startup_continue+0x70/0xd0
Cc: stable@vger.kernel.org
Reviewed-by: Sebastian Ott sebott@linux.ibm.com
Signed-off-by: Vasily Gorbik gor@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/s390/cio/css.c | 2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/s390/cio/css.c
+++ b/drivers/s390/cio/css.c
@@ -1388,6 +1388,8 @@ device_initcall(cio_settle_init);
 
 int sch_is_pseudo_sch(struct subchannel *sch)
 {
+	if (!sch->dev.parent)
+		return 0;
 	return sch == to_css(sch->dev.parent)->pseudo_subchannel;
 }
From: Jan Höppner hoeppner@linux.ibm.com
commit dd45483981ac62f432e073fea6e5e11200b9070d upstream.
It is possible that the CCW commands for reading volume and extent pool information are not supported, either by the storage server (for dedicated DASDs) or by z/VM (for virtual devices, such as MDISKs).
As a command reject will occur in such a case, the current error handling leads to a failing online processing and thus the DASD can't be used at all.
Since the data being read is not essential for a fully operational DASD, the error handling can be removed. Information about the failing command is sent to the s390dbf debug feature.
Fixes: c729696bcf8b ("s390/dasd: Recognise data for ESE volumes")
Cc: stable@vger.kernel.org # 5.3
Reported-by: Frank Heimes frank.heimes@canonical.com
Signed-off-by: Jan Höppner hoeppner@linux.ibm.com
Signed-off-by: Stefan Haberland sth@linux.ibm.com
Signed-off-by: Jens Axboe axboe@kernel.dk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/s390/block/dasd_eckd.c | 24 ++++++++----------------
 1 file changed, 8 insertions(+), 16 deletions(-)

--- a/drivers/s390/block/dasd_eckd.c
+++ b/drivers/s390/block/dasd_eckd.c
@@ -1553,8 +1553,8 @@ static int dasd_eckd_read_vol_info(struc
 	if (rc == 0) {
 		memcpy(&private->vsq, vsq, sizeof(*vsq));
 	} else {
-		dev_warn(&device->cdev->dev,
-			 "Reading the volume storage information failed with rc=%d\n", rc);
+		DBF_EVENT_DEVID(DBF_WARNING, device->cdev,
+				"Reading the volume storage information failed with rc=%d", rc);
 	}
 
 	if (useglobal)
@@ -1737,8 +1737,8 @@ static int dasd_eckd_read_ext_pool_info(
 	if (rc == 0) {
 		dasd_eckd_cpy_ext_pool_data(device, lcq);
 	} else {
-		dev_warn(&device->cdev->dev,
-			 "Reading the logical configuration failed with rc=%d\n", rc);
+		DBF_EVENT_DEVID(DBF_WARNING, device->cdev,
+				"Reading the logical configuration failed with rc=%d", rc);
 	}
 
 	dasd_sfree_request(cqr, cqr->memdev);
@@ -2020,14 +2020,10 @@ dasd_eckd_check_characteristics(struct d
 	dasd_eckd_read_features(device);
 
 	/* Read Volume Information */
-	rc = dasd_eckd_read_vol_info(device);
-	if (rc)
-		goto out_err3;
+	dasd_eckd_read_vol_info(device);
 
 	/* Read Extent Pool Information */
-	rc = dasd_eckd_read_ext_pool_info(device);
-	if (rc)
-		goto out_err3;
+	dasd_eckd_read_ext_pool_info(device);
 
 	/* Read Device Characteristics */
 	rc = dasd_generic_read_dev_chars(device, DASD_ECKD_MAGIC,
@@ -5663,14 +5659,10 @@ static int dasd_eckd_restore_device(stru
 	dasd_eckd_read_features(device);
 
 	/* Read Volume Information */
-	rc = dasd_eckd_read_vol_info(device);
-	if (rc)
-		goto out_err2;
+	dasd_eckd_read_vol_info(device);
 
 	/* Read Extent Pool Information */
-	rc = dasd_eckd_read_ext_pool_info(device);
-	if (rc)
-		goto out_err2;
+	dasd_eckd_read_ext_pool_info(device);
 
 	/* Read Device Characteristics */
 	rc = dasd_generic_read_dev_chars(device, DASD_ECKD_MAGIC,
From: Stefan Haberland sth@linux.ibm.com
commit 964ce509e2ded52c1a61ad86044cc4d70abd9eb8 upstream.
This reverts commit 7e64db1597fe114b83fe17d0ba96c6aa5fca419a.
The thin provisioning feature introduces an IOCTL and the discard support to allow userspace tools and filesystems to release unused and previously allocated space respectively.
During some internal performance improvements and further tests, the release of allocated space revealed some issues that may lead to data corruption in some configurations when filesystems are mounted with discard support enabled.
While we're working on a fix and trying to clarify the situation, this commit reverts the discard support for ESE volumes to prevent potential data corruption.
Cc: stable@vger.kernel.org # 5.3
Signed-off-by: Stefan Haberland sth@linux.ibm.com
Signed-off-by: Jens Axboe axboe@kernel.dk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/s390/block/dasd_eckd.c |   57 ++---------------------------------------
 1 file changed, 3 insertions(+), 54 deletions(-)

--- a/drivers/s390/block/dasd_eckd.c
+++ b/drivers/s390/block/dasd_eckd.c
@@ -2055,9 +2055,6 @@ dasd_eckd_check_characteristics(struct d
 	if (readonly)
 		set_bit(DASD_FLAG_DEVICE_RO, &device->flags);

-	if (dasd_eckd_is_ese(device))
-		dasd_set_feature(device->cdev, DASD_FEATURE_DISCARD, 1);
-
 	dev_info(&device->cdev->dev, "New DASD %04X/%02X (CU %04X/%02X) "
 		 "with %d cylinders, %d heads, %d sectors%s\n",
 		 private->rdc_data.dev_type,
@@ -3691,14 +3688,6 @@ static int dasd_eckd_release_space(struc
 		return -EINVAL;
 }

-static struct dasd_ccw_req *
-dasd_eckd_build_cp_discard(struct dasd_device *device, struct dasd_block *block,
-			   struct request *req, sector_t first_trk,
-			   sector_t last_trk)
-{
-	return dasd_eckd_dso_ras(device, block, req, first_trk, last_trk, 1);
-}
-
 static struct dasd_ccw_req *dasd_eckd_build_cp_cmd_single(
 					       struct dasd_device *startdev,
 					       struct dasd_block *block,
@@ -4443,10 +4432,6 @@ static struct dasd_ccw_req *dasd_eckd_bu
 	cmdwtd = private->features.feature[12] & 0x40;
 	use_prefix = private->features.feature[8] & 0x01;

-	if (req_op(req) == REQ_OP_DISCARD)
-		return dasd_eckd_build_cp_discard(startdev, block, req,
-						  first_trk, last_trk);
-
 	cqr = NULL;
 	if (cdlspecial || dasd_page_cache) {
 		/* do nothing, just fall through to the cmd mode single case */
@@ -4725,14 +4710,12 @@ static struct dasd_ccw_req *dasd_eckd_bu
 						   struct dasd_block *block,
 						   struct request *req)
 {
-	struct dasd_device *startdev = NULL;
 	struct dasd_eckd_private *private;
-	struct dasd_ccw_req *cqr;
+	struct dasd_device *startdev;
 	unsigned long flags;
+	struct dasd_ccw_req *cqr;

-	/* Discard requests can only be processed on base devices */
-	if (req_op(req) != REQ_OP_DISCARD)
-		startdev = dasd_alias_get_start_dev(base);
+	startdev = dasd_alias_get_start_dev(base);
 	if (!startdev)
 		startdev = base;
 	private = startdev->private;
@@ -6513,20 +6496,8 @@ static void dasd_eckd_setup_blk_queue(st
 	unsigned int logical_block_size = block->bp_block;
 	struct request_queue *q = block->request_queue;
 	struct dasd_device *device = block->base;
-	struct dasd_eckd_private *private;
-	unsigned int max_discard_sectors;
-	unsigned int max_bytes;
-	unsigned int ext_bytes; /* Extent Size in Bytes */
-	int recs_per_trk;
-	int trks_per_cyl;
-	int ext_limit;
-	int ext_size; /* Extent Size in Cylinders */
 	int max;

-	private = device->private;
-	trks_per_cyl = private->rdc_data.trk_per_cyl;
-	recs_per_trk = recs_per_track(&private->rdc_data, 0, logical_block_size);
-
 	if (device->features & DASD_FEATURE_USERAW) {
 		/*
 		 * the max_blocks value for raw_track access is 256
@@ -6547,28 +6518,6 @@ static void dasd_eckd_setup_blk_queue(st
 	/* With page sized segments each segment can be translated into one idaw/tidaw */
 	blk_queue_max_segment_size(q, PAGE_SIZE);
 	blk_queue_segment_boundary(q, PAGE_SIZE - 1);
-
-	if (dasd_eckd_is_ese(device)) {
-		/*
-		 * Depending on the extent size, up to UINT_MAX bytes can be
-		 * accepted. However, neither DASD_ECKD_RAS_EXTS_MAX nor the
-		 * device limits should be exceeded.
-		 */
-		ext_size = dasd_eckd_ext_size(device);
-		ext_limit = min(private->real_cyl / ext_size, DASD_ECKD_RAS_EXTS_MAX);
-		ext_bytes = ext_size * trks_per_cyl * recs_per_trk *
-			logical_block_size;
-		max_bytes = UINT_MAX - (UINT_MAX % ext_bytes);
-		if (max_bytes / ext_bytes > ext_limit)
-			max_bytes = ext_bytes * ext_limit;
-
-		max_discard_sectors = max_bytes / 512;
-
-		blk_queue_max_discard_sectors(q, max_discard_sectors);
-		blk_queue_flag_set(QUEUE_FLAG_DISCARD, q);
-		q->limits.discard_granularity = ext_bytes;
-		q->limits.discard_alignment = ext_bytes;
-	}
 }

 static struct ccw_driver dasd_eckd_driver = {
From: Heiko Carstens heiko.carstens@de.ibm.com
commit b1c41ac3ce569b04644bb1e3fd28926604637da3 upstream.
The inline assembly constraints of __insn32_query() tell the compiler that only the first byte of "query" is being written to. The intent was probably to state that all 32 bytes are written.
Fix and simplify the code and just use a "memory" clobber.
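For illustration only, not part of the patch: the reason the old constraint covered a single byte can be checked in plain C. An array parameter such as "u8 query[32]" decays to a pointer, so "*query" is one byte wide. The sketch below uses hypothetical helper names (bytes_covered_by_deref, bytes_covered_by_struct, struct query32) and shows one alternative way to make a memory operand span the whole buffer; the patch itself chose the simpler "memory" clobber instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type covering the whole 32-byte query result. */
struct query32 {
	uint8_t bytes[32];
};

/*
 * An array parameter decays to a pointer, so *query is a single byte.
 * An "=m" (*query) operand therefore describes only one byte.
 */
static size_t bytes_covered_by_deref(uint8_t query[32])
{
	return sizeof(*query);
}

/* Casting to a 32-byte struct makes the operand span the full buffer. */
static size_t bytes_covered_by_struct(uint8_t *query)
{
	return sizeof(*(struct query32 *)query);
}
```

Calling the two helpers on the same 32-byte buffer gives the operand width the compiler would assume in each case.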
Fixes: d668139718a9 ("KVM: s390: provide query function for instructions returning 32 byte")
Cc: stable@vger.kernel.org # v5.2+
Acked-by: Christian Borntraeger borntraeger@de.ibm.com
Signed-off-by: Heiko Carstens heiko.carstens@de.ibm.com
Signed-off-by: Vasily Gorbik gor@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/s390/kvm/kvm-s390.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -332,7 +332,7 @@ static inline int plo_test_bit(unsigned
 	return cc == 0;
 }

-static inline void __insn32_query(unsigned int opcode, u8 query[32])
+static inline void __insn32_query(unsigned int opcode, u8 *query)
 {
 	register unsigned long r0 asm("0") = 0;	/* query function */
 	register unsigned long r1 asm("1") = (unsigned long) query;
@@ -340,9 +340,9 @@ static inline void __insn32_query(unsign
 	asm volatile(
 		/* Parameter regs are ignored */
 		"	.insn	rrf,%[opc] << 16,2,4,6,0\n"
-		: "=m" (*query)
+		:
 		: "d" (r0), "a" (r1), [opc] "i" (opcode)
-		: "cc");
+		: "cc", "memory");
 }

 #define INSN_SORTL 0xb938
From: Paul Mackerras paulus@ozlabs.org
commit 2ad7a27deaf6d78545d97ab80874584f6990360e upstream.
There are some POWER9 machines where the OPAL firmware does not support the OPAL_XIVE_GET_QUEUE_STATE and OPAL_XIVE_SET_QUEUE_STATE calls. The impact of this is that a guest using XIVE natively will not be able to be migrated successfully. On the source side, the get_attr operation on the KVM native device for the KVM_DEV_XIVE_GRP_EQ_CONFIG attribute will fail; on the destination side, the set_attr operation for the same attribute will fail.
This adds tests for the existence of the OPAL get/set queue state functions, and if they are not supported, the XIVE-native KVM device is not created and the KVM_CAP_PPC_IRQ_XIVE capability returns false. Userspace can then either provide a software emulation of XIVE, or else tell the guest that it does not have a XIVE controller available to it.
Cc: stable@vger.kernel.org # v5.2+
Fixes: 3fab2d10588e ("KVM: PPC: Book3S HV: XIVE: Activate XIVE exploitation mode")
Reviewed-by: David Gibson david@gibson.dropbear.id.au
Reviewed-by: Cédric Le Goater clg@kaod.org
Signed-off-by: Paul Mackerras paulus@ozlabs.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/include/asm/kvm_ppc.h    |    1 +
 arch/powerpc/include/asm/xive.h       |    1 +
 arch/powerpc/kvm/book3s.c             |    8 +++++---
 arch/powerpc/kvm/book3s_xive_native.c |    5 +++++
 arch/powerpc/kvm/powerpc.c            |    3 ++-
 arch/powerpc/sysdev/xive/native.c     |    7 +++++++
 6 files changed, 21 insertions(+), 4 deletions(-)

--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -598,6 +598,7 @@ extern int kvmppc_xive_native_get_vp(str
 				     union kvmppc_one_reg *val);
 extern int kvmppc_xive_native_set_vp(struct kvm_vcpu *vcpu,
 				     union kvmppc_one_reg *val);
+extern bool kvmppc_xive_native_supported(void);

 #else
 static inline int kvmppc_xive_set_xive(struct kvm *kvm, u32 irq, u32 server,
--- a/arch/powerpc/include/asm/xive.h
+++ b/arch/powerpc/include/asm/xive.h
@@ -127,6 +127,7 @@ extern int xive_native_get_queue_state(u
 extern int xive_native_set_queue_state(u32 vp_id, uint32_t prio, u32 qtoggle,
 				       u32 qindex);
 extern int xive_native_get_vp_state(u32 vp_id, u64 *out_state);
+extern bool xive_native_has_queue_state_support(void);

 #else

--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -1083,9 +1083,11 @@ static int kvmppc_book3s_init(void)
 	if (xics_on_xive()) {
 		kvmppc_xive_init_module();
 		kvm_register_device_ops(&kvm_xive_ops, KVM_DEV_TYPE_XICS);
-		kvmppc_xive_native_init_module();
-		kvm_register_device_ops(&kvm_xive_native_ops,
-					KVM_DEV_TYPE_XIVE);
+		if (kvmppc_xive_native_supported()) {
+			kvmppc_xive_native_init_module();
+			kvm_register_device_ops(&kvm_xive_native_ops,
+						KVM_DEV_TYPE_XIVE);
+		}
 	} else
 #endif
 		kvm_register_device_ops(&kvm_xics_ops, KVM_DEV_TYPE_XICS);
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -1171,6 +1171,11 @@ int kvmppc_xive_native_set_vp(struct kvm
 	return 0;
 }

+bool kvmppc_xive_native_supported(void)
+{
+	return xive_native_has_queue_state_support();
+}
+
 static int xive_native_debug_show(struct seq_file *m, void *private)
 {
 	struct kvmppc_xive *xive = m->private;
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -561,7 +561,8 @@ int kvm_vm_ioctl_check_extension(struct
 		 * a POWER9 processor) and the PowerNV platform, as
 		 * nested is not yet supported.
 		 */
-		r = xive_enabled() && !!cpu_has_feature(CPU_FTR_HVMODE);
+		r = xive_enabled() && !!cpu_has_feature(CPU_FTR_HVMODE) &&
+			kvmppc_xive_native_supported();
 		break;
 #endif

--- a/arch/powerpc/sysdev/xive/native.c
+++ b/arch/powerpc/sysdev/xive/native.c
@@ -811,6 +811,13 @@ int xive_native_set_queue_state(u32 vp_i
 }
 EXPORT_SYMBOL_GPL(xive_native_set_queue_state);

+bool xive_native_has_queue_state_support(void)
+{
+	return opal_check_token(OPAL_XIVE_GET_QUEUE_STATE) &&
+		opal_check_token(OPAL_XIVE_SET_QUEUE_STATE);
+}
+EXPORT_SYMBOL_GPL(xive_native_has_queue_state_support);
+
 int xive_native_get_vp_state(u32 vp_id, u64 *out_state)
 {
 	__be64 state;
From: Cédric Le Goater clg@kaod.org
commit 237aed48c642328ff0ab19b63423634340224a06 upstream.
When a vCPU is brought down, the XIVE VP (Virtual Processor) is first disabled and then the event notification queues are freed. When freeing the queues, we check for possible escalation interrupts and free them as well.
But when a XIVE VP is disabled, the underlying XIVE ENDs are also disabled in OPAL. When an END (Event Notification Descriptor) is disabled, its ESB pages (ESn and ESe) are disabled and loads return all 1s. This means that any access to the ESB page of the escalation interrupt will return invalid values.
When an interrupt is freed, the shutdown handler computes a 'saved_p' field from the value returned by a load in xive_do_source_set_mask(). This value is incorrect for escalation interrupts for the reason described above.
This has no impact on Linux/KVM today because we don't make use of it, but future changes will introduce a xive_get_irqchip_state() handler. This handler will use the 'saved_p' field to return the state of an interrupt, and with 'saved_p' being incorrect, a softlockup will occur.
Fix the vCPU cleanup sequence by first freeing the escalation interrupts, if any, then disabling the XIVE VP, and finally freeing the queues.
Fixes: 90c73795afa2 ("KVM: PPC: Book3S HV: Add a new KVM device for the XIVE native exploitation mode")
Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Cédric Le Goater clg@kaod.org
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Link: https://lore.kernel.org/r/20190806172538.5087-1-clg@kaod.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/kvm/book3s_xive.c        |   18 ++++++++++--------
 arch/powerpc/kvm/book3s_xive_native.c |   12 +++++++-----
 2 files changed, 17 insertions(+), 13 deletions(-)

--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -1134,20 +1134,22 @@ void kvmppc_xive_cleanup_vcpu(struct kvm
 	/* Mask the VP IPI */
 	xive_vm_esb_load(&xc->vp_ipi_data, XIVE_ESB_SET_PQ_01);

-	/* Disable the VP */
-	xive_native_disable_vp(xc->vp_id);
-
-	/* Free the queues & associated interrupts */
+	/* Free escalations */
 	for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) {
-		struct xive_q *q = &xc->queues[i];
-
-		/* Free the escalation irq */
 		if (xc->esc_virq[i]) {
 			free_irq(xc->esc_virq[i], vcpu);
 			irq_dispose_mapping(xc->esc_virq[i]);
 			kfree(xc->esc_virq_names[i]);
 		}
-		/* Free the queue */
+	}
+
+	/* Disable the VP */
+	xive_native_disable_vp(xc->vp_id);
+
+	/* Free the queues */
+	for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) {
+		struct xive_q *q = &xc->queues[i];
+
 		xive_native_disable_queue(xc->vp_id, q, i);
 		if (q->qpage) {
 			free_pages((unsigned long)q->qpage,
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -67,10 +67,7 @@ void kvmppc_xive_native_cleanup_vcpu(str
 	xc->valid = false;
 	kvmppc_xive_disable_vcpu_interrupts(vcpu);

-	/* Disable the VP */
-	xive_native_disable_vp(xc->vp_id);
-
-	/* Free the queues & associated interrupts */
+	/* Free escalations */
 	for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) {
 		/* Free the escalation irq */
 		if (xc->esc_virq[i]) {
@@ -79,8 +76,13 @@ void kvmppc_xive_native_cleanup_vcpu(str
 			kfree(xc->esc_virq_names[i]);
 			xc->esc_virq[i] = 0;
 		}
+	}

-		/* Free the queue */
+	/* Disable the VP */
+	xive_native_disable_vp(xc->vp_id);
+
+	/* Free the queues */
+	for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) {
 		kvmppc_xive_native_cleanup_queue(vcpu, i);
 	}

From: Paul Mackerras paulus@ozlabs.org
commit 8d4ba9c931bc384bcc6889a43915aaaf19d3e499 upstream.
At present, when running a guest on POWER9 using HV KVM but not using an in-kernel interrupt controller (XICS or XIVE), for example if QEMU is run with the kernel_irqchip=off option, the guest entry code goes ahead and tries to load the guest context into the XIVE hardware, even though no context has been set up.
To fix this, we check that the "CAM word" is non-zero before pushing it to the hardware. The CAM word is initialized to a non-zero value in kvmppc_xive_connect_vcpu() and kvmppc_xive_native_connect_vcpu(), and is now cleared in kvmppc_xive_{,native_}cleanup_vcpu.
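For illustration only, not part of the patch: the guard described above can be modelled in a few lines of userspace C. The struct and function names below (vcpu_model, push_vcpu, cleanup_vcpu) are hypothetical stand-ins, and the "pushes" counter replaces the real writes to the thread interrupt management area.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of the per-vCPU XIVE state. */
struct vcpu_model {
	uint32_t xive_cam_word;	/* 0 means no XIVE context is set up */
	int pushes;		/* counts simulated pushes to the hardware */
};

/*
 * Returns true if the context was pushed. Mirrors the early return:
 * nothing is pushed without a XIVE (no TIMA) or without a CAM word.
 */
static bool push_vcpu(struct vcpu_model *v, bool have_tima)
{
	if (!have_tima || !v->xive_cam_word)
		return false;
	v->pushes++;
	return true;
}

/* Cleanup clears the CAM word so later guest entries skip the push. */
static void cleanup_vcpu(struct vcpu_model *v)
{
	v->xive_cam_word = 0;
}
```

In this model a vCPU that was never connected, or that has been cleaned up, never reaches the simulated hardware write.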
Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
Cc: stable@vger.kernel.org # v4.12+
Reported-by: Cédric Le Goater clg@kaod.org
Signed-off-by: Paul Mackerras paulus@ozlabs.org
Reviewed-by: Cédric Le Goater clg@kaod.org
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Link: https://lore.kernel.org/r/20190813100100.GC9567@blackberry
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |    2 ++
 arch/powerpc/kvm/book3s_xive.c          |   11 ++++++++++-
 arch/powerpc/kvm/book3s_xive_native.c   |    3 +++
 3 files changed, 15 insertions(+), 1 deletion(-)

--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -942,6 +942,8 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_3
 	ld	r11, VCPU_XIVE_SAVED_STATE(r4)
 	li	r9, TM_QW1_OS
 	lwz	r8, VCPU_XIVE_CAM_WORD(r4)
+	cmpwi	r8, 0
+	beq	no_xive
 	li	r7, TM_QW1_OS + TM_WORD2
 	mfmsr	r0
 	andi.	r0, r0, MSR_DR		/* in real mode? */
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -67,8 +67,14 @@ void kvmppc_xive_push_vcpu(struct kvm_vc
 	void __iomem *tima = local_paca->kvm_hstate.xive_tima_virt;
 	u64 pq;

-	if (!tima)
+	/*
+	 * Nothing to do if the platform doesn't have a XIVE
+	 * or this vCPU doesn't have its own XIVE context
+	 * (e.g. because it's not using an in-kernel interrupt controller).
+	 */
+	if (!tima || !vcpu->arch.xive_cam_word)
 		return;
+
 	eieio();
 	__raw_writeq(vcpu->arch.xive_saved_state.w01, tima + TM_QW1_OS);
 	__raw_writel(vcpu->arch.xive_cam_word, tima + TM_QW1_OS + TM_WORD2);
@@ -1146,6 +1152,9 @@ void kvmppc_xive_cleanup_vcpu(struct kvm
 	/* Disable the VP */
 	xive_native_disable_vp(xc->vp_id);

+	/* Clear the cam word so guest entry won't try to push context */
+	vcpu->arch.xive_cam_word = 0;
+
 	/* Free the queues */
 	for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) {
 		struct xive_q *q = &xc->queues[i];
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -81,6 +81,9 @@ void kvmppc_xive_native_cleanup_vcpu(str
 	/* Disable the VP */
 	xive_native_disable_vp(xc->vp_id);

+	/* Clear the cam word so guest entry won't try to push context */
+	vcpu->arch.xive_cam_word = 0;
+
 	/* Free the queues */
 	for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) {
 		kvmppc_xive_native_cleanup_queue(vcpu, i);
From: Paul Mackerras paulus@ozlabs.org
commit 959c5d5134786b4988b6fdd08e444aa67d1667ed upstream.
Escalation interrupts are interrupts sent to the host by the XIVE hardware when it has an interrupt to deliver to a guest VCPU but that VCPU is not running anywhere in the system. Hence we disable the escalation interrupt for the VCPU being run when we enter the guest and re-enable it when the guest does an H_CEDE hypercall indicating it is idle.
It is possible that an escalation interrupt gets generated just as we are entering the guest. In that case the escalation interrupt may be using a queue entry in one of the interrupt queues, and that queue entry may not have been processed when the guest exits with an H_CEDE. The existing entry code detects this situation and does not clear the vcpu->arch.xive_esc_on flag as an indication that there is a pending queue entry (if the queue entry gets processed, xive_esc_irq() will clear the flag). There is a comment in the code saying that if the flag is still set on H_CEDE, we have to abort the cede rather than re-enabling the escalation interrupt, lest we end up with two occurrences of the escalation interrupt in the interrupt queue.
However, the exit code doesn't do that; it aborts the cede in the sense that vcpu->arch.ceded gets cleared, but it still enables the escalation interrupt by setting the source's PQ bits to 00. Instead we need to set the PQ bits to 10, indicating that an interrupt has been triggered. We also need to avoid setting vcpu->arch.xive_esc_on in this case (i.e. vcpu->arch.xive_esc_on seen to be set on H_CEDE) because xive_esc_irq() will run at some point and clear it, and if we race with that we may end up with an incorrect result (i.e. xive_esc_on set when the escalation interrupt has just been handled).
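For illustration only, not part of the patch: the corrected decision at cede exit can be written as a small C model. The names cede_model, pq_state and cede_exit are hypothetical; the real code is PowerPC assembly and also handles the case where no escalation address is configured, which this sketch leaves out.

```c
#include <stdbool.h>

/* Hypothetical names for the two PQ settings discussed above. */
enum pq_state { PQ_00, PQ_10 };

struct cede_model {
	bool ceded;		/* models vcpu->arch.ceded */
	bool xive_esc_on;	/* models vcpu->arch.xive_esc_on */
};

/* Returns the PQ value that would be written to the escalation source. */
static enum pq_state cede_exit(struct cede_model *v)
{
	if (v->xive_esc_on) {
		/*
		 * A queue entry is still pending: abort the cede and mark
		 * the source as triggered. xive_esc_on is left alone so
		 * that xive_esc_irq() can clear it without a race.
		 */
		v->ceded = false;
		return PQ_10;
	}
	/* No pending entry: re-enable the escalation interrupt. */
	v->xive_esc_on = true;
	return PQ_00;
}
```

The two branches correspond to the "4:" and fall-through paths of the new assembly.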
It is extremely unlikely that having two queue entries would cause observable problems; theoretically it could cause queue overflow, but the CPU would have to have thousands of interrupts targeted to it for that to be possible. However, this fix will also make it possible to determine accurately whether there is an unhandled escalation interrupt in the queue, which will be needed by the following patch.
Fixes: 9b9b13a6d153 ("KVM: PPC: Book3S HV: Keep XIVE escalation interrupt masked unless ceded")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Paul Mackerras paulus@ozlabs.org
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Link: https://lore.kernel.org/r/20190813100349.GD9567@blackberry
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   36 ++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 13 deletions(-)

--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2833,29 +2833,39 @@ kvm_cede_prodded:
 kvm_cede_exit:
 	ld	r9, HSTATE_KVM_VCPU(r13)
 #ifdef CONFIG_KVM_XICS
-	/* Abort if we still have a pending escalation */
+	/* are we using XIVE with single escalation? */
+	ld	r10, VCPU_XIVE_ESC_VADDR(r9)
+	cmpdi	r10, 0
+	beq	3f
+	li	r6, XIVE_ESB_SET_PQ_00
+	/*
+	 * If we still have a pending escalation, abort the cede,
+	 * and we must set PQ to 10 rather than 00 so that we don't
+	 * potentially end up with two entries for the escalation
+	 * interrupt in the XIVE interrupt queue.  In that case
+	 * we also don't want to set xive_esc_on to 1 here in
+	 * case we race with xive_esc_irq().
+	 */
 	lbz	r5, VCPU_XIVE_ESC_ON(r9)
 	cmpwi	r5, 0
-	beq	1f
+	beq	4f
 	li	r0, 0
 	stb	r0, VCPU_CEDED(r9)
-1:	/* Enable XIVE escalation */
-	li	r5, XIVE_ESB_SET_PQ_00
+	li	r6, XIVE_ESB_SET_PQ_10
+	b	5f
+4:	li	r0, 1
+	stb	r0, VCPU_XIVE_ESC_ON(r9)
+	/* make sure store to xive_esc_on is seen before xive_esc_irq runs */
+	sync
+5:	/* Enable XIVE escalation */
 	mfmsr	r0
 	andi.	r0, r0, MSR_DR		/* in real mode? */
 	beq	1f
-	ld	r10, VCPU_XIVE_ESC_VADDR(r9)
-	cmpdi	r10, 0
-	beq	3f
-	ldx	r0, r10, r5
+	ldx	r0, r10, r6
 	b	2f
 1:	ld	r10, VCPU_XIVE_ESC_RADDR(r9)
-	cmpdi	r10, 0
-	beq	3f
-	ldcix	r0, r10, r5
+	ldcix	r0, r10, r6
 2:	sync
-	li	r0, 1
-	stb	r0, VCPU_XIVE_ESC_ON(r9)
 #endif /* CONFIG_KVM_XICS */
 3:	b	guest_exit_cont

From: Paul Mackerras paulus@ozlabs.org
commit d28eafc5a64045c78136162af9d4ba42f8230080 upstream.
When we are running multiple vcores on the same physical core, they could be from different VMs and so it is possible that one of the VMs could have its arch.mmu_ready flag cleared (for example by a concurrent HPT resize) when we go to run it on a physical core. We currently check the arch.mmu_ready flag for the primary vcore but not the flags for the other vcores that will be run alongside it. This adds that check, and also a check when we select the secondary vcores from the preempted vcores list.
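For illustration only, not part of the patch: the combined check can be modelled in userspace C as below. The struct vcore_model, the MAX_THREADS bound and the flat arrays are hypothetical simplifications; the kernel iterates over runnable threads with for_each_runnable_thread() and reads mmu_ready through vc->kvm->arch.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 8	/* hypothetical bound for this model */

/* Hypothetical, reduced view of one vcore scheduled on the core. */
struct vcore_model {
	bool mmu_ready;				/* models kvm->arch.mmu_ready */
	size_t n_threads;
	bool signal_pending[MAX_THREADS];
};

/* Returns true if the run must be abandoned for any subcore. */
static bool recheck_signals_and_mmu(const struct vcore_model *vc,
				    size_t n_subcores)
{
	size_t sub, i;

	for (sub = 0; sub < n_subcores; sub++) {
		/* Every vcore is checked, not only the primary one. */
		if (!vc[sub].mmu_ready)
			return true;
		for (i = 0; i < vc[sub].n_threads; i++)
			if (vc[sub].signal_pending[i])
				return true;
	}
	return false;
}
```

A secondary vcore whose VM is in the middle of an HPT resize now stops the whole run, which the primary-only check missed.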
Cc: stable@vger.kernel.org # v4.14+
Fixes: 38c53af85306 ("KVM: PPC: Book3S HV: Fix exclusion between HPT resizing and other HPT updates")
Signed-off-by: Paul Mackerras paulus@ozlabs.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/kvm/book3s_hv.c |   15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2860,7 +2860,7 @@ static void collect_piggybacks(struct co
 		if (!spin_trylock(&pvc->lock))
 			continue;
 		prepare_threads(pvc);
-		if (!pvc->n_runnable) {
+		if (!pvc->n_runnable || !pvc->kvm->arch.mmu_ready) {
 			list_del_init(&pvc->preempt_list);
 			if (pvc->runner == NULL) {
 				pvc->vcore_state = VCORE_INACTIVE;
@@ -2881,15 +2881,20 @@ static void collect_piggybacks(struct co
 	spin_unlock(&lp->lock);
 }

-static bool recheck_signals(struct core_info *cip)
+static bool recheck_signals_and_mmu(struct core_info *cip)
 {
 	int sub, i;
 	struct kvm_vcpu *vcpu;
+	struct kvmppc_vcore *vc;

-	for (sub = 0; sub < cip->n_subcores; ++sub)
-		for_each_runnable_thread(i, vcpu, cip->vc[sub])
+	for (sub = 0; sub < cip->n_subcores; ++sub) {
+		vc = cip->vc[sub];
+		if (!vc->kvm->arch.mmu_ready)
+			return true;
+		for_each_runnable_thread(i, vcpu, vc)
 			if (signal_pending(vcpu->arch.run_task))
 				return true;
+	}
 	return false;
 }

@@ -3119,7 +3124,7 @@ static noinline void kvmppc_run_core(str
 	local_irq_disable();
 	hard_irq_disable();
 	if (lazy_irq_pending() || need_resched() ||
-	    recheck_signals(&core_info) || !vc->kvm->arch.mmu_ready) {
+	    recheck_signals_and_mmu(&core_info)) {
 		local_irq_enable();
 		vc->vcore_state = VCORE_INACTIVE;
 		/* Unlock all except the primary vcore */
From: Paul Mackerras paulus@ozlabs.org
commit ff42df49e75f053a8a6b4c2533100cdcc23afe69 upstream.
On POWER9, when userspace reads the value of the DPDES register on a vCPU, it is possible for 0 to be returned although there is a doorbell interrupt pending for the vCPU. This can lead to a doorbell interrupt being lost across migration. If the guest kernel uses doorbell interrupts for IPIs, then it could malfunction because of the lost interrupt.
This happens because a newly-generated doorbell interrupt is signalled by setting vcpu->arch.doorbell_request to 1; the DPDES value in vcpu->arch.vcore->dpdes is not updated, because it can only be updated when holding the vcpu mutex, in order to avoid races.
To fix this, we OR in vcpu->arch.doorbell_request when reading the DPDES value.
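For illustration only, not part of the patch: the effect of the OR can be shown with a tiny userspace model. The struct dpdes_model and the function read_dpdes are hypothetical, and both fields are widened to 64 bits here for simplicity; the field widths in the kernel structures may differ.

```c
#include <stdint.h>

/* Hypothetical, reduced view of the state that feeds the DPDES read. */
struct dpdes_model {
	uint64_t vcore_dpdes;		/* models vcpu->arch.vcore->dpdes */
	uint64_t doorbell_request;	/* 1 if a doorbell was newly signalled */
};

/* Value reported to userspace: pending in either place counts. */
static uint64_t read_dpdes(const struct dpdes_model *v)
{
	return v->vcore_dpdes | v->doorbell_request;
}
```

With only doorbell_request set, the old read would have reported 0 and the pending doorbell would not have been migrated.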
Cc: stable@vger.kernel.org # v4.13+
Fixes: 579006944e0d ("KVM: PPC: Book3S HV: Virtualize doorbell facility on POWER9")
Signed-off-by: Paul Mackerras paulus@ozlabs.org
Tested-by: Alexey Kardashevskiy aik@ozlabs.ru
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/kvm/book3s_hv.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1678,7 +1678,14 @@ static int kvmppc_get_one_reg_hv(struct
 		*val = get_reg_val(id, vcpu->arch.pspb);
 		break;
 	case KVM_REG_PPC_DPDES:
-		*val = get_reg_val(id, vcpu->arch.vcore->dpdes);
+		/*
+		 * On POWER9, where we are emulating msgsndp etc.,
+		 * we return 1 bit for each vcpu, which can come from
+		 * either vcore->dpdes or doorbell_request.
+		 * On POWER8, doorbell_request is 0.
+		 */
+		*val = get_reg_val(id, vcpu->arch.vcore->dpdes |
+				   vcpu->arch.doorbell_request);
 		break;
 	case KVM_REG_PPC_VTB:
 		*val = get_reg_val(id, vcpu->arch.vcore->vtb);
From: Wanpeng Li wanpengli@tencent.com
commit 3ca94192278ca8de169d78c085396c424be123b3 upstream.
Reported by syzkaller:
  WARNING: CPU: 0 PID: 6544 at /home/kernel/data/kvm/arch/x86/kvm//vmx/vmx.c:4689 handle_desc+0x37/0x40 [kvm_intel]
  CPU: 0 PID: 6544 Comm: a.out Tainted: G OE 5.3.0-rc4+ #4
  RIP: 0010:handle_desc+0x37/0x40 [kvm_intel]
  Call Trace:
   vmx_handle_exit+0xbe/0x6b0 [kvm_intel]
   vcpu_enter_guest+0x4dc/0x18d0 [kvm]
   kvm_arch_vcpu_ioctl_run+0x407/0x660 [kvm]
   kvm_vcpu_ioctl+0x3ad/0x690 [kvm]
   do_vfs_ioctl+0xa2/0x690
   ksys_ioctl+0x6d/0x80
   __x64_sys_ioctl+0x1a/0x20
   do_syscall_64+0x74/0x720
   entry_SYSCALL_64_after_hwframe+0x49/0xbe
When CR4.UMIP is set, the guest should have the UMIP cpuid flag. The current kvm set_sregs function has no such check when userspace inputs sregs values. SECONDARY_EXEC_DESC is enabled on writes to CR4.UMIP in vmx_set_cr4 even though the guest doesn't have the UMIP cpuid flag. The testcase triggers the handle_desc warning when executing the ltr instruction since the guest's architectural CR4 doesn't have UMIP set. This patch fixes it by adding checks for valid CR4 and CPUID combinations in __set_sregs.
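For illustration only, not part of the patch: the kind of CR4/CPUID consistency check described above can be sketched as a table-driven userspace function. The CR4 bit positions are the architectural ones; the guest_feature enum, the cr4_deps table and valid_cr4() are hypothetical stand-ins for guest_cpuid_has() and kvm_valid_cr4(), and the reserved-bits check is left out.

```c
#include <errno.h>
#include <stddef.h>

/* CR4 bit positions as defined by the x86 architecture. */
#define CR4_UMIP	(1UL << 11)
#define CR4_LA57	(1UL << 12)
#define CR4_FSGSBASE	(1UL << 16)
#define CR4_OSXSAVE	(1UL << 18)
#define CR4_SMEP	(1UL << 20)
#define CR4_SMAP	(1UL << 21)
#define CR4_PKE		(1UL << 22)

/* Hypothetical feature mask standing in for guest_cpuid_has(). */
enum guest_feature {
	FEAT_XSAVE	= 1 << 0,
	FEAT_SMEP	= 1 << 1,
	FEAT_SMAP	= 1 << 2,
	FEAT_FSGSBASE	= 1 << 3,
	FEAT_PKU	= 1 << 4,
	FEAT_LA57	= 1 << 5,
	FEAT_UMIP	= 1 << 6,
};

/* Each CR4 bit may only be set if the guest has the matching feature. */
static const struct {
	unsigned long cr4_bit;
	unsigned int feature;
} cr4_deps[] = {
	{ CR4_OSXSAVE,	FEAT_XSAVE },
	{ CR4_SMEP,	FEAT_SMEP },
	{ CR4_SMAP,	FEAT_SMAP },
	{ CR4_FSGSBASE,	FEAT_FSGSBASE },
	{ CR4_PKE,	FEAT_PKU },
	{ CR4_LA57,	FEAT_LA57 },
	{ CR4_UMIP,	FEAT_UMIP },
};

/* Returns 0 if the combination is valid, -EINVAL otherwise. */
static int valid_cr4(unsigned long cr4, unsigned int guest_features)
{
	size_t i;

	for (i = 0; i < sizeof(cr4_deps) / sizeof(cr4_deps[0]); i++)
		if ((cr4 & cr4_deps[i].cr4_bit) &&
		    !(guest_features & cr4_deps[i].feature))
			return -EINVAL;
	return 0;
}
```

Sharing one such check between the CR4 write path and the sregs path is what closes the gap that the reproducer used.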
syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=138efb99600000
Reported-by: syzbot+0f1819555fbdce992df9@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li wanpengli@tencent.com
Reviewed-by: Sean Christopherson sean.j.christopherson@intel.com
Signed-off-by: Paolo Bonzini pbonzini@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/kvm/x86.c |   38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)

--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -884,34 +884,42 @@ int kvm_set_xcr(struct kvm_vcpu *vcpu, u
 }
 EXPORT_SYMBOL_GPL(kvm_set_xcr);

-int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+static int kvm_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
-	unsigned long old_cr4 = kvm_read_cr4(vcpu);
-	unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE |
-				   X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE;
-
 	if (cr4 & CR4_RESERVED_BITS)
-		return 1;
+		return -EINVAL;

 	if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) && (cr4 & X86_CR4_OSXSAVE))
-		return 1;
+		return -EINVAL;

 	if (!guest_cpuid_has(vcpu, X86_FEATURE_SMEP) && (cr4 & X86_CR4_SMEP))
-		return 1;
+		return -EINVAL;

 	if (!guest_cpuid_has(vcpu, X86_FEATURE_SMAP) && (cr4 & X86_CR4_SMAP))
-		return 1;
+		return -EINVAL;

 	if (!guest_cpuid_has(vcpu, X86_FEATURE_FSGSBASE) && (cr4 & X86_CR4_FSGSBASE))
-		return 1;
+		return -EINVAL;

 	if (!guest_cpuid_has(vcpu, X86_FEATURE_PKU) && (cr4 & X86_CR4_PKE))
-		return 1;
+		return -EINVAL;

 	if (!guest_cpuid_has(vcpu, X86_FEATURE_LA57) && (cr4 & X86_CR4_LA57))
-		return 1;
+		return -EINVAL;

 	if (!guest_cpuid_has(vcpu, X86_FEATURE_UMIP) && (cr4 & X86_CR4_UMIP))
+		return -EINVAL;
+
+	return 0;
+}
+
+int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+{
+	unsigned long old_cr4 = kvm_read_cr4(vcpu);
+	unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE |
+				   X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE;
+
+	if (kvm_valid_cr4(vcpu, cr4))
 		return 1;

 	if (is_long_mode(vcpu)) {
@@ -8598,10 +8606,6 @@ EXPORT_SYMBOL_GPL(kvm_task_switch);

 static int kvm_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 {
-	if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
-			(sregs->cr4 & X86_CR4_OSXSAVE))
-		return -EINVAL;
-
 	if ((sregs->efer & EFER_LME) && (sregs->cr0 & X86_CR0_PG)) {
 		/*
 		 * When EFER.LME and CR0.PG are set, the processor is in
@@ -8620,7 +8624,7 @@ static int kvm_valid_sregs(struct kvm_vc
 			return -EINVAL;
 	}

-	return 0;
+	return kvm_valid_cr4(vcpu, sregs->cr4);
 }

 static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
From: Mike Christie mchristi@redhat.com
commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4 upstream.
This fixes a bug added in 4.10 with commit:
commit 9561a7ade0c205bc2ee035a2ac880478dcc1a024
Author: Josef Bacik jbacik@fb.com
Date:   Tue Nov 22 14:04:40 2016 -0500

    nbd: add multi-connection support
that limited the number of devices to 256. Before the patch we could create 1000s of devices, but the patch switched us from using our own thread to using a work queue, which has a default limit of 256 active works.
The problem is that our recv_work function sits in a loop until disconnection but only handles IO for one connection. The work is started when the connection is started/restarted, but if we end up creating 257 or more connections, the queue_work call just queues the recv_work for connection 257 and beyond, and that work then waits until a recv_work instance for one of connections 1 - 256 completes after its connection is disconnected.
Instead of reverting back to kthreads, this has us allocate a workqueue_struct per device, so we can block in the work.
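For illustration only, not part of the patch: a deliberately simplified arithmetic model of why a shared workqueue caps the number of live connections while per-device workqueues do not. The constant DFL_MAX_ACTIVE reflects the default limit of 256 mentioned above; the function names are hypothetical, and the model ignores how the kernel actually accounts active works internally.

```c
#include <stddef.h>

#define DFL_MAX_ACTIVE 256	/* default limit cited in the commit message */

/*
 * One shared workqueue: every recv_work blocks until disconnect, so at
 * most DFL_MAX_ACTIVE of them can be running at any time.
 */
static size_t running_shared(size_t connections)
{
	return connections < DFL_MAX_ACTIVE ? connections : DFL_MAX_ACTIVE;
}

/* One workqueue per device: the limit applies to each device separately. */
static size_t running_per_device(size_t devices, size_t conns_per_device)
{
	size_t per_dev = conns_per_device < DFL_MAX_ACTIVE ?
			 conns_per_device : DFL_MAX_ACTIVE;

	return devices * per_dev;
}
```

Under these assumptions, 300 single-connection devices leave 44 receive works stuck behind the shared queue, while the per-device layout lets all 300 run.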
Cc: stable@vger.kernel.org
Reviewed-by: Josef Bacik josef@toxicpanda.com
Signed-off-by: Mike Christie mchristi@redhat.com
Signed-off-by: Jens Axboe axboe@kernel.dk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/block/nbd.c |   39 +++++++++++++++++++++++++--------------
 1 file changed, 25 insertions(+), 14 deletions(-)

--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -108,6 +108,7 @@ struct nbd_device {
 	struct nbd_config *config;
 	struct mutex config_lock;
 	struct gendisk *disk;
+	struct workqueue_struct *recv_workq;

 	struct list_head list;
 	struct task_struct *task_recv;
@@ -138,7 +139,6 @@ static struct dentry *nbd_dbg_dir;

 static unsigned int nbds_max = 16;
 static int max_part = 16;
-static struct workqueue_struct *recv_workqueue;
 static int part_shift;

 static int nbd_dev_dbg_init(struct nbd_device *nbd);
@@ -1038,7 +1038,7 @@ static int nbd_reconnect_socket(struct n
 		/* We take the tx_mutex in an error path in the recv_work, so we
 		 * need to queue_work outside of the tx_mutex.
 		 */
-		queue_work(recv_workqueue, &args->work);
+		queue_work(nbd->recv_workq, &args->work);

 		atomic_inc(&config->live_connections);
 		wake_up(&config->conn_wait);
@@ -1139,6 +1139,10 @@ static void nbd_config_put(struct nbd_de
 		kfree(nbd->config);
 		nbd->config = NULL;

+		if (nbd->recv_workq)
+			destroy_workqueue(nbd->recv_workq);
+		nbd->recv_workq = NULL;
+
 		nbd->tag_set.timeout = 0;
 		nbd->disk->queue->limits.discard_granularity = 0;
 		nbd->disk->queue->limits.discard_alignment = 0;
@@ -1167,6 +1171,14 @@ static int nbd_start_device(struct nbd_d
 		return -EINVAL;
 	}

+	nbd->recv_workq = alloc_workqueue("knbd%d-recv",
+					  WQ_MEM_RECLAIM | WQ_HIGHPRI |
+					  WQ_UNBOUND, 0, nbd->index);
+	if (!nbd->recv_workq) {
+		dev_err(disk_to_dev(nbd->disk), "Could not allocate knbd recv work queue.\n");
+		return -ENOMEM;
+	}
+
 	blk_mq_update_nr_hw_queues(&nbd->tag_set, config->num_connections);
 	nbd->task_recv = current;

@@ -1197,7 +1209,7 @@ static int nbd_start_device(struct nbd_d
 		INIT_WORK(&args->work, recv_work);
 		args->nbd = nbd;
 		args->index = i;
-		queue_work(recv_workqueue, &args->work);
+		queue_work(nbd->recv_workq, &args->work);
 	}
 	nbd_size_update(nbd);
 	return error;
@@ -1217,8 +1229,10 @@ static int nbd_start_device_ioctl(struct
 	mutex_unlock(&nbd->config_lock);
 	ret = wait_event_interruptible(config->recv_wq,
 					 atomic_read(&config->recv_threads) == 0);
-	if (ret)
+	if (ret) {
 		sock_shutdown(nbd);
+		flush_workqueue(nbd->recv_workq);
+	}
 	mutex_lock(&nbd->config_lock);
 	nbd_bdev_reset(bdev);
 	/* user requested, ignore socket errors */
@@ -1877,6 +1891,12 @@ static void nbd_disconnect_and_put(struc
 	nbd_disconnect(nbd);
 	nbd_clear_sock(nbd);
 	mutex_unlock(&nbd->config_lock);
+	/*
+	 * Make sure recv thread has finished, so it does not drop the last
+	 * config ref and try to destroy the workqueue from inside the work
+	 * queue.
+	 */
+	flush_workqueue(nbd->recv_workq);
 	if (test_and_clear_bit(NBD_HAS_CONFIG_REF,
 			       &nbd->config->runtime_flags))
 		nbd_config_put(nbd);
@@ -2263,20 +2283,12 @@ static int __init nbd_init(void)

 	if (nbds_max > 1UL << (MINORBITS - part_shift))
 		return -EINVAL;
-	recv_workqueue = alloc_workqueue("knbd-recv",
-					 WQ_MEM_RECLAIM | WQ_HIGHPRI |
-					 WQ_UNBOUND, 0);
-	if (!recv_workqueue)
-		return -ENOMEM;

-	if (register_blkdev(NBD_MAJOR, "nbd")) {
-		destroy_workqueue(recv_workqueue);
+	if (register_blkdev(NBD_MAJOR, "nbd"))
 		return -EIO;
-	}
if (genl_register_family(&nbd_genl_family)) { unregister_blkdev(NBD_MAJOR, "nbd"); - destroy_workqueue(recv_workqueue); return -EINVAL; } nbd_dbg_init(); @@ -2318,7 +2330,6 @@ static void __exit nbd_cleanup(void)
idr_destroy(&nbd_index_idr); genl_unregister_family(&nbd_genl_family); - destroy_workqueue(recv_workqueue); unregister_blkdev(NBD_MAJOR, "nbd"); }
From: Dmitry Osipenko digetx@gmail.com
commit 62bacb06b9f08965c4ef10e17875450490c948c0 upstream.
The kHz to Hz conversion is done incorrectly in a few places in the code. This results in a wrong frequency being calculated, because the devfreq core uses OPP frequencies that are given in Hz to clamp the rate, while tegra-devfreq gives the core a value in kHz and also expects to receive a value in kHz from the core. As a result, the memory frequency is always set to a value close to ULONG_MAX because of the bug. Hence the EMC frequency is always capped to the maximum and the driver doesn't do anything useful. This patch was tested on Tegra30 and Tegra124 SoCs; EMC frequency scaling works properly now.
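The unit boundary the fix establishes can be sketched in plain C. This is an illustration only: the function names are hypothetical, and the sketch does not model how the mismatch produced the near-ULONG_MAX value. It only shows the two places where the driver's internal kHz values are converted before being handed to a core that works in Hz.

```c
#define KHZ 1000UL

/*
 * Governor side: the driver computes its target in kHz, the core
 * expects Hz (mirrors "*freq = target_freq * KHZ" in the patch).
 */
static unsigned long governor_target_hz(unsigned long target_freq_khz)
{
	return target_freq_khz * KHZ;
}

/*
 * Status side: the driver tracks the current rate in kHz, the core
 * expects Hz (mirrors "stat->current_frequency = tegra->cur_freq * KHZ").
 */
static unsigned long status_current_hz(unsigned long cur_freq_khz)
{
	return cur_freq_khz * KHZ;
}
```

Keeping every value that crosses into the devfreq core in Hz means the OPP lookup can take the core's frequency as-is, which is what the change to devfreq_recommended_opp(dev, freq, flags) in the diff relies on.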
Cc: stable@vger.kernel.org # 4.14+ Tested-by: Steev Klimaszewski steev@kali.org Reviewed-by: Chanwoo Choi cw00.choi@samsung.com Signed-off-by: Dmitry Osipenko digetx@gmail.com Acked-by: Thierry Reding treding@nvidia.com Signed-off-by: MyungJoo Ham myungjoo.ham@samsung.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/devfreq/tegra-devfreq.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-)
--- a/drivers/devfreq/tegra-devfreq.c +++ b/drivers/devfreq/tegra-devfreq.c @@ -474,11 +474,11 @@ static int tegra_devfreq_target(struct d { struct tegra_devfreq *tegra = dev_get_drvdata(dev); struct dev_pm_opp *opp; - unsigned long rate = *freq * KHZ; + unsigned long rate;
- opp = devfreq_recommended_opp(dev, &rate, flags); + opp = devfreq_recommended_opp(dev, freq, flags); if (IS_ERR(opp)) { - dev_err(dev, "Failed to find opp for %lu KHz\n", *freq); + dev_err(dev, "Failed to find opp for %lu Hz\n", *freq); return PTR_ERR(opp); } rate = dev_pm_opp_get_freq(opp); @@ -487,8 +487,6 @@ static int tegra_devfreq_target(struct d clk_set_min_rate(tegra->emc_clock, rate); clk_set_rate(tegra->emc_clock, 0);
- *freq = rate; - return 0; }
@@ -498,7 +496,7 @@ static int tegra_devfreq_get_dev_status( struct tegra_devfreq *tegra = dev_get_drvdata(dev); struct tegra_devfreq_device *actmon_dev;
- stat->current_frequency = tegra->cur_freq; + stat->current_frequency = tegra->cur_freq * KHZ;
/* To be used by the tegra governor */ stat->private_data = tegra; @@ -553,7 +551,7 @@ static int tegra_governor_get_target(str target_freq = max(target_freq, dev->target_freq); }
- *freq = target_freq; + *freq = target_freq * KHZ;
return 0; }
From: Oleksandr Suvorov oleksandr.suvorov@toradex.com
commit cfc8f568aada98f9608a0a62511ca18d647613e2 upstream.
Add the SND_SOC_DAPM_PRE_POST_PMU definition so that upcoming code can use it, which reduces the size of that code and makes it more readable.
Cc: stable@vger.kernel.org Signed-off-by: Oleksandr Suvorov oleksandr.suvorov@toradex.com Reviewed-by: Marcel Ziswiler marcel.ziswiler@toradex.com Reviewed-by: Igor Opaniuk igor.opaniuk@toradex.com Reviewed-by: Fabio Estevam festevam@gmail.com Link: https://lore.kernel.org/r/20190719100524.23300-2-oleksandr.suvorov@toradex.c... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/sound/soc-dapm.h | 2 ++ 1 file changed, 2 insertions(+)
--- a/include/sound/soc-dapm.h +++ b/include/sound/soc-dapm.h @@ -353,6 +353,8 @@ struct device; #define SND_SOC_DAPM_WILL_PMD 0x80 /* called at start of sequence */ #define SND_SOC_DAPM_PRE_POST_PMD \ (SND_SOC_DAPM_PRE_PMD | SND_SOC_DAPM_POST_PMD) +#define SND_SOC_DAPM_PRE_POST_PMU \ + (SND_SOC_DAPM_PRE_PMU | SND_SOC_DAPM_POST_PMU)
/* convenience event type detection */ #define SND_SOC_DAPM_EVENT_ON(e) \
From: Oleksandr Suvorov oleksandr.suvorov@toradex.com
commit b1f373a11d25fc9a5f7679c9b85799fe09b0dc4a upstream.
VAG power control is improved to fit the manual [1]. This patch fixes at least one bug: if the customer muxes Headphone to Line-In right after boot, the VAG power remains off, which leads to poor sound quality from Line-In.
I.e. after boot:
- Connect sound source to Line-In jack;
- Connect headphone to HP jack;
- Run the following commands:
  $ amixer set 'Headphone' 80%
  $ amixer set 'Headphone Mux' LINE_IN
Change VAG power on/off control according to the following algorithm:
- turn VAG power ON on the 1st incoming event.
- keep it ON if there is any active VAG consumer (ADC/DAC/HP/Line-In).
- turn VAG power OFF when the last consumer's pre-down event arrives.
- always delay after VAG power OFF to avoid pop.
- delay after VAG power ON if the initiating consumer is Line-In; this prevents pop during Line-In muxing.
According to the data sheet [1], to avoid any pops/clicks, the outputs should be muted during input/output routing changes.
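The "keep it ON while any consumer is active" rule can be sketched outside the driver as follows. This is a simplified model of the counting done by vag_power_consumers() and vag_power_off() in the diff below; the TOY_* flag bits and the function names are hypothetical and do not reflect the real register layout.

```c
#include <stdbool.h>

/* Hypothetical power flags for this sketch only. */
enum {
	TOY_DAC_POWERUP = 1u << 0,
	TOY_ADC_POWERUP = 1u << 1,
	TOY_HP_POWERUP  = 1u << 2,
};

enum toy_source { TOY_HP_EVENT, TOY_DAC_EVENT, TOY_ADC_EVENT };

/*
 * Count VAG consumers at pre-power-down time. The consumer that is
 * going away is still counted, because its power bit is still set.
 */
static int toy_vag_consumers(unsigned int powered, bool hp_mux_line_in,
			     enum toy_source source)
{
	int consumers = 0;

	/* DAC and ADC are counted unconditionally. */
	if (powered & TOY_DAC_POWERUP)
		consumers++;
	if (powered & TOY_ADC_POWERUP)
		consumers++;

	/* For HP events count Line-In routing; otherwise count HP power. */
	if (source == TOY_HP_EVENT) {
		if (hp_mux_line_in)
			consumers++;
	} else if (powered & TOY_HP_POWERUP) {
		consumers++;
	}

	return consumers;
}

/* VAG may be switched off only if nobody remains after this event. */
static bool toy_vag_may_power_off(unsigned int powered, bool hp_mux_line_in,
				  enum toy_source source)
{
	return toy_vag_consumers(powered, hp_mux_line_in, source) < 2;
}
```

Two or more counted consumers means at least one will remain after the current one powers down, so VAG stays on.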
[1] https://www.nxp.com/docs/en/data-sheet/SGTL5000.pdf
Cc: stable@vger.kernel.org Fixes: 9b34e6cc3bc2 ("ASoC: Add Freescale SGTL5000 codec support") Signed-off-by: Oleksandr Suvorov oleksandr.suvorov@toradex.com Reviewed-by: Marcel Ziswiler marcel.ziswiler@toradex.com Reviewed-by: Fabio Estevam festevam@gmail.com Reviewed-by: Cezary Rojewski cezary.rojewski@intel.com Link: https://lore.kernel.org/r/20190719100524.23300-3-oleksandr.suvorov@toradex.c... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/soc/codecs/sgtl5000.c | 224 ++++++++++++++++++++++++++++++++++++++------ 1 file changed, 194 insertions(+), 30 deletions(-)
--- a/sound/soc/codecs/sgtl5000.c +++ b/sound/soc/codecs/sgtl5000.c @@ -31,6 +31,13 @@ #define SGTL5000_DAP_REG_OFFSET 0x0100 #define SGTL5000_MAX_REG_OFFSET 0x013A
+/* Delay for the VAG ramp up */ +#define SGTL5000_VAG_POWERUP_DELAY 500 /* ms */ +/* Delay for the VAG ramp down */ +#define SGTL5000_VAG_POWERDOWN_DELAY 500 /* ms */ + +#define SGTL5000_OUTPUTS_MUTE (SGTL5000_HP_MUTE | SGTL5000_LINE_OUT_MUTE) + /* default value of sgtl5000 registers */ static const struct reg_default sgtl5000_reg_defaults[] = { { SGTL5000_CHIP_DIG_POWER, 0x0000 }, @@ -123,6 +130,13 @@ enum { I2S_SCLK_STRENGTH_HIGH, };
+enum { + HP_POWER_EVENT, + DAC_POWER_EVENT, + ADC_POWER_EVENT, + LAST_POWER_EVENT = ADC_POWER_EVENT +}; + /* sgtl5000 private structure in codec */ struct sgtl5000_priv { int sysclk; /* sysclk rate */ @@ -137,8 +151,109 @@ struct sgtl5000_priv { u8 micbias_voltage; u8 lrclk_strength; u8 sclk_strength; + u16 mute_state[LAST_POWER_EVENT + 1]; };
+static inline int hp_sel_input(struct snd_soc_component *component) +{ + return (snd_soc_component_read32(component, SGTL5000_CHIP_ANA_CTRL) & + SGTL5000_HP_SEL_MASK) >> SGTL5000_HP_SEL_SHIFT; +} + +static inline u16 mute_output(struct snd_soc_component *component, + u16 mute_mask) +{ + u16 mute_reg = snd_soc_component_read32(component, + SGTL5000_CHIP_ANA_CTRL); + + snd_soc_component_update_bits(component, SGTL5000_CHIP_ANA_CTRL, + mute_mask, mute_mask); + return mute_reg; +} + +static inline void restore_output(struct snd_soc_component *component, + u16 mute_mask, u16 mute_reg) +{ + snd_soc_component_update_bits(component, SGTL5000_CHIP_ANA_CTRL, + mute_mask, mute_reg); +} + +static void vag_power_on(struct snd_soc_component *component, u32 source) +{ + if (snd_soc_component_read32(component, SGTL5000_CHIP_ANA_POWER) & + SGTL5000_VAG_POWERUP) + return; + + snd_soc_component_update_bits(component, SGTL5000_CHIP_ANA_POWER, + SGTL5000_VAG_POWERUP, SGTL5000_VAG_POWERUP); + + /* When VAG powering on to get local loop from Line-In, the sleep + * is required to avoid loud pop. + */ + if (hp_sel_input(component) == SGTL5000_HP_SEL_LINE_IN && + source == HP_POWER_EVENT) + msleep(SGTL5000_VAG_POWERUP_DELAY); +} + +static int vag_power_consumers(struct snd_soc_component *component, + u16 ana_pwr_reg, u32 source) +{ + int consumers = 0; + + /* count dac/adc consumers unconditional */ + if (ana_pwr_reg & SGTL5000_DAC_POWERUP) + consumers++; + if (ana_pwr_reg & SGTL5000_ADC_POWERUP) + consumers++; + + /* + * If the event comes from HP and Line-In is selected, + * current action is 'DAC to be powered down'. + * As HP_POWERUP is not set when HP muxed to line-in, + * we need to keep VAG power ON. 
+ */ + if (source == HP_POWER_EVENT) { + if (hp_sel_input(component) == SGTL5000_HP_SEL_LINE_IN) + consumers++; + } else { + if (ana_pwr_reg & SGTL5000_HP_POWERUP) + consumers++; + } + + return consumers; +} + +static void vag_power_off(struct snd_soc_component *component, u32 source) +{ + u16 ana_pwr = snd_soc_component_read32(component, + SGTL5000_CHIP_ANA_POWER); + + if (!(ana_pwr & SGTL5000_VAG_POWERUP)) + return; + + /* + * This function calls when any of VAG power consumers is disappearing. + * Thus, if there is more than one consumer at the moment, as minimum + * one consumer will definitely stay after the end of the current + * event. + * Don't clear VAG_POWERUP if 2 or more consumers of VAG present: + * - LINE_IN (for HP events) / HP (for DAC/ADC events) + * - DAC + * - ADC + * (the current consumer is disappearing right now) + */ + if (vag_power_consumers(component, ana_pwr, source) >= 2) + return; + + snd_soc_component_update_bits(component, SGTL5000_CHIP_ANA_POWER, + SGTL5000_VAG_POWERUP, 0); + /* In power down case, we need wait 400-1000 ms + * when VAG fully ramped down. + * As longer we wait, as smaller pop we've got. + */ + msleep(SGTL5000_VAG_POWERDOWN_DELAY); +} + /* * mic_bias power on/off share the same register bits with * output impedance of mic bias, when power on mic bias, we @@ -170,36 +285,46 @@ static int mic_bias_event(struct snd_soc return 0; }
-/* - * As manual described, ADC/DAC only works when VAG powerup, - * So enabled VAG before ADC/DAC up. - * In power down case, we need wait 400ms when vag fully ramped down. - */ -static int power_vag_event(struct snd_soc_dapm_widget *w, - struct snd_kcontrol *kcontrol, int event) +static int vag_and_mute_control(struct snd_soc_component *component, + int event, int event_source) { - struct snd_soc_component *component = snd_soc_dapm_to_component(w->dapm); - const u32 mask = SGTL5000_DAC_POWERUP | SGTL5000_ADC_POWERUP; + static const u16 mute_mask[] = { + /* + * Mask for HP_POWER_EVENT. + * Muxing Headphones have to be wrapped with mute/unmute + * headphones only. + */ + SGTL5000_HP_MUTE, + /* + * Masks for DAC_POWER_EVENT/ADC_POWER_EVENT. + * Muxing DAC or ADC block have to wrapped with mute/unmute + * both headphones and line-out. + */ + SGTL5000_OUTPUTS_MUTE, + SGTL5000_OUTPUTS_MUTE + }; + + struct sgtl5000_priv *sgtl5000 = + snd_soc_component_get_drvdata(component);
switch (event) { + case SND_SOC_DAPM_PRE_PMU: + sgtl5000->mute_state[event_source] = + mute_output(component, mute_mask[event_source]); + break; case SND_SOC_DAPM_POST_PMU: - snd_soc_component_update_bits(component, SGTL5000_CHIP_ANA_POWER, - SGTL5000_VAG_POWERUP, SGTL5000_VAG_POWERUP); - msleep(400); + vag_power_on(component, event_source); + restore_output(component, mute_mask[event_source], + sgtl5000->mute_state[event_source]); break; - case SND_SOC_DAPM_PRE_PMD: - /* - * Don't clear VAG_POWERUP, when both DAC and ADC are - * operational to prevent inadvertently starving the - * other one of them. - */ - if ((snd_soc_component_read32(component, SGTL5000_CHIP_ANA_POWER) & - mask) != mask) { - snd_soc_component_update_bits(component, SGTL5000_CHIP_ANA_POWER, - SGTL5000_VAG_POWERUP, 0); - msleep(400); - } + sgtl5000->mute_state[event_source] = + mute_output(component, mute_mask[event_source]); + vag_power_off(component, event_source); + break; + case SND_SOC_DAPM_POST_PMD: + restore_output(component, mute_mask[event_source], + sgtl5000->mute_state[event_source]); break; default: break; @@ -208,6 +333,41 @@ static int power_vag_event(struct snd_so return 0; }
+/* + * Mute Headphone when power it up/down. + * Control VAG power on HP power path. + */ +static int headphone_pga_event(struct snd_soc_dapm_widget *w, + struct snd_kcontrol *kcontrol, int event) +{ + struct snd_soc_component *component = + snd_soc_dapm_to_component(w->dapm); + + return vag_and_mute_control(component, event, HP_POWER_EVENT); +} + +/* As manual describes, ADC/DAC powering up/down requires + * to mute outputs to avoid pops. + * Control VAG power on ADC/DAC power path. + */ +static int adc_updown_depop(struct snd_soc_dapm_widget *w, + struct snd_kcontrol *kcontrol, int event) +{ + struct snd_soc_component *component = + snd_soc_dapm_to_component(w->dapm); + + return vag_and_mute_control(component, event, ADC_POWER_EVENT); +} + +static int dac_updown_depop(struct snd_soc_dapm_widget *w, + struct snd_kcontrol *kcontrol, int event) +{ + struct snd_soc_component *component = + snd_soc_dapm_to_component(w->dapm); + + return vag_and_mute_control(component, event, DAC_POWER_EVENT); +} + /* input sources for ADC */ static const char *adc_mux_text[] = { "MIC_IN", "LINE_IN" @@ -280,7 +440,10 @@ static const struct snd_soc_dapm_widget mic_bias_event, SND_SOC_DAPM_POST_PMU | SND_SOC_DAPM_PRE_PMD),
- SND_SOC_DAPM_PGA("HP", SGTL5000_CHIP_ANA_POWER, 4, 0, NULL, 0), + SND_SOC_DAPM_PGA_E("HP", SGTL5000_CHIP_ANA_POWER, 4, 0, NULL, 0, + headphone_pga_event, + SND_SOC_DAPM_PRE_POST_PMU | + SND_SOC_DAPM_PRE_POST_PMD), SND_SOC_DAPM_PGA("LO", SGTL5000_CHIP_ANA_POWER, 0, 0, NULL, 0),
SND_SOC_DAPM_MUX("Capture Mux", SND_SOC_NOPM, 0, 0, &adc_mux), @@ -301,11 +464,12 @@ static const struct snd_soc_dapm_widget 0, SGTL5000_CHIP_DIG_POWER, 1, 0),
- SND_SOC_DAPM_ADC("ADC", "Capture", SGTL5000_CHIP_ANA_POWER, 1, 0), - SND_SOC_DAPM_DAC("DAC", "Playback", SGTL5000_CHIP_ANA_POWER, 3, 0), - - SND_SOC_DAPM_PRE("VAG_POWER_PRE", power_vag_event), - SND_SOC_DAPM_POST("VAG_POWER_POST", power_vag_event), + SND_SOC_DAPM_ADC_E("ADC", "Capture", SGTL5000_CHIP_ANA_POWER, 1, 0, + adc_updown_depop, SND_SOC_DAPM_PRE_POST_PMU | + SND_SOC_DAPM_PRE_POST_PMD), + SND_SOC_DAPM_DAC_E("DAC", "Playback", SGTL5000_CHIP_ANA_POWER, 3, 0, + dac_updown_depop, SND_SOC_DAPM_PRE_POST_PMU | + SND_SOC_DAPM_PRE_POST_PMD), };
/* routes for sgtl5000 */
From: Paul Mackerras paulus@ozlabs.org
commit da15c03b047dca891d37b9f4ef9ca14d84a6484f upstream.
Testing has revealed the existence of a race condition where a XIVE interrupt being shut down can be in one of the XIVE interrupt queues (of which there are up to 8 per CPU, one for each priority) at the point where free_irq() is called. If this happens, the queue scan (xive_scan_interrupts() in the diff below) can return an interrupt number which has been shut down. This can lead to various symptoms:
- irq_to_desc(irq) can be NULL. In this case, no end-of-interrupt function gets called, resulting in the CPU's elevated interrupt priority (numerically lowered CPPR) never being reset. That then means that the CPU stops processing interrupts, causing device timeouts and other errors in various device drivers.
- The irq descriptor or related data structures can be in the process of being freed as the interrupt code is using them. This typically leads to crashes due to bad pointer dereferences.
This race is basically what commit 62e0468650c3 ("genirq: Add optional hardware synchronization for shutdown", 2019-06-28) is intended to fix, given a get_irqchip_state() method for the interrupt controller being used. It works by polling the interrupt controller when an interrupt is being freed until the controller says it is not pending.
With XIVE, the PQ bits of the interrupt source indicate the state of the interrupt source, and in particular the P bit goes from 0 to 1 at the point where the hardware writes an entry into the interrupt queue that this interrupt is directed towards. Normally, the code will then process the interrupt and do an end-of-interrupt (EOI) operation which will reset PQ to 00 (assuming another interrupt hasn't been generated in the meantime). However, there are situations where the code resets P even though a queue entry exists (for example, by setting PQ to 01, which disables the interrupt source), and also situations where the code leaves P at 1 after removing the queue entry (for example, this is done for escalation interrupts so they cannot fire again until they are explicitly re-enabled).
The code already has a 'saved_p' flag for the interrupt source which indicates that a queue entry exists, although it isn't maintained consistently. This patch adds a 'stale_p' flag to indicate that P has been left at 1 after processing a queue entry, and adds code to set and clear saved_p and stale_p as necessary to maintain a consistent indication of whether a queue entry may or may not exist.
With this, we can implement xive_get_irqchip_state() by looking at stale_p, saved_p and the ESB PQ bits for the interrupt.
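The resulting decision can be written as a small standalone predicate. This is a sketch of the expression used for IRQCHIP_STATE_ACTIVE in the diff below, with hypothetical toy_ names; the P bit is taken to be bit value 2 of the PQ pair, matching the "pq & 2" tests that the patch replaces.

```c
#include <stdbool.h>

#define TOY_ESB_VAL_P 0x2u /* P bit within the 2-bit PQ value */

/*
 * Report whether a queue entry may still exist for an interrupt:
 * - stale_p says P is set but the entry is already gone, so it wins;
 * - saved_p says an entry exists even if P has been cleared;
 * - otherwise fall back to the P bit read from the ESB.
 */
static bool toy_irq_may_be_queued(bool saved_p, bool stale_p, unsigned int pq)
{
	return !stale_p && (saved_p || (pq & TOY_ESB_VAL_P) != 0);
}
```

The ordering matters: stale_p is checked first because it overrides both of the other indications.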
There is some additional code to handle escalation interrupts properly. They are enabled and disabled in KVM assembly code, which does not have access to the xive_irq_data struct for the escalation interrupt; hence, stale_p may be incorrect when the escalation interrupt is freed in kvmppc_xive_{,native_}cleanup_vcpu(). Fortunately, we can fix it up by looking at vcpu->arch.xive_esc_on, with some careful attention to barriers in order to ensure the correct result if xive_esc_irq() races with kvmppc_xive_cleanup_vcpu().
Finally, this adds code to make noise on the console (pr_crit and WARN_ON(1)) if we find an interrupt queue entry for an interrupt which does not have a descriptor. While this won't catch the race reliably, if it does get triggered it will be an indication that the race is occurring and needs to be debugged.
Fixes: 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE interrupt controller") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Paul Mackerras paulus@ozlabs.org Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20190813100648.GE9567@blackberry Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/include/asm/xive.h | 8 +++ arch/powerpc/kvm/book3s_xive.c | 31 ++++++++++++ arch/powerpc/kvm/book3s_xive.h | 2 arch/powerpc/kvm/book3s_xive_native.c | 3 + arch/powerpc/sysdev/xive/common.c | 87 +++++++++++++++++++++++++--------- 5 files changed, 108 insertions(+), 23 deletions(-)
--- a/arch/powerpc/include/asm/xive.h +++ b/arch/powerpc/include/asm/xive.h @@ -46,7 +46,15 @@ struct xive_irq_data {
/* Setup/used by frontend */ int target; + /* + * saved_p means that there is a queue entry for this interrupt + * in some CPU's queue (not including guest vcpu queues), even + * if P is not set in the source ESB. + * stale_p means that there is no queue entry for this interrupt + * in some CPU's queue, even if P is set in the source ESB. + */ bool saved_p; + bool stale_p; }; #define XIVE_IRQ_FLAG_STORE_EOI 0x01 #define XIVE_IRQ_FLAG_LSI 0x02 --- a/arch/powerpc/kvm/book3s_xive.c +++ b/arch/powerpc/kvm/book3s_xive.c @@ -166,6 +166,9 @@ static irqreturn_t xive_esc_irq(int irq, */ vcpu->arch.xive_esc_on = false;
+ /* This orders xive_esc_on = false vs. subsequent stale_p = true */ + smp_wmb(); /* goes with smp_mb() in cleanup_single_escalation */ + return IRQ_HANDLED; }
@@ -1119,6 +1122,31 @@ void kvmppc_xive_disable_vcpu_interrupts vcpu->arch.xive_esc_raddr = 0; }
+/* + * In single escalation mode, the escalation interrupt is marked so + * that EOI doesn't re-enable it, but just sets the stale_p flag to + * indicate that the P bit has already been dealt with. However, the + * assembly code that enters the guest sets PQ to 00 without clearing + * stale_p (because it has no easy way to address it). Hence we have + * to adjust stale_p before shutting down the interrupt. + */ +void xive_cleanup_single_escalation(struct kvm_vcpu *vcpu, + struct kvmppc_xive_vcpu *xc, int irq) +{ + struct irq_data *d = irq_get_irq_data(irq); + struct xive_irq_data *xd = irq_data_get_irq_handler_data(d); + + /* + * This slightly odd sequence gives the right result + * (i.e. stale_p set if xive_esc_on is false) even if + * we race with xive_esc_irq() and xive_irq_eoi(). + */ + xd->stale_p = false; + smp_mb(); /* paired with smb_wmb in xive_esc_irq */ + if (!vcpu->arch.xive_esc_on) + xd->stale_p = true; +} + void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu) { struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu; @@ -1143,6 +1171,9 @@ void kvmppc_xive_cleanup_vcpu(struct kvm /* Free escalations */ for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) { if (xc->esc_virq[i]) { + if (xc->xive->single_escalation) + xive_cleanup_single_escalation(vcpu, xc, + xc->esc_virq[i]); free_irq(xc->esc_virq[i], vcpu); irq_dispose_mapping(xc->esc_virq[i]); kfree(xc->esc_virq_names[i]); --- a/arch/powerpc/kvm/book3s_xive.h +++ b/arch/powerpc/kvm/book3s_xive.h @@ -282,6 +282,8 @@ int kvmppc_xive_select_target(struct kvm int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio, bool single_escalation); struct kvmppc_xive *kvmppc_xive_get_device(struct kvm *kvm, u32 type); +void xive_cleanup_single_escalation(struct kvm_vcpu *vcpu, + struct kvmppc_xive_vcpu *xc, int irq);
#endif /* CONFIG_KVM_XICS */ #endif /* _KVM_PPC_BOOK3S_XICS_H */ --- a/arch/powerpc/kvm/book3s_xive_native.c +++ b/arch/powerpc/kvm/book3s_xive_native.c @@ -71,6 +71,9 @@ void kvmppc_xive_native_cleanup_vcpu(str for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) { /* Free the escalation irq */ if (xc->esc_virq[i]) { + if (xc->xive->single_escalation) + xive_cleanup_single_escalation(vcpu, xc, + xc->esc_virq[i]); free_irq(xc->esc_virq[i], vcpu); irq_dispose_mapping(xc->esc_virq[i]); kfree(xc->esc_virq_names[i]); --- a/arch/powerpc/sysdev/xive/common.c +++ b/arch/powerpc/sysdev/xive/common.c @@ -135,7 +135,7 @@ static u32 xive_read_eq(struct xive_q *q static u32 xive_scan_interrupts(struct xive_cpu *xc, bool just_peek) { u32 irq = 0; - u8 prio; + u8 prio = 0;
/* Find highest pending priority */ while (xc->pending_prio != 0) { @@ -148,8 +148,19 @@ static u32 xive_scan_interrupts(struct x irq = xive_read_eq(&xc->queue[prio], just_peek);
/* Found something ? That's it */ - if (irq) - break; + if (irq) { + if (just_peek || irq_to_desc(irq)) + break; + /* + * We should never get here; if we do then we must + * have failed to synchronize the interrupt properly + * when shutting it down. + */ + pr_crit("xive: got interrupt %d without descriptor, dropping\n", + irq); + WARN_ON(1); + continue; + }
/* Clear pending bits */ xc->pending_prio &= ~(1 << prio); @@ -307,6 +318,7 @@ static void xive_do_queue_eoi(struct xiv */ static void xive_do_source_eoi(u32 hw_irq, struct xive_irq_data *xd) { + xd->stale_p = false; /* If the XIVE supports the new "store EOI facility, use it */ if (xd->flags & XIVE_IRQ_FLAG_STORE_EOI) xive_esb_write(xd, XIVE_ESB_STORE_EOI, 0); @@ -350,7 +362,7 @@ static void xive_do_source_eoi(u32 hw_ir } }
-/* irq_chip eoi callback */ +/* irq_chip eoi callback, called with irq descriptor lock held */ static void xive_irq_eoi(struct irq_data *d) { struct xive_irq_data *xd = irq_data_get_irq_handler_data(d); @@ -366,6 +378,8 @@ static void xive_irq_eoi(struct irq_data if (!irqd_irq_disabled(d) && !irqd_is_forwarded_to_vcpu(d) && !(xd->flags & XIVE_IRQ_NO_EOI)) xive_do_source_eoi(irqd_to_hwirq(d), xd); + else + xd->stale_p = true;
/* * Clear saved_p to indicate that it's no longer occupying @@ -397,11 +411,16 @@ static void xive_do_source_set_mask(stru */ if (mask) { val = xive_esb_read(xd, XIVE_ESB_SET_PQ_01); - xd->saved_p = !!(val & XIVE_ESB_VAL_P); - } else if (xd->saved_p) + if (!xd->stale_p && !!(val & XIVE_ESB_VAL_P)) + xd->saved_p = true; + xd->stale_p = false; + } else if (xd->saved_p) { xive_esb_read(xd, XIVE_ESB_SET_PQ_10); - else + xd->saved_p = false; + } else { xive_esb_read(xd, XIVE_ESB_SET_PQ_00); + xd->stale_p = false; + } }
/* @@ -541,6 +560,8 @@ static unsigned int xive_irq_startup(str unsigned int hw_irq = (unsigned int)irqd_to_hwirq(d); int target, rc;
+ xd->saved_p = false; + xd->stale_p = false; pr_devel("xive_irq_startup: irq %d [0x%x] data @%p\n", d->irq, hw_irq, d);
@@ -587,6 +608,7 @@ static unsigned int xive_irq_startup(str return 0; }
+/* called with irq descriptor lock held */ static void xive_irq_shutdown(struct irq_data *d) { struct xive_irq_data *xd = irq_data_get_irq_handler_data(d); @@ -602,16 +624,6 @@ static void xive_irq_shutdown(struct irq xive_do_source_set_mask(xd, true);
/* - * The above may have set saved_p. We clear it otherwise it - * will prevent re-enabling later on. It is ok to forget the - * fact that the interrupt might be in a queue because we are - * accounting that already in xive_dec_target_count() and will - * be re-routing it to a new queue with proper accounting when - * it's started up again - */ - xd->saved_p = false; - - /* * Mask the interrupt in HW in the IVT/EAS and set the number * to be the "bad" IRQ number */ @@ -797,6 +809,10 @@ static int xive_irq_retrigger(struct irq return 1; }
+/* + * Caller holds the irq descriptor lock, so this won't be called + * concurrently with xive_get_irqchip_state on the same interrupt. + */ static int xive_irq_set_vcpu_affinity(struct irq_data *d, void *state) { struct xive_irq_data *xd = irq_data_get_irq_handler_data(d); @@ -820,6 +836,10 @@ static int xive_irq_set_vcpu_affinity(st
/* Set it to PQ=10 state to prevent further sends */ pq = xive_esb_read(xd, XIVE_ESB_SET_PQ_10); + if (!xd->stale_p) { + xd->saved_p = !!(pq & XIVE_ESB_VAL_P); + xd->stale_p = !xd->saved_p; + }
/* No target ? nothing to do */ if (xd->target == XIVE_INVALID_TARGET) { @@ -827,7 +847,7 @@ static int xive_irq_set_vcpu_affinity(st * An untargetted interrupt should have been * also masked at the source */ - WARN_ON(pq & 2); + WARN_ON(xd->saved_p);
return 0; } @@ -847,9 +867,8 @@ static int xive_irq_set_vcpu_affinity(st * This saved_p is cleared by the host EOI, when we know * for sure the queue slot is no longer in use. */ - if (pq & 2) { - pq = xive_esb_read(xd, XIVE_ESB_SET_PQ_11); - xd->saved_p = true; + if (xd->saved_p) { + xive_esb_read(xd, XIVE_ESB_SET_PQ_11);
/* * Sync the XIVE source HW to ensure the interrupt @@ -862,8 +881,7 @@ static int xive_irq_set_vcpu_affinity(st */ if (xive_ops->sync_source) xive_ops->sync_source(hw_irq); - } else - xd->saved_p = false; + } } else { irqd_clr_forwarded_to_vcpu(d);
@@ -914,6 +932,23 @@ static int xive_irq_set_vcpu_affinity(st return 0; }
+/* Called with irq descriptor lock held. */ +static int xive_get_irqchip_state(struct irq_data *data, + enum irqchip_irq_state which, bool *state) +{ + struct xive_irq_data *xd = irq_data_get_irq_handler_data(data); + + switch (which) { + case IRQCHIP_STATE_ACTIVE: + *state = !xd->stale_p && + (xd->saved_p || + !!(xive_esb_read(xd, XIVE_ESB_GET) & XIVE_ESB_VAL_P)); + return 0; + default: + return -EINVAL; + } +} + static struct irq_chip xive_irq_chip = { .name = "XIVE-IRQ", .irq_startup = xive_irq_startup, @@ -925,6 +960,7 @@ static struct irq_chip xive_irq_chip = { .irq_set_type = xive_irq_set_type, .irq_retrigger = xive_irq_retrigger, .irq_set_vcpu_affinity = xive_irq_set_vcpu_affinity, + .irq_get_irqchip_state = xive_get_irqchip_state, };
bool is_xive_irq(struct irq_chip *chip) @@ -1338,6 +1374,11 @@ static void xive_flush_cpu_queue(unsigne xd = irq_desc_get_handler_data(desc);
/* + * Clear saved_p to indicate that it's no longer pending + */ + xd->saved_p = false; + + /* * For LSIs, we EOI, this will cause a resend if it's * still asserted. Otherwise do an MSI retrigger. */
From: Balbir Singh bsingharora@gmail.com
commit 99ead78afd1128bfcebe7f88f3b102fb2da09aee upstream.
The current code would fail on huge page addresses, since the shift would be incorrect. Use the correct page shift value returned by __find_linux_pte() to get the correct physical address. The code is now more generic and can handle both regular and compound pages.
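The address arithmetic can be sketched at the pfn level. This is a simplification, not the kernel code: the patch ORs the masked address bits into the PTE value before calling pte_pfn(), whereas the sketch below works on an already extracted base pfn that is assumed to be aligned to the mapping size. The TOY_ names and the 4K base page size are assumptions made for the example.

```c
#define TOY_PAGE_SHIFT 12 /* assumption: 4K base pages for this sketch */
#define TOY_PAGE_SIZE  (1UL << TOY_PAGE_SHIFT)

/*
 * base_pfn: pfn of the start of the (possibly huge) page
 * addr:     the faulting effective address
 * shift:    page shift of the mapping that covers addr
 */
static unsigned long toy_addr_to_pfn(unsigned long base_pfn,
				     unsigned long addr, unsigned int shift)
{
	if (shift > TOY_PAGE_SHIFT) {
		/* Address bits that select a base page inside the huge page. */
		unsigned long rpnmask = (1UL << shift) - TOY_PAGE_SIZE;

		return base_pfn | ((addr & rpnmask) >> TOY_PAGE_SHIFT);
	}

	/* Regular page: the PTE already names the exact page. */
	return base_pfn;
}
```

For a 16M mapping (shift 24) the mask keeps address bits 12 to 23, so the returned pfn points at the base page that actually contains the address rather than at the start of the huge page.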
Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors") Signed-off-by: Balbir Singh bsingharora@gmail.com [arbab@linux.ibm.com: Fixup pseries_do_memory_failure()] Signed-off-by: Reza Arbab arbab@linux.ibm.com Tested-by: Mahesh Salgaonkar mahesh@linux.vnet.ibm.com Signed-off-by: Santosh Sivaraj santosh@fossix.org Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20190820081352.8641-3-santosh@fossix.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/kernel/mce_power.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-)
--- a/arch/powerpc/kernel/mce_power.c +++ b/arch/powerpc/kernel/mce_power.c @@ -26,6 +26,7 @@ unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr) { pte_t *ptep; + unsigned int shift; unsigned long flags; struct mm_struct *mm;
@@ -35,13 +36,18 @@ unsigned long addr_to_pfn(struct pt_regs mm = &init_mm;
local_irq_save(flags); - if (mm == current->mm) - ptep = find_current_mm_pte(mm->pgd, addr, NULL, NULL); - else - ptep = find_init_mm_pte(addr, NULL); + ptep = __find_linux_pte(mm->pgd, addr, NULL, &shift); local_irq_restore(flags); + if (!ptep || pte_special(*ptep)) return ULONG_MAX; + + if (shift > PAGE_SHIFT) { + unsigned long rpnmask = (1ul << shift) - PAGE_SIZE; + + return pte_pfn(__pte(pte_val(*ptep) | (addr & rpnmask))); + } + return pte_pfn(*ptep); }
@@ -344,7 +350,7 @@ static const struct mce_derror_table mce MCE_INITIATOR_CPU, MCE_SEV_SEVERE, true }, { 0, false, 0, 0, 0, 0, 0 } };
-static int mce_find_instr_ea_and_pfn(struct pt_regs *regs, uint64_t *addr, +static int mce_find_instr_ea_and_phys(struct pt_regs *regs, uint64_t *addr, uint64_t *phys_addr) { /* @@ -541,7 +547,8 @@ static int mce_handle_derror(struct pt_r * kernel/exception-64s.h */ if (get_paca()->in_mce < MAX_MCE_DEPTH) - mce_find_instr_ea_and_pfn(regs, addr, phys_addr); + mce_find_instr_ea_and_phys(regs, addr, + phys_addr); } found = 1; }
From: Santosh Sivaraj santosh@fossix.org
commit b5bda6263cad9a927e1a4edb7493d542da0c1410 upstream.
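schedule_work() cannot be called from MCE exception context, as an MCE can interrupt even an interrupt-disabled context. Queue an irq_work from the MCE handler instead, and call schedule_work() from the irq_work callback.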
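The two-stage deferral can be modelled in userspace as a pair of pending flags. This is only a toy model with hypothetical toy_ names; it does not reproduce the kernel's irq_work or workqueue machinery, it only shows the order in which the stages hand the event on.

```c
#include <stdbool.h>

static bool toy_irq_work_pending;
static bool toy_work_pending;
static int toy_events_processed;

/* Stage 1, MCE context: only record that something needs doing. */
static void toy_machine_check_ue_event(void)
{
	toy_irq_work_pending = true; /* stands in for irq_work_queue() */
}

/* Stage 2, irq_work callback: now it is safe to schedule real work. */
static void toy_run_irq_work(void)
{
	if (!toy_irq_work_pending)
		return;
	toy_irq_work_pending = false;
	toy_work_pending = true; /* stands in for schedule_work() */
}

/* Stage 3, process context: handle the queued UE event. */
static void toy_run_workqueue(void)
{
	if (!toy_work_pending)
		return;
	toy_work_pending = false;
	toy_events_processed++;
}
```

Nothing reaches the workqueue stage until the intermediate stage has run, which is the property the patch needs from the MCE path.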
Fixes: 733e4a4c4467 ("powerpc/mce: hookup memory_failure for UE errors") Cc: stable@vger.kernel.org # v4.15+ Reviewed-by: Mahesh Salgaonkar mahesh@linux.vnet.ibm.com Reviewed-by: Nicholas Piggin npiggin@gmail.com Acked-by: Balbir Singh bsingharora@gmail.com Signed-off-by: Santosh Sivaraj santosh@fossix.org Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20190820081352.8641-2-santosh@fossix.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/kernel/mce.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
--- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -33,6 +33,7 @@ static DEFINE_PER_CPU(struct machine_che mce_ue_event_queue);
static void machine_check_process_queued_event(struct irq_work *work); +static void machine_check_ue_irq_work(struct irq_work *work); void machine_check_ue_event(struct machine_check_event *evt); static void machine_process_ue_event(struct work_struct *work);
@@ -40,6 +41,10 @@ static struct irq_work mce_event_process .func = machine_check_process_queued_event, };
+static struct irq_work mce_ue_event_irq_work = { + .func = machine_check_ue_irq_work, +}; + DECLARE_WORK(mce_ue_event_work, machine_process_ue_event);
static void mce_set_error_info(struct machine_check_event *mce, @@ -199,6 +204,10 @@ void release_mce_event(void) get_mce_event(NULL, true); }
+static void machine_check_ue_irq_work(struct irq_work *work) +{ + schedule_work(&mce_ue_event_work); +}
/* * Queue up the MCE event which then can be handled later. @@ -216,7 +225,7 @@ void machine_check_ue_event(struct machi memcpy(this_cpu_ptr(&mce_ue_event_queue[index]), evt, sizeof(*evt));
/* Queue work to process this event later. */ - schedule_work(&mce_ue_event_work); + irq_work_queue(&mce_ue_event_irq_work); }
/*
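The two-stage deferral in the patch above (MCE handler queues irq_work, irq_work callback schedules the work item) can be pictured with a small user-space model. This is only a sketch: the mock_* functions and the flag-based "queues" are hypothetical stand-ins for irq_work_queue() and schedule_work(), which are far more involved in a real kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool irq_work_pending;	/* stage 1: set with a single atomic store */
static bool work_scheduled;		/* stage 2: set only by the irq_work stage */
static int events_processed;

/* Models the MCE handler: only flips an atomic flag, takes no locks. */
static void mock_mce_handler(void)
{
	atomic_store(&irq_work_pending, true);
}

/* Models the irq_work callback: runs later and may schedule work. */
static void mock_irq_work_run(void)
{
	if (atomic_exchange(&irq_work_pending, false))
		work_scheduled = true;
}

/* Models the workqueue: handles the queued event if one was scheduled. */
static void mock_workqueue_run(void)
{
	if (work_scheduled) {
		work_scheduled = false;
		events_processed++;
	}
}
```

In this model nothing is processed until mock_irq_work_run() has run between the handler and the workqueue, which mirrors the ordering the patch introduces.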
From: Christophe Leroy christophe.leroy@c-s.fr
commit 415480dce2ef03bb8335deebd2f402f475443ce0 upstream.
If a page is already mapped RW without the DIRTY flag, the DIRTY flag is never set and a TLB store miss exception is taken forever.
This is easily reproduced with the following app:
#include <sys/mman.h>

int main(void)
{
	volatile char *ptr = mmap(0, 4096, PROT_READ | PROT_WRITE,
				  MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	*ptr = *ptr;	/* load first, then store to the same page */
	return 0;
}
When the DIRTY flag is not set, bail out of the TLB miss handler and take a minor page fault, which will set the DIRTY flag.
Fixes: f8b58c64eaef ("powerpc/603: let's handle PAGE_DIRTY directly") Cc: stable@vger.kernel.org # v5.1+ Reported-by: Doug Crawford doug.crawford@intelight-its.com Signed-off-by: Christophe Leroy christophe.leroy@c-s.fr Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/80432f71194d7ee75b2f5043ecf1501cf1cca1f3.156619664... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/kernel/head_32.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/powerpc/kernel/head_32.S +++ b/arch/powerpc/kernel/head_32.S @@ -557,9 +557,9 @@ DataStoreTLBMiss: cmplw 0,r1,r3 mfspr r2, SPRN_SPRG_PGDIR #ifdef CONFIG_SWAP - li r1, _PAGE_RW | _PAGE_PRESENT | _PAGE_ACCESSED + li r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED #else - li r1, _PAGE_RW | _PAGE_PRESENT + li r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT #endif bge- 112f lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */
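The effect of adding _PAGE_DIRTY to the mask loaded into r1 can be sketched as a plain flag comparison. The flag values and the function name store_miss_fast_path() below are hypothetical and chosen only for this example. The real handler is assembly and does more than this check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define SK_PAGE_PRESENT 0x001u
#define SK_PAGE_RW      0x002u
#define SK_PAGE_DIRTY   0x004u

/*
 * True when every required flag is set in the PTE, i.e. when the
 * store-miss handler would accept the entry instead of bailing out
 * to the page fault path.
 */
static bool store_miss_fast_path(uint32_t pte, uint32_t required)
{
	return (pte & required) == required;
}
```

With the old mask a clean RW page passes the check even though DIRTY is never set. With the new mask the same page fails the check, so the page fault path gets a chance to set DIRTY.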
From: Christophe Leroy christophe.leroy@c-s.fr
commit 9d6d712fbf7766f21c838940eebcd7b4d476c5e6 upstream.
When KASAN is selected, the definitive hash table has to be set up later, but there is already an early temporary one.
When KASAN is not selected, there is no early hash table, so the setup of the definitive hash table cannot be delayed.
Fixes: 72f208c6a8f7 ("powerpc/32s: move hash code patching out of MMU_init_hw()") Cc: stable@vger.kernel.org # v5.2+ Reported-by: Jonathan Neuschafer j.neuschaefer@gmx.net Tested-by: Jonathan Neuschafer j.neuschaefer@gmx.net Signed-off-by: Christophe Leroy christophe.leroy@c-s.fr Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/b7860c5e1e784d6b96ba67edf47dd6cbc2e78ab6.156577689... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/kernel/head_32.S | 2 ++ arch/powerpc/mm/book3s32/mmu.c | 9 +++++++++ 2 files changed, 11 insertions(+)
--- a/arch/powerpc/kernel/head_32.S +++ b/arch/powerpc/kernel/head_32.S @@ -897,9 +897,11 @@ start_here: bl machine_init bl __save_cpu_setup bl MMU_init +#ifdef CONFIG_KASAN BEGIN_MMU_FTR_SECTION bl MMU_init_hw_patch END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE) +#endif
/* * Go back to running unmapped so we can load up new values --- a/arch/powerpc/mm/book3s32/mmu.c +++ b/arch/powerpc/mm/book3s32/mmu.c @@ -358,6 +358,15 @@ void __init MMU_init_hw(void) hash_mb2 = hash_mb = 32 - LG_HPTEG_SIZE - lg_n_hpteg; if (lg_n_hpteg > 16) hash_mb2 = 16 - LG_HPTEG_SIZE; + + /* + * When KASAN is selected, there is already an early temporary hash + * table and the switch to the final hash table is done later. + */ + if (IS_ENABLED(CONFIG_KASAN)) + return; + + MMU_init_hw_patch(); }
void __init MMU_init_hw_patch(void)
From: Christophe Leroy christophe.leroy@c-s.fr
commit 7c7a532ba3fc51bf9527d191fb410786c1fdc73c upstream.
Commit 453d87f6a8ae ("powerpc/mm: Warn if W+X pages found on boot") wrongly changed KERN_VIRT_START from 0 to PAGE_OFFSET, leading to a shift in the displayed addresses.
Let's revert that change to resync walk_pagetables()'s addr value and pgd_t pointer for PPC32.
Fixes: 453d87f6a8ae ("powerpc/mm: Warn if W+X pages found on boot") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Christophe Leroy christophe.leroy@c-s.fr Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/eb4d626514e22f85814830012642329018ef6af9.156578609... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/mm/ptdump/ptdump.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/powerpc/mm/ptdump/ptdump.c +++ b/arch/powerpc/mm/ptdump/ptdump.c @@ -27,7 +27,7 @@ #include "ptdump.h"
#ifdef CONFIG_PPC32 -#define KERN_VIRT_START PAGE_OFFSET +#define KERN_VIRT_START 0 #endif
/*
From: Andrew Donnellan ajd@linux.ibm.com
commit e7de4f7b64c23e503a8c42af98d56f2a7462bd6d upstream.
Currently the OPAL symbol map is globally readable, which is undesirable because it contains physical addresses.
Restrict it to root.
Fixes: c8742f85125d ("powerpc/powernv: Expose OPAL firmware symbol map") Cc: stable@vger.kernel.org # v3.19+ Suggested-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Andrew Donnellan ajd@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20190503075253.22798-1-ajd@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/platforms/powernv/opal.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/arch/powerpc/platforms/powernv/opal.c +++ b/arch/powerpc/platforms/powernv/opal.c @@ -705,7 +705,10 @@ static ssize_t symbol_map_read(struct fi bin_attr->size); }
-static BIN_ATTR_RO(symbol_map, 0); +static struct bin_attribute symbol_map_attr = { + .attr = {.name = "symbol_map", .mode = 0400}, + .read = symbol_map_read +};
static void opal_export_symmap(void) { @@ -722,10 +725,10 @@ static void opal_export_symmap(void) return;
/* Setup attributes */ - bin_attr_symbol_map.private = __va(be64_to_cpu(syms[0])); - bin_attr_symbol_map.size = be64_to_cpu(syms[1]); + symbol_map_attr.private = __va(be64_to_cpu(syms[0])); + symbol_map_attr.size = be64_to_cpu(syms[1]);
- rc = sysfs_create_bin_file(opal_kobj, &bin_attr_symbol_map); + rc = sysfs_create_bin_file(opal_kobj, &symbol_map_attr); if (rc) pr_warn("Error %d creating OPAL symbols file\n", rc); }
From: Gautham R. Shenoy ego@linux.vnet.ibm.com
commit c784be435d5dae28d3b03db31753dd7a18733f0c upstream.
The calls to arch_add_memory()/arch_remove_memory() are always made with the read-side cpu_hotplug_lock acquired via memory_hotplug_begin(). On pSeries, arch_add_memory()/arch_remove_memory() eventually call resize_hpt() which in turn calls stop_machine() which acquires the read-side cpu_hotplug_lock again, thereby resulting in the recursive acquisition of this lock.
In the absence of CONFIG_PROVE_LOCKING, we hadn't observed a system lockup during a memory hotplug operation because cpus_read_lock() is a per-cpu rwsem read, which, in the fast-path (in the absence of the writer, which in our case is a CPU-hotplug operation) simply increments the read_count on the semaphore. Thus a recursive read in the fast-path doesn't cause any problems.
However, we can hit this problem in practice if there is a concurrent CPU-Hotplug operation in progress which is waiting to acquire the write-side of the lock. This will cause the second recursive read to block until the writer finishes. Meanwhile, the writer is blocked because the first read still holds the lock. Thus neither the reader nor the writer makes any progress, which blocks both CPU-Hotplug and Memory-Hotplug operations.
Memory-Hotplug                                    CPU-Hotplug
CPU 0                                             CPU 1
------                                            ------

1. down_read(cpu_hotplug_lock.rw_sem)
   [memory_hotplug_begin]
                                                  2. down_write(cpu_hotplug_lock.rw_sem)
                                                     [cpu_up/cpu_down]
3. down_read(cpu_hotplug_lock.rw_sem)
   [stop_machine()]
Lockdep complains as follows in these code-paths.
swapper/0/1 is trying to acquire lock:
(____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: stop_machine+0x2c/0x60

but task is already holding lock:
(____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: mem_hotplug_begin+0x20/0x50

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(cpu_hotplug_lock.rw_sem);
  lock(cpu_hotplug_lock.rw_sem);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by swapper/0/1:
 #0: (____ptrval____) (&dev->mutex){....}, at: __driver_attach+0x12c/0x1b0
 #1: (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: mem_hotplug_begin+0x20/0x50
 #2: (____ptrval____) (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x54/0x1a0

stack backtrace:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc5-58373-gbc99402235f3-dirty #166
Call Trace:
dump_stack+0xe8/0x164 (unreliable)
__lock_acquire+0x1110/0x1c70
lock_acquire+0x240/0x290
cpus_read_lock+0x64/0xf0
stop_machine+0x2c/0x60
pseries_lpar_resize_hpt+0x19c/0x2c0
resize_hpt_for_hotplug+0x70/0xd0
arch_add_memory+0x58/0xfc
devm_memremap_pages+0x5e8/0x8f0
pmem_attach_disk+0x764/0x830
nvdimm_bus_probe+0x118/0x240
really_probe+0x230/0x4b0
driver_probe_device+0x16c/0x1e0
__driver_attach+0x148/0x1b0
bus_for_each_dev+0x90/0x130
driver_attach+0x34/0x50
bus_add_driver+0x1a8/0x360
driver_register+0x108/0x170
__nd_driver_register+0xd0/0xf0
nd_pmem_driver_init+0x34/0x48
do_one_initcall+0x1e0/0x45c
kernel_init_freeable+0x540/0x64c
kernel_init+0x2c/0x160
ret_from_kernel_thread+0x5c/0x68
Fix this issue by:

1) Requiring that all calls to pseries_lpar_resize_hpt() be made with cpu_hotplug_lock held.

2) Invoking stop_machine_cpuslocked() in pseries_lpar_resize_hpt(), as a consequence of 1).

3) Calling mmu_hash_ops.resize_hpt() with cpu_hotplug_lock held in hpt_order_set(), to satisfy 1).
Fixes: dbcf929c0062 ("powerpc/pseries: Add support for hash table resizing") Cc: stable@vger.kernel.org # v4.11+ Reported-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Signed-off-by: Gautham R. Shenoy ego@linux.vnet.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/1557906352-29048-1-git-send-email-ego@linux.vnet.i... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/mm/book3s64/hash_utils.c | 9 ++++++++- arch/powerpc/platforms/pseries/lpar.c | 8 ++++++-- 2 files changed, 14 insertions(+), 3 deletions(-)
--- a/arch/powerpc/mm/book3s64/hash_utils.c +++ b/arch/powerpc/mm/book3s64/hash_utils.c @@ -34,6 +34,7 @@ #include <linux/libfdt.h> #include <linux/pkeys.h> #include <linux/hugetlb.h> +#include <linux/cpu.h>
#include <asm/debugfs.h> #include <asm/processor.h> @@ -1931,10 +1932,16 @@ static int hpt_order_get(void *data, u64
static int hpt_order_set(void *data, u64 val) { + int ret; + if (!mmu_hash_ops.resize_hpt) return -ENODEV;
- return mmu_hash_ops.resize_hpt(val); + cpus_read_lock(); + ret = mmu_hash_ops.resize_hpt(val); + cpus_read_unlock(); + + return ret; }
DEFINE_DEBUGFS_ATTRIBUTE(fops_hpt_order, hpt_order_get, hpt_order_set, "%llu\n"); --- a/arch/powerpc/platforms/pseries/lpar.c +++ b/arch/powerpc/platforms/pseries/lpar.c @@ -1413,7 +1413,10 @@ static int pseries_lpar_resize_hpt_commi return 0; }
-/* Must be called in user context */ +/* + * Must be called in process context. The caller must hold the + * cpus_lock. + */ static int pseries_lpar_resize_hpt(unsigned long shift) { struct hpt_resize_state state = { @@ -1467,7 +1470,8 @@ static int pseries_lpar_resize_hpt(unsig
t1 = ktime_get();
- rc = stop_machine(pseries_lpar_resize_hpt_commit, &state, NULL); + rc = stop_machine_cpuslocked(pseries_lpar_resize_hpt_commit, + &state, NULL);
t2 = ktime_get();
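The locking rule this patch settles on (the caller takes cpu_hotplug_lock once, the callee uses the _cpuslocked variant) can be modelled in user space with a nesting counter. This is a sketch under stated assumptions: every mock_* name is hypothetical, the "lock" is only a counter, and no real blocking or deadlock is reproduced.

```c
#include <stddef.h>

static int read_depth;		/* current nesting of the mock read lock */
static int max_read_depth;	/* deepest nesting seen so far */

static void mock_cpus_read_lock(void)
{
	if (++read_depth > max_read_depth)
		max_read_depth = read_depth;
}

static void mock_cpus_read_unlock(void)
{
	read_depth--;
}

/* Caller must already hold the mock lock; it is not taken again. */
static int mock_stop_machine_cpuslocked(int (*fn)(void *), void *data)
{
	return fn(data);
}

/* Self-locking wrapper for callers that do not hold the lock. */
static int mock_stop_machine(int (*fn)(void *), void *data)
{
	int ret;

	mock_cpus_read_lock();
	ret = mock_stop_machine_cpuslocked(fn, data);
	mock_cpus_read_unlock();
	return ret;
}

static int mock_commit(void *data)
{
	(void)data;
	return 0;
}

/* Shape before the patch: takes the lock again inside the callee. */
static int mock_resize_old(void)
{
	return mock_stop_machine(mock_commit, NULL);
}

/* Shape after the patch: relies on the caller holding the lock. */
static int mock_resize_new(void)
{
	return mock_stop_machine_cpuslocked(mock_commit, NULL);
}

/* Models a caller that holds the lock around the resize. */
static int mock_caller_with_lock(int (*resize)(void))
{
	int ret;

	mock_cpus_read_lock();
	ret = resize();
	mock_cpus_read_unlock();
	return ret;
}
```

In this model a nesting depth of 2 marks the recursive read acquisition that lockdep reports, and a depth of 1 is the shape after the fix.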
From: Alexey Kardashevskiy aik@ozlabs.ru
commit 56090a3902c80c296e822d11acdb6a101b322c52 upstream.
pnv_tce() returns a pointer to a TCE entry and originally a TCE table would be pre-allocated. For the default case of 2GB window the table needs only a single level and that is fine. However if more levels are requested, it is possible to get a race when 2 threads want a pointer to a TCE entry from the same page of TCEs.
This adds cmpxchg to handle the race. Note that once TCE is non-zero, it cannot become zero again.
Fixes: a68bd1267b72 ("powerpc/powernv/ioda: Allocate indirect TCE levels on demand") CC: stable@vger.kernel.org # v4.19+ Signed-off-by: Alexey Kardashevskiy aik@ozlabs.ru Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20190718051139.74787-2-aik@ozlabs.ru Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/platforms/powernv/pci-ioda-tce.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-)
--- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c +++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c @@ -49,6 +49,9 @@ static __be64 *pnv_alloc_tce_level(int n return addr; }
+static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr, + unsigned long size, unsigned int levels); + static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx, bool alloc) { __be64 *tmp = user ? tbl->it_userspace : (__be64 *) tbl->it_base; @@ -58,9 +61,9 @@ static __be64 *pnv_tce(struct iommu_tabl
while (level) { int n = (idx & mask) >> (level * shift); - unsigned long tce; + unsigned long oldtce, tce = be64_to_cpu(READ_ONCE(tmp[n]));
- if (tmp[n] == 0) { + if (!tce) { __be64 *tmp2;
if (!alloc) @@ -71,10 +74,15 @@ static __be64 *pnv_tce(struct iommu_tabl if (!tmp2) return NULL;
- tmp[n] = cpu_to_be64(__pa(tmp2) | - TCE_PCI_READ | TCE_PCI_WRITE); + tce = __pa(tmp2) | TCE_PCI_READ | TCE_PCI_WRITE; + oldtce = be64_to_cpu(cmpxchg(&tmp[n], 0, + cpu_to_be64(tce))); + if (oldtce) { + pnv_pci_ioda2_table_do_free_pages(tmp2, + ilog2(tbl->it_level_size) + 3, 1); + tce = oldtce; + } } - tce = be64_to_cpu(tmp[n]);
tmp = __va(tce & ~(TCE_PCI_READ | TCE_PCI_WRITE)); idx &= ~mask;
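The allocate, compare-and-exchange, free-on-loss pattern used in the pnv_tce() change above can be sketched with C11 atomics. This is a user-space illustration only: get_or_install_level() is a hypothetical name, calloc()/free() stand in for the kernel allocation helpers, and the permission bits and endian conversion from the patch are left out.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Install a freshly allocated next-level table into *slot unless
 * another thread got there first. Returns the table that ended up in
 * the slot, or NULL if allocation failed.
 */
static uint64_t *get_or_install_level(_Atomic(uintptr_t) *slot,
				      size_t entries)
{
	uintptr_t cur = atomic_load(slot);
	uintptr_t expected = 0;
	uint64_t *fresh;

	if (cur)
		return (uint64_t *)cur;

	fresh = calloc(entries, sizeof(*fresh));
	if (!fresh)
		return NULL;

	/* Only succeeds if the slot is still zero. */
	if (atomic_compare_exchange_strong(slot, &expected, (uintptr_t)fresh))
		return fresh;

	/* Lost the race: drop our copy and use the winner's table. */
	free(fresh);
	return (uint64_t *)expected;
}
```

Because a slot never goes back to zero once it is set (as the commit message notes for TCEs), the losing thread can safely adopt whatever value the failed exchange reported.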
From: Christophe Leroy christophe.leroy@c-s.fr
commit 45ff3c55958542c3b76075d59741297b8cb31cbb upstream.
Parallel loading of modules may lead to bad setup of shadow page table entries.
First, let's align modules so that two modules never share the same shadow page.
Second, ensure that two modules cannot allocate two page tables for the same PMD entry at the same time. This is done by using init_mm.page_table_lock in the same way as __pte_alloc_kernel().
Fixes: 2edb16efc899 ("powerpc/32: Add KASAN support") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Christophe Leroy christophe.leroy@c-s.fr Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/c97284f912128cbc3f2fe09d68e90e65fb3e6026.156536187... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/mm/kasan/kasan_init_32.c | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-)
--- a/arch/powerpc/mm/kasan/kasan_init_32.c +++ b/arch/powerpc/mm/kasan/kasan_init_32.c @@ -5,6 +5,7 @@ #include <linux/kasan.h> #include <linux/printk.h> #include <linux/memblock.h> +#include <linux/moduleloader.h> #include <linux/sched/task.h> #include <linux/vmalloc.h> #include <asm/pgalloc.h> @@ -46,7 +47,19 @@ static int __ref kasan_init_shadow_page_ kasan_populate_pte(new, PAGE_READONLY); else kasan_populate_pte(new, PAGE_KERNEL_RO); - pmd_populate_kernel(&init_mm, pmd, new); + + smp_wmb(); /* See comment in __pte_alloc */ + + spin_lock(&init_mm.page_table_lock); + /* Has another populated it ? */ + if (likely((void *)pmd_page_vaddr(*pmd) == kasan_early_shadow_pte)) { + pmd_populate_kernel(&init_mm, pmd, new); + new = NULL; + } + spin_unlock(&init_mm.page_table_lock); + + if (new && slab_is_available()) + pte_free_kernel(&init_mm, new); } return 0; } @@ -137,7 +150,11 @@ void __init kasan_init(void) #ifdef CONFIG_MODULES void *module_alloc(unsigned long size) { - void *base = vmalloc_exec(size); + void *base; + + base = __vmalloc_node_range(size, MODULE_ALIGN, VMALLOC_START, VMALLOC_END, + GFP_KERNEL, PAGE_KERNEL_EXEC, VM_FLUSH_RESET_PERMS, + NUMA_NO_NODE, __builtin_return_address(0));
if (!base) return NULL;
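Why a larger module alignment keeps two modules off the same shadow page can be worked through with a little arithmetic. The sketch below assumes 4K pages and one shadow byte per 8 bytes of memory, and it ignores the shadow offset, which it assumes to be page aligned. The SK_* constants and function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_PAGE_SHIFT         12	/* assumed 4K pages */
#define SK_SHADOW_SCALE_SHIFT 3		/* assumed: 1 shadow byte per 8 bytes */

/* Index of the shadow page that tracks addr (shadow offset omitted). */
static uint64_t shadow_page_index(uint64_t addr)
{
	return (addr >> SK_SHADOW_SCALE_SHIFT) >> SK_PAGE_SHIFT;
}

/* True if the last byte of module A and the start of B share a shadow page. */
static bool share_shadow_page(uint64_t last_byte_a, uint64_t start_b)
{
	return shadow_page_index(last_byte_a) == shadow_page_index(start_b);
}
```

Under these assumptions one shadow page covers 8 pages of memory, so two modules that are only page aligned can land in the same 32K window, while modules aligned to 8 pages cannot.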
From: Christophe Leroy christophe.leroy@c-s.fr
commit 663c0c9496a69f80011205ba3194049bcafd681d upstream.
When loading modules, from time to time an Oops is encountered during the init of shadow area for globals. This is due to the last page not always being mapped depending on the exact distance between the start and the end of the shadow area and the alignment with the page addresses.
Fix this by aligning the starting address with the page address.
Fixes: 2edb16efc899 ("powerpc/32: Add KASAN support") Cc: stable@vger.kernel.org # v5.2+ Reported-by: Erhard F. erhard_f@mailbox.org Signed-off-by: Christophe Leroy christophe.leroy@c-s.fr Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/4f887e9b77d0d725cbb52035c7ece485c1c5fc14.156536188... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/mm/kasan/kasan_init_32.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/powerpc/mm/kasan/kasan_init_32.c +++ b/arch/powerpc/mm/kasan/kasan_init_32.c @@ -87,7 +87,7 @@ static int __ref kasan_init_region(void if (!slab_is_available()) block = memblock_alloc(k_end - k_start, PAGE_SIZE);
- for (k_cur = k_start; k_cur < k_end; k_cur += PAGE_SIZE) { + for (k_cur = k_start & PAGE_MASK; k_cur < k_end; k_cur += PAGE_SIZE) { pmd_t *pmd = pmd_offset(pud_offset(pgd_offset_k(k_cur), k_cur), k_cur); void *va = block ? block + k_cur - k_start : kasan_get_one_page(); pte_t pte = pfn_pte(PHYS_PFN(__pa(va)), PAGE_KERNEL);
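The off-by-one-page effect fixed by the loop change above is easy to reproduce in user space. The sketch below assumes 4K pages. pages_visited() and pages_needed() are hypothetical helpers that only count iterations and do not touch any page tables.

```c
#define SK_PAGE_SHIFT 12			/* assumed 4K pages */
#define SK_PAGE_SIZE  (1ul << SK_PAGE_SHIFT)
#define SK_PAGE_MASK  (~(SK_PAGE_SIZE - 1))

/* Count loop iterations, i.e. pages that would get a PTE. */
static unsigned int pages_visited(unsigned long k_start, unsigned long k_end,
				  int align_start)
{
	unsigned long k_cur = align_start ? (k_start & SK_PAGE_MASK) : k_start;
	unsigned int n = 0;

	for (; k_cur < k_end; k_cur += SK_PAGE_SIZE)
		n++;

	return n;
}

/* Number of distinct pages overlapping the range [k_start, k_end). */
static unsigned int pages_needed(unsigned long k_start, unsigned long k_end)
{
	return (unsigned int)(((k_end - 1) >> SK_PAGE_SHIFT) -
			      (k_start >> SK_PAGE_SHIFT) + 1);
}
```

For a range from 0x1800 to 0x2100 two pages are needed. The unaligned loop starts at 0x1800, steps to 0x2800 and stops, so it visits one page. The aligned loop visits 0x1000 and 0x2000.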
From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
commit 677733e296b5c7a37c47da391fc70a43dc40bd67 upstream.
The store ordering vs tlbie issue mentioned in commit a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9") is fixed for Nimbus 2.3 and Cumulus 1.3 revisions. We don't need to apply the fixup if we are running on them.
We can only do this on PowerNV. On a pseries guest with KVM we still don't support redoing the feature fixup after migration. So we should be enabling all the workarounds needed, because we can possibly migrate between DD 2.3 and DD 2.2.
Fixes: a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20190924035254.24612-1-aneesh.kumar@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/kernel/dt_cpu_ftrs.c | 30 ++++++++++++++++++++++++++++-- 1 file changed, 28 insertions(+), 2 deletions(-)
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c @@ -691,9 +691,35 @@ static bool __init cpufeatures_process_f return true; }
+/* + * Handle POWER9 broadcast tlbie invalidation issue using + * cpu feature flag. + */ +static __init void update_tlbie_feature_flag(unsigned long pvr) +{ + if (PVR_VER(pvr) == PVR_POWER9) { + /* + * Set the tlbie feature flag for anything below + * Nimbus DD 2.3 and Cumulus DD 1.3 + */ + if ((pvr & 0xe000) == 0) { + /* Nimbus */ + if ((pvr & 0xfff) < 0x203) + cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG; + } else if ((pvr & 0xc000) == 0) { + /* Cumulus */ + if ((pvr & 0xfff) < 0x103) + cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG; + } else { + WARN_ONCE(1, "Unknown PVR"); + cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG; + } + } +} + static __init void cpufeatures_cpu_quirks(void) { - int version = mfspr(SPRN_PVR); + unsigned long version = mfspr(SPRN_PVR);
/* * Not all quirks can be derived from the cpufeatures device tree. @@ -712,10 +738,10 @@ static __init void cpufeatures_cpu_quirk
if ((version & 0xffff0000) == 0x004e0000) { cur_cpu_spec->cpu_features &= ~(CPU_FTR_DAWR); - cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG; cur_cpu_spec->cpu_features |= CPU_FTR_P9_TIDR; }
+ update_tlbie_feature_flag(version); /* * PKEY was not in the initial base or feature node * specification, but it should become optional in the next
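The PVR decoding in update_tlbie_feature_flag() above can be exercised in user space as a pure function. The sketch below mirrors the masks and revision thresholds from the diff. The function name p9_needs_tlbie_fixup() and the SK_PVR_POWER9 constant are illustrative, and the PVR values in the usage note are constructed for the example rather than read from hardware.

```c
#include <stdbool.h>

#define SK_PVR_POWER9 0x004eul	/* matches the 0x004e0000 check in the diff */

/* True if the store vs tlbie workaround flag would be set for this PVR. */
static bool p9_needs_tlbie_fixup(unsigned long pvr)
{
	if (((pvr >> 16) & 0xffff) != SK_PVR_POWER9)
		return false;

	if ((pvr & 0xe000) == 0)	/* Nimbus: fixed from DD 2.3 */
		return (pvr & 0xfff) < 0x203;

	if ((pvr & 0xc000) == 0)	/* Cumulus: fixed from DD 1.3 */
		return (pvr & 0xfff) < 0x103;

	return true;			/* unknown variant: keep the workaround */
}
```

For example, 0x004e0202 (Nimbus DD 2.2) and 0x004e2102 (Cumulus DD 1.2) keep the workaround, while 0x004e0203 and 0x004e2103 do not.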
From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
commit 09ce98cacd51fcd0fa0af2f79d1e1d3192f4cbb0 upstream.
Rename the #define to indicate that it is related to the store vs tlbie ordering issue. In the next patch, we will be adding another feature flag that is used to handle the ERAT flush vs tlbie ordering issue.
Fixes: a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20190924035254.24612-2-aneesh.kumar@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/include/asm/cputable.h | 4 ++-- arch/powerpc/kernel/dt_cpu_ftrs.c | 6 +++--- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 +- arch/powerpc/mm/book3s64/hash_native.c | 2 +- arch/powerpc/mm/book3s64/radix_tlb.c | 4 ++-- 5 files changed, 9 insertions(+), 9 deletions(-)
--- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -213,7 +213,7 @@ static inline void cpu_feature_keys_init #define CPU_FTR_POWER9_DD2_1 LONG_ASM_CONST(0x0000080000000000) #define CPU_FTR_P9_TM_HV_ASSIST LONG_ASM_CONST(0x0000100000000000) #define CPU_FTR_P9_TM_XER_SO_BUG LONG_ASM_CONST(0x0000200000000000) -#define CPU_FTR_P9_TLBIE_BUG LONG_ASM_CONST(0x0000400000000000) +#define CPU_FTR_P9_TLBIE_STQ_BUG LONG_ASM_CONST(0x0000400000000000) #define CPU_FTR_P9_TIDR LONG_ASM_CONST(0x0000800000000000)
#ifndef __ASSEMBLY__ @@ -461,7 +461,7 @@ static inline void cpu_feature_keys_init CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \ CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \ CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_PKEY | \ - CPU_FTR_P9_TLBIE_BUG | CPU_FTR_P9_TIDR) + CPU_FTR_P9_TLBIE_STQ_BUG | CPU_FTR_P9_TIDR) #define CPU_FTRS_POWER9_DD2_0 CPU_FTRS_POWER9 #define CPU_FTRS_POWER9_DD2_1 (CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD2_1) #define CPU_FTRS_POWER9_DD2_2 (CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD2_1 | \ --- a/arch/powerpc/kernel/dt_cpu_ftrs.c +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c @@ -705,14 +705,14 @@ static __init void update_tlbie_feature_ if ((pvr & 0xe000) == 0) { /* Nimbus */ if ((pvr & 0xfff) < 0x203) - cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG; + cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_STQ_BUG; } else if ((pvr & 0xc000) == 0) { /* Cumulus */ if ((pvr & 0xfff) < 0x103) - cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG; + cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_STQ_BUG; } else { WARN_ONCE(1, "Unknown PVR"); - cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG; + cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_STQ_BUG; } } } --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c @@ -451,7 +451,7 @@ static void do_tlbies(struct kvm *kvm, u "r" (rbvalues[i]), "r" (kvm->arch.lpid)); }
- if (cpu_has_feature(CPU_FTR_P9_TLBIE_BUG)) { + if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) { /* * Need the extra ptesync to make sure we don't * re-order the tlbie --- a/arch/powerpc/mm/book3s64/hash_native.c +++ b/arch/powerpc/mm/book3s64/hash_native.c @@ -199,7 +199,7 @@ static inline unsigned long ___tlbie(un
static inline void fixup_tlbie(unsigned long vpn, int psize, int apsize, int ssize) { - if (cpu_has_feature(CPU_FTR_P9_TLBIE_BUG)) { + if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) { /* Need the extra ptesync to ensure we don't reorder tlbie*/ asm volatile("ptesync": : :"memory"); ___tlbie(vpn, psize, apsize, ssize); --- a/arch/powerpc/mm/book3s64/radix_tlb.c +++ b/arch/powerpc/mm/book3s64/radix_tlb.c @@ -216,7 +216,7 @@ static inline void fixup_tlbie(void) unsigned long pid = 0; unsigned long va = ((1UL << 52) - 1);
- if (cpu_has_feature(CPU_FTR_P9_TLBIE_BUG)) { + if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) { asm volatile("ptesync": : :"memory"); __tlbie_va(va, pid, mmu_get_ap(MMU_PAGE_64K), RIC_FLUSH_TLB); } @@ -226,7 +226,7 @@ static inline void fixup_tlbie_lpid(unsi { unsigned long va = ((1UL << 52) - 1);
- if (cpu_has_feature(CPU_FTR_P9_TLBIE_BUG)) { + if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) { asm volatile("ptesync": : :"memory"); __tlbie_lpid_va(va, lpid, mmu_get_ap(MMU_PAGE_64K), RIC_FLUSH_TLB); }
From: Christophe Leroy christophe.leroy@c-s.fr
commit 4c0f5d1eb4072871c34530358df45f05ab80edd6 upstream.
In a couple of places there is a need to select whether read-only protection of shadow pages is performed with PAGE_KERNEL_RO or with PAGE_READONLY.
Add a helper to avoid duplicating the choice.
Signed-off-by: Christophe Leroy christophe.leroy@c-s.fr Cc: stable@vger.kernel.org Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/9f33f44b9cd741c4a02b3dce7b8ef9438fe2cd2a.156638275... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/mm/kasan/kasan_init_32.c | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-)
--- a/arch/powerpc/mm/kasan/kasan_init_32.c +++ b/arch/powerpc/mm/kasan/kasan_init_32.c @@ -12,6 +12,14 @@ #include <asm/code-patching.h> #include <mm/mmu_decl.h>
+static pgprot_t kasan_prot_ro(void) +{ + if (early_mmu_has_feature(MMU_FTR_HPTE_TABLE)) + return PAGE_READONLY; + + return PAGE_KERNEL_RO; +} + static void kasan_populate_pte(pte_t *ptep, pgprot_t prot) { unsigned long va = (unsigned long)kasan_early_shadow_page; @@ -26,6 +34,7 @@ static int __ref kasan_init_shadow_page_ { pmd_t *pmd; unsigned long k_cur, k_next; + pgprot_t prot = kasan_prot_ro();
pmd = pmd_offset(pud_offset(pgd_offset_k(k_start), k_start), k_start);
@@ -43,10 +52,7 @@ static int __ref kasan_init_shadow_page_
if (!new) return -ENOMEM; - if (early_mmu_has_feature(MMU_FTR_HPTE_TABLE)) - kasan_populate_pte(new, PAGE_READONLY); - else - kasan_populate_pte(new, PAGE_KERNEL_RO); + kasan_populate_pte(new, prot);
smp_wmb(); /* See comment in __pte_alloc */
@@ -103,10 +109,9 @@ static int __ref kasan_init_region(void
static void __init kasan_remap_early_shadow_ro(void) { - if (early_mmu_has_feature(MMU_FTR_HPTE_TABLE)) - kasan_populate_pte(kasan_early_shadow_pte, PAGE_READONLY); - else - kasan_populate_pte(kasan_early_shadow_pte, PAGE_KERNEL_RO); + pgprot_t prot = kasan_prot_ro(); + + kasan_populate_pte(kasan_early_shadow_pte, prot);
flush_tlb_kernel_range(KASAN_SHADOW_START, KASAN_SHADOW_END); }
From: Christophe Leroy christophe.leroy@c-s.fr
commit cbd18991e24fea2c31da3bb117c83e4a3538cd11 upstream.
Uncompressing Kernel Image ... OK Loading Device Tree to 01ff7000, end 01fff74f ... OK [ 0.000000] printk: bootconsole [udbg0] enabled [ 0.000000] BUG: Unable to handle kernel data access at 0xf818c000 [ 0.000000] Faulting instruction address: 0xc0013c7c [ 0.000000] Thread overran stack, or stack corrupted [ 0.000000] Oops: Kernel access of bad area, sig: 11 [#1] [ 0.000000] BE PAGE_SIZE=16K PREEMPT [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.3.0-rc4-s3k-dev-00743-g5abe4a3e8fd3-dirty #2080 [ 0.000000] NIP: c0013c7c LR: c0013310 CTR: 00000000 [ 0.000000] REGS: c0c5ff38 TRAP: 0300 Not tainted (5.3.0-rc4-s3k-dev-00743-g5abe4a3e8fd3-dirty) [ 0.000000] MSR: 00001032 <ME,IR,DR,RI> CR: 99033955 XER: 80002100 [ 0.000000] DAR: f818c000 DSISR: 82000000 [ 0.000000] GPR00: c0013310 c0c5fff0 c0ad6ac0 c0c600c0 f818c031 82000000 00000000 ffffffff [ 0.000000] GPR08: 00000000 f1f1f1f1 c0013c2c c0013304 99033955 00400008 00000000 07ff9598 [ 0.000000] GPR16: 00000000 07ffb94c 00000000 00000000 00000000 00000000 00000000 f818cfb2 [ 0.000000] GPR24: 00000000 00000000 00001000 ffffffff 00000000 c07dbf80 00000000 f818c000 [ 0.000000] NIP [c0013c7c] do_page_fault+0x50/0x904 [ 0.000000] LR [c0013310] handle_page_fault+0xc/0x38 [ 0.000000] Call Trace: [ 0.000000] Instruction dump: [ 0.000000] be010080 91410014 553fe8fe 3d40c001 3d20f1f1 7d800026 394a3c2c 3fffe000 [ 0.000000] 6129f1f1 900100c4 9181007c 91410018 <913f0000> 3d2001f4 6129f4f4 913f0004
Don't map the early shadow page read-only yet when creating the new page tables for the real shadow memory. Otherwise, the memblock allocations that immediately follow, which create the real shadow pages that are about to replace the early shadow page, trigger a page fault if they fall into the region being worked on at the moment.
Signed-off-by: Christophe Leroy christophe.leroy@c-s.fr Fixes: 2edb16efc899 ("powerpc/32: Add KASAN support") Cc: stable@vger.kernel.org Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/fe86886fb8db44360417cee0dc515ad47ca6ef72.156638275... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/mm/kasan/kasan_init_32.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)
--- a/arch/powerpc/mm/kasan/kasan_init_32.c +++ b/arch/powerpc/mm/kasan/kasan_init_32.c @@ -34,7 +34,7 @@ static int __ref kasan_init_shadow_page_ { pmd_t *pmd; unsigned long k_cur, k_next; - pgprot_t prot = kasan_prot_ro(); + pgprot_t prot = slab_is_available() ? kasan_prot_ro() : PAGE_KERNEL;
pmd = pmd_offset(pud_offset(pgd_offset_k(k_start), k_start), k_start);
@@ -110,9 +110,22 @@ static int __ref kasan_init_region(void static void __init kasan_remap_early_shadow_ro(void) { pgprot_t prot = kasan_prot_ro(); + unsigned long k_start = KASAN_SHADOW_START; + unsigned long k_end = KASAN_SHADOW_END; + unsigned long k_cur; + phys_addr_t pa = __pa(kasan_early_shadow_page);
kasan_populate_pte(kasan_early_shadow_pte, prot);
+ for (k_cur = k_start & PAGE_MASK; k_cur < k_end; k_cur += PAGE_SIZE) { + pmd_t *pmd = pmd_offset(pud_offset(pgd_offset_k(k_cur), k_cur), k_cur); + pte_t *ptep = pte_offset_kernel(pmd, k_cur); + + if ((pte_val(*ptep) & PTE_RPN_MASK) != pa) + continue; + + __set_pte_at(&init_mm, k_cur, ptep, pfn_pte(PHYS_PFN(pa), prot), 0); + } flush_tlb_kernel_range(KASAN_SHADOW_START, KASAN_SHADOW_END); }
From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
commit 047e6575aec71d75b765c22111820c4776cd1c43 upstream.
On POWER9, under some circumstances, a broadcast TLB invalidation will fail to invalidate the ERAT cache on some threads when there are parallel mtpidr/mtlpidr happening on other threads of the same core. This can cause stores to continue to go to a page after it's unmapped.
The workaround is to force an ERAT flush using PID=0 or LPID=0 tlbie flush. This additional TLB flush will cause the ERAT cache invalidation. Since we are using PID=0 or LPID=0, we don't get filtered out by the TLB snoop filtering logic.
We still need to follow this up with another tlbie to take care of the store vs tlbie ordering issue explained in commit a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9"). The presence of the ERAT cache implies we can still get new stores, and they may miss the store queue marking flush.
Cc: stable@vger.kernel.org Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20190924035254.24612-3-aneesh.kumar@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/include/asm/cputable.h | 3 - arch/powerpc/kernel/dt_cpu_ftrs.c | 2 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 42 +++++++++++++---- arch/powerpc/mm/book3s64/hash_native.c | 29 ++++++++++- arch/powerpc/mm/book3s64/radix_tlb.c | 80 +++++++++++++++++++++++++++++---- 5 files changed, 134 insertions(+), 22 deletions(-)
--- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -215,6 +215,7 @@ static inline void cpu_feature_keys_init #define CPU_FTR_P9_TM_XER_SO_BUG LONG_ASM_CONST(0x0000200000000000) #define CPU_FTR_P9_TLBIE_STQ_BUG LONG_ASM_CONST(0x0000400000000000) #define CPU_FTR_P9_TIDR LONG_ASM_CONST(0x0000800000000000) +#define CPU_FTR_P9_TLBIE_ERAT_BUG LONG_ASM_CONST(0x0001000000000000)
#ifndef __ASSEMBLY__
@@ -461,7 +462,7 @@ static inline void cpu_feature_keys_init CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \ CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \ CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_PKEY | \ - CPU_FTR_P9_TLBIE_STQ_BUG | CPU_FTR_P9_TIDR) + CPU_FTR_P9_TLBIE_STQ_BUG | CPU_FTR_P9_TLBIE_ERAT_BUG | CPU_FTR_P9_TIDR) #define CPU_FTRS_POWER9_DD2_0 CPU_FTRS_POWER9 #define CPU_FTRS_POWER9_DD2_1 (CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD2_1) #define CPU_FTRS_POWER9_DD2_2 (CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD2_1 | \ --- a/arch/powerpc/kernel/dt_cpu_ftrs.c +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c @@ -714,6 +714,8 @@ static __init void update_tlbie_feature_ WARN_ONCE(1, "Unknown PVR"); cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_STQ_BUG; } + + cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_ERAT_BUG; } }
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c @@ -433,6 +433,37 @@ static inline int is_mmio_hpte(unsigned (HPTE_R_KEY_HI | HPTE_R_KEY_LO)); }
+static inline void fixup_tlbie_lpid(unsigned long rb_value, unsigned long lpid) +{ + + if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) { + /* Radix flush for a hash guest */ + + unsigned long rb,rs,prs,r,ric; + + rb = PPC_BIT(52); /* IS = 2 */ + rs = 0; /* lpid = 0 */ + prs = 0; /* partition scoped */ + r = 1; /* radix format */ + ric = 0; /* RIC_FLSUH_TLB */ + + /* + * Need the extra ptesync to make sure we don't + * re-order the tlbie + */ + asm volatile("ptesync": : :"memory"); + asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1) + : : "r"(rb), "i"(r), "i"(prs), + "i"(ric), "r"(rs) : "memory"); + } + + if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) { + asm volatile("ptesync": : :"memory"); + asm volatile(PPC_TLBIE_5(%0,%1,0,0,0) : : + "r" (rb_value), "r" (lpid)); + } +} + static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues, long npages, int global, bool need_sync) { @@ -451,16 +482,7 @@ static void do_tlbies(struct kvm *kvm, u "r" (rbvalues[i]), "r" (kvm->arch.lpid)); }
- if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) { - /* - * Need the extra ptesync to make sure we don't - * re-order the tlbie - */ - asm volatile("ptesync": : :"memory"); - asm volatile(PPC_TLBIE_5(%0,%1,0,0,0) : : - "r" (rbvalues[0]), "r" (kvm->arch.lpid)); - } - + fixup_tlbie_lpid(rbvalues[i - 1], kvm->arch.lpid); asm volatile("eieio; tlbsync; ptesync" : : : "memory"); } else { if (need_sync) --- a/arch/powerpc/mm/book3s64/hash_native.c +++ b/arch/powerpc/mm/book3s64/hash_native.c @@ -197,8 +197,31 @@ static inline unsigned long ___tlbie(un return va; }
-static inline void fixup_tlbie(unsigned long vpn, int psize, int apsize, int ssize) +static inline void fixup_tlbie_vpn(unsigned long vpn, int psize, + int apsize, int ssize) { + if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) { + /* Radix flush for a hash guest */ + + unsigned long rb,rs,prs,r,ric; + + rb = PPC_BIT(52); /* IS = 2 */ + rs = 0; /* lpid = 0 */ + prs = 0; /* partition scoped */ + r = 1; /* radix format */ + ric = 0; /* RIC_FLSUH_TLB */ + + /* + * Need the extra ptesync to make sure we don't + * re-order the tlbie + */ + asm volatile("ptesync": : :"memory"); + asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1) + : : "r"(rb), "i"(r), "i"(prs), + "i"(ric), "r"(rs) : "memory"); + } + + if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) { /* Need the extra ptesync to ensure we don't reorder tlbie*/ asm volatile("ptesync": : :"memory"); @@ -283,7 +306,7 @@ static inline void tlbie(unsigned long v asm volatile("ptesync": : :"memory"); } else { __tlbie(vpn, psize, apsize, ssize); - fixup_tlbie(vpn, psize, apsize, ssize); + fixup_tlbie_vpn(vpn, psize, apsize, ssize); asm volatile("eieio; tlbsync; ptesync": : :"memory"); } if (lock_tlbie && !use_local) @@ -856,7 +879,7 @@ static void native_flush_hash_range(unsi /* * Just do one more with the last used values. */ - fixup_tlbie(vpn, psize, psize, ssize); + fixup_tlbie_vpn(vpn, psize, psize, ssize); asm volatile("eieio; tlbsync; ptesync":::"memory");
if (lock_tlbie) --- a/arch/powerpc/mm/book3s64/radix_tlb.c +++ b/arch/powerpc/mm/book3s64/radix_tlb.c @@ -211,21 +211,82 @@ static __always_inline void __tlbie_lpid trace_tlbie(lpid, 0, rb, rs, ric, prs, r); }
-static inline void fixup_tlbie(void) + +static inline void fixup_tlbie_va(unsigned long va, unsigned long pid, + unsigned long ap) { - unsigned long pid = 0; + if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) { + asm volatile("ptesync": : :"memory"); + __tlbie_va(va, 0, ap, RIC_FLUSH_TLB); + } + + if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) { + asm volatile("ptesync": : :"memory"); + __tlbie_va(va, pid, ap, RIC_FLUSH_TLB); + } +} + +static inline void fixup_tlbie_va_range(unsigned long va, unsigned long pid, + unsigned long ap) +{ + if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) { + asm volatile("ptesync": : :"memory"); + __tlbie_pid(0, RIC_FLUSH_TLB); + } + + if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) { + asm volatile("ptesync": : :"memory"); + __tlbie_va(va, pid, ap, RIC_FLUSH_TLB); + } +} + +static inline void fixup_tlbie_pid(unsigned long pid) +{ + /* + * We can use any address for the invalidation, pick one which is + * probably unused as an optimisation. + */ unsigned long va = ((1UL << 52) - 1);
+ if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) { + asm volatile("ptesync": : :"memory"); + __tlbie_pid(0, RIC_FLUSH_TLB); + } + if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) { asm volatile("ptesync": : :"memory"); __tlbie_va(va, pid, mmu_get_ap(MMU_PAGE_64K), RIC_FLUSH_TLB); } }
+ +static inline void fixup_tlbie_lpid_va(unsigned long va, unsigned long lpid, + unsigned long ap) +{ + if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) { + asm volatile("ptesync": : :"memory"); + __tlbie_lpid_va(va, 0, ap, RIC_FLUSH_TLB); + } + + if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) { + asm volatile("ptesync": : :"memory"); + __tlbie_lpid_va(va, lpid, ap, RIC_FLUSH_TLB); + } +} + static inline void fixup_tlbie_lpid(unsigned long lpid) { + /* + * We can use any address for the invalidation, pick one which is + * probably unused as an optimisation. + */ unsigned long va = ((1UL << 52) - 1);
+ if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) { + asm volatile("ptesync": : :"memory"); + __tlbie_lpid(0, RIC_FLUSH_TLB); + } + if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) { asm volatile("ptesync": : :"memory"); __tlbie_lpid_va(va, lpid, mmu_get_ap(MMU_PAGE_64K), RIC_FLUSH_TLB); @@ -273,6 +334,7 @@ static inline void _tlbie_pid(unsigned l switch (ric) { case RIC_FLUSH_TLB: __tlbie_pid(pid, RIC_FLUSH_TLB); + fixup_tlbie_pid(pid); break; case RIC_FLUSH_PWC: __tlbie_pid(pid, RIC_FLUSH_PWC); @@ -280,8 +342,8 @@ static inline void _tlbie_pid(unsigned l case RIC_FLUSH_ALL: default: __tlbie_pid(pid, RIC_FLUSH_ALL); + fixup_tlbie_pid(pid); } - fixup_tlbie(); asm volatile("eieio; tlbsync; ptesync": : :"memory"); }
@@ -325,6 +387,7 @@ static inline void _tlbie_lpid(unsigned switch (ric) { case RIC_FLUSH_TLB: __tlbie_lpid(lpid, RIC_FLUSH_TLB); + fixup_tlbie_lpid(lpid); break; case RIC_FLUSH_PWC: __tlbie_lpid(lpid, RIC_FLUSH_PWC); @@ -332,8 +395,8 @@ static inline void _tlbie_lpid(unsigned case RIC_FLUSH_ALL: default: __tlbie_lpid(lpid, RIC_FLUSH_ALL); + fixup_tlbie_lpid(lpid); } - fixup_tlbie_lpid(lpid); asm volatile("eieio; tlbsync; ptesync": : :"memory"); }
@@ -407,6 +470,8 @@ static inline void __tlbie_va_range(unsi
for (addr = start; addr < end; addr += page_size) __tlbie_va(addr, pid, ap, RIC_FLUSH_TLB); + + fixup_tlbie_va_range(addr - page_size, pid, ap); }
static __always_inline void _tlbie_va(unsigned long va, unsigned long pid, @@ -416,7 +481,7 @@ static __always_inline void _tlbie_va(un
asm volatile("ptesync": : :"memory"); __tlbie_va(va, pid, ap, ric); - fixup_tlbie(); + fixup_tlbie_va(va, pid, ap); asm volatile("eieio; tlbsync; ptesync": : :"memory"); }
@@ -427,7 +492,7 @@ static __always_inline void _tlbie_lpid_
asm volatile("ptesync": : :"memory"); __tlbie_lpid_va(va, lpid, ap, ric); - fixup_tlbie_lpid(lpid); + fixup_tlbie_lpid_va(va, lpid, ap); asm volatile("eieio; tlbsync; ptesync": : :"memory"); }
@@ -439,7 +504,6 @@ static inline void _tlbie_va_range(unsig if (also_pwc) __tlbie_pid(pid, RIC_FLUSH_PWC); __tlbie_va_range(start, end, pid, page_size, psize); - fixup_tlbie(); asm volatile("eieio; tlbsync; ptesync": : :"memory"); }
@@ -775,7 +839,7 @@ is_local: if (gflush) __tlbie_va_range(gstart, gend, pid, PUD_SIZE, MMU_PAGE_1G); - fixup_tlbie(); + asm volatile("eieio; tlbsync; ptesync": : :"memory"); } }
From: Marc Kleine-Budde mkl@pengutronix.de
commit d84ea2123f8d27144e3f4d58cd88c9c6ddc799de upstream.
Some boards take longer than 5ms to power up after a reset, so allow some retry attempts before giving up.
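The retry-until-deadline idea can be sketched in userspace C. This is an illustration only, not the driver code: wait_until_ready(), now_ms(), struct fake_dev and fake_dev_ready() are hypothetical names invented for this sketch, and the POSIX clock_gettime()/nanosleep() calls stand in for the kernel's jiffies and usleep_range().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical probe callback: returns true once the device is ready. */
typedef bool (*ready_fn)(void *ctx);

/* Monotonic clock in milliseconds. */
static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Poll ready() until it succeeds, sleeping delay_ms between attempts.
 * Returns 0 on success, -1 once timeout_ms has elapsed.
 */
static int wait_until_ready(ready_fn ready, void *ctx,
			    long delay_ms, long timeout_ms)
{
	long long deadline;
	struct timespec delay;

	assert(ready != NULL);

	deadline = now_ms() + timeout_ms;
	delay.tv_sec = delay_ms / 1000;
	delay.tv_nsec = (delay_ms % 1000) * 1000000L;

	while (!ready(ctx)) {
		nanosleep(&delay, NULL);
		if (now_ms() > deadline)
			return -1;
	}
	return 0;
}

/* Stand-in for a slow board: reports ready after a number of polls. */
struct fake_dev {
	int polls_until_ready;
};

static bool fake_dev_ready(void *ctx)
{
	struct fake_dev *dev = ctx;

	return dev->polls_until_ready-- <= 0;
}
```

A caller would pass its own readiness check, for example wait_until_ready(fake_dev_ready, &dev, 5, 1000) for a 5ms poll interval and a one second budget.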
Fixes: ff06d611a31c ("can: mcp251x: Improve mcp251x_hw_reset()") Cc: linux-stable stable@vger.kernel.org Tested-by: Sean Nyekjaer sean@geanix.com Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/can/spi/mcp251x.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-)
--- a/drivers/net/can/spi/mcp251x.c +++ b/drivers/net/can/spi/mcp251x.c @@ -612,7 +612,7 @@ static int mcp251x_setup(struct net_devi static int mcp251x_hw_reset(struct spi_device *spi) { struct mcp251x_priv *priv = spi_get_drvdata(spi); - u8 reg; + unsigned long timeout; int ret;
/* Wait for oscillator startup timer after power up */ @@ -626,10 +626,19 @@ static int mcp251x_hw_reset(struct spi_d /* Wait for oscillator startup timer after reset */ mdelay(MCP251X_OST_DELAY_MS);
- reg = mcp251x_read_reg(spi, CANSTAT); - if ((reg & CANCTRL_REQOP_MASK) != CANCTRL_REQOP_CONF) - return -ENODEV; - + /* Wait for reset to finish */ + timeout = jiffies + HZ; + while ((mcp251x_read_reg(spi, CANSTAT) & CANCTRL_REQOP_MASK) != + CANCTRL_REQOP_CONF) { + usleep_range(MCP251X_OST_DELAY_MS * 1000, + MCP251X_OST_DELAY_MS * 1000 * 2); + + if (time_after(jiffies, timeout)) { + dev_err(&spi->dev, + "MCP251x didn't enter in conf mode after reset\n"); + return -EBUSY; + } + } return 0; }
From: Steven Rostedt (VMware) rostedt@goodmis.org
commit 82a2f88458d70704be843961e10b5cef9a6e95d3 upstream.
The tools/lib/traceevent/Makefile had a test added to it to detect a failure of the "nm" when making the dynamic list file (whatever that is). The problem is that the test sorts the values "U W w" and some versions of sort will place "w" ahead of "W" (even though it has a higher ASCII value), and break the test.
Add 'tr "w" "W"' to merge the two and not worry about the ordering.
Reported-by: Tzvetomir Stoyanov tstoyanov@vmware.com Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: David Carrillo-Cisneros davidcc@google.com Cc: He Kuang hekuang@huawei.com Cc: Jiri Olsa jolsa@kernel.org Cc: Michal Marek mmarek@suse.com Cc: Paul Turner pjt@google.com Cc: Peter Zijlstra peterz@infradead.org Cc: Stephane Eranian eranian@google.com Cc: Uwe Kleine-König u.kleine-koenig@pengutronix.de Cc: Wang Nan wangnan0@huawei.com Cc: stable@vger.kernel.org Fixes: 6467753d61399 ("tools lib traceevent: Robustify do_generate_dynamic_list_file") Link: http://lkml.kernel.org/r/20190805130150.25acfeb1@gandalf.local.home Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- tools/lib/traceevent/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/tools/lib/traceevent/Makefile +++ b/tools/lib/traceevent/Makefile @@ -266,8 +266,8 @@ endef
define do_generate_dynamic_list_file symbol_type=`$(NM) -u -D $1 | awk 'NF>1 {print $$1}' | \ - xargs echo "U W w" | tr ' ' '\n' | sort -u | xargs echo`;\ - if [ "$$symbol_type" = "U W w" ];then \ + xargs echo "U w W" | tr 'w ' 'W\n' | sort -u | xargs echo`;\ + if [ "$$symbol_type" = "U W" ];then \ (echo '{'; \ $(NM) -u -D $1 | awk 'NF>1 {print "\t"$$2";"}' | sort -u;\ echo '};'; \
From: Steven Rostedt (VMware) rostedt@goodmis.org
commit b0215e2d6a18d8331b2d4a8b38ccf3eff783edb1 upstream.
If the re-allocation of tep->cmdlines succeeds, then the previous allocation of tep->cmdlines will be freed. If we later fail in add_new_comm(), we must not free cmdlines, and also should assign tep->cmdlines to the new allocation. Otherwise when freeing tep, the tep->cmdlines will be pointing to garbage.
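The ownership rule described above can be shown with a small standalone C sketch. It is not the libtraceevent code: struct cmdline, struct table and table_add() are hypothetical names chosen for illustration. The point is that once realloc() succeeds the old block may already be gone, so the owner's pointer has to be updated before any later failure path returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the library's structures. */
struct cmdline {
	char *comm;
	int pid;
};

struct table {
	struct cmdline *items;
	int count;
};

/* Append one entry; returns 0 on success, -1 with errno set on failure. */
static int table_add(struct table *t, const char *comm, int pid)
{
	struct cmdline *items;

	assert(t != NULL && comm != NULL);

	items = realloc(t->items, sizeof(*items) * (t->count + 1));
	if (!items) {
		/* realloc() failed: t->items is untouched and still valid. */
		errno = ENOMEM;
		return -1;
	}
	/* The old block may have been freed: publish the new one now. */
	t->items = items;

	items[t->count].comm = strdup(comm);
	if (!items[t->count].comm) {
		/* Do not free(items): t->items owns that allocation. */
		errno = ENOMEM;
		return -1;
	}
	items[t->count].pid = pid;
	t->count++;

	return 0;
}
```

With this ordering, a strdup() failure leaves the table in a consistent state: count is unchanged and items points at live memory that the normal teardown path can free.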
Fixes: a6d2a61ac653a ("tools lib traceevent: Remove some die() calls") Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Jiri Olsa jolsa@redhat.com Cc: Namhyung Kim namhyung@kernel.org Cc: linux-trace-devel@vger.kernel.org Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20190828191819.970121417@goodmis.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- tools/lib/traceevent/event-parse.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/tools/lib/traceevent/event-parse.c +++ b/tools/lib/traceevent/event-parse.c @@ -269,10 +269,10 @@ static int add_new_comm(struct tep_handl errno = ENOMEM; return -1; } + tep->cmdlines = cmdlines;
cmdlines[tep->cmdline_count].comm = strdup(comm); if (!cmdlines[tep->cmdline_count].comm) { - free(cmdlines); errno = ENOMEM; return -1; } @@ -283,7 +283,6 @@ static int add_new_comm(struct tep_handl tep->cmdline_count++;
qsort(cmdlines, tep->cmdline_count, sizeof(*cmdlines), cmdline_cmp); - tep->cmdlines = cmdlines;
return 0; }
From: Alexander Sverdlin alexander.sverdlin@nokia.com
commit 1b82feb6c5e1996513d0fb0bbb475417088b4954 upstream.
It seems that smp_processor_id() is only used for a best-effort load-balancing, refer to qat_crypto_get_instance_node(). It's not feasible to disable preemption for the duration of the crypto requests. Therefore, just silence the warning. This commit is similar to e7a9b05ca4 ("crypto: cavium - Fix smp_processor_id() warnings").
Silences the following splat:
BUG: using smp_processor_id() in preemptible [00000000] code: cryptomgr_test/2904
caller is qat_alg_ablkcipher_setkey+0x300/0x4a0 [intel_qat]
CPU: 1 PID: 2904 Comm: cryptomgr_test Tainted: P O 4.14.69 #1
...
Call Trace:
 dump_stack+0x5f/0x86
 check_preemption_disabled+0xd3/0xe0
 qat_alg_ablkcipher_setkey+0x300/0x4a0 [intel_qat]
 skcipher_setkey_ablkcipher+0x2b/0x40
 __test_skcipher+0x1f3/0xb20
 ? cpumask_next_and+0x26/0x40
 ? find_busiest_group+0x10e/0x9d0
 ? preempt_count_add+0x49/0xa0
 ? try_module_get+0x61/0xf0
 ? crypto_mod_get+0x15/0x30
 ? __kmalloc+0x1df/0x1f0
 ? __crypto_alloc_tfm+0x116/0x180
 ? crypto_skcipher_init_tfm+0xa6/0x180
 ? crypto_create_tfm+0x4b/0xf0
 test_skcipher+0x21/0xa0
 alg_test_skcipher+0x3f/0xa0
 alg_test.part.6+0x126/0x2a0
 ? finish_task_switch+0x21b/0x260
 ? __schedule+0x1e9/0x800
 ? __wake_up_common+0x8d/0x140
 cryptomgr_test+0x40/0x50
 kthread+0xff/0x130
 ? cryptomgr_notify+0x540/0x540
 ? kthread_create_on_node+0x70/0x70
 ret_from_fork+0x24/0x50
Fixes: ed8ccaef52 ("crypto: qat - Add support for SRIOV") Cc: stable@vger.kernel.org Signed-off-by: Alexander Sverdlin alexander.sverdlin@nokia.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/qat/qat_common/adf_common_drv.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/crypto/qat/qat_common/adf_common_drv.h +++ b/drivers/crypto/qat/qat_common/adf_common_drv.h @@ -95,7 +95,7 @@ struct service_hndl {
static inline int get_current_node(void) { - return topology_physical_package_id(smp_processor_id()); + return topology_physical_package_id(raw_smp_processor_id()); }
int adf_service_register(struct service_hndl *service);
From: Herbert Xu herbert@gondor.apana.org.au
commit 0ba3c026e685573bd3534c17e27da7c505ac99c4 upstream.
skcipher_walk_done may be called with an error by internal or external callers. For those internal callers we shouldn't unmap pages but for external callers we must unmap any pages that are in use.
This patch distinguishes between the two cases by checking whether walk->nbytes is zero or not. For internal callers, we now set walk->nbytes to zero prior to the call. For external callers, walk->nbytes has always been non-zero (as zero is used to indicate the termination of a walk).
Reported-by: Ard Biesheuvel ard.biesheuvel@linaro.org Fixes: 5cde0af2a982 ("[CRYPTO] cipher: Added block cipher type") Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Tested-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- crypto/skcipher.c | 42 +++++++++++++++++++++++------------------- 1 file changed, 23 insertions(+), 19 deletions(-)
--- a/crypto/skcipher.c +++ b/crypto/skcipher.c @@ -90,7 +90,7 @@ static inline u8 *skcipher_get_spot(u8 * return max(start, end_page); }
-static void skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize) +static int skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize) { u8 *addr;
@@ -98,19 +98,21 @@ static void skcipher_done_slow(struct sk addr = skcipher_get_spot(addr, bsize); scatterwalk_copychunks(addr, &walk->out, bsize, (walk->flags & SKCIPHER_WALK_PHYS) ? 2 : 1); + return 0; }
int skcipher_walk_done(struct skcipher_walk *walk, int err) { - unsigned int n; /* bytes processed */ - bool more; + unsigned int n = walk->nbytes; + unsigned int nbytes = 0;
- if (unlikely(err < 0)) + if (!n) goto finish;
- n = walk->nbytes - err; - walk->total -= n; - more = (walk->total != 0); + if (likely(err >= 0)) { + n -= err; + nbytes = walk->total - n; + }
if (likely(!(walk->flags & (SKCIPHER_WALK_PHYS | SKCIPHER_WALK_SLOW | @@ -126,7 +128,7 @@ unmap_src: memcpy(walk->dst.virt.addr, walk->page, n); skcipher_unmap_dst(walk); } else if (unlikely(walk->flags & SKCIPHER_WALK_SLOW)) { - if (err) { + if (err > 0) { /* * Didn't process all bytes. Either the algorithm is * broken, or this was the last step and it turned out @@ -134,27 +136,29 @@ unmap_src: * the algorithm requires it. */ err = -EINVAL; - goto finish; - } - skcipher_done_slow(walk, n); - goto already_advanced; + nbytes = 0; + } else + n = skcipher_done_slow(walk, n); }
+ if (err > 0) + err = 0; + + walk->total = nbytes; + walk->nbytes = 0; + scatterwalk_advance(&walk->in, n); scatterwalk_advance(&walk->out, n); -already_advanced: - scatterwalk_done(&walk->in, 0, more); - scatterwalk_done(&walk->out, 1, more); + scatterwalk_done(&walk->in, 0, nbytes); + scatterwalk_done(&walk->out, 1, nbytes);
- if (more) { + if (nbytes) { crypto_yield(walk->flags & SKCIPHER_WALK_SLEEP ? CRYPTO_TFM_REQ_MAY_SLEEP : 0); return skcipher_walk_next(walk); } - err = 0; -finish: - walk->nbytes = 0;
+finish: /* Short-circuit for the common/fast path. */ if (!((unsigned long)walk->buffer | (unsigned long)walk->page)) goto out;
From: Wei Yongjun weiyongjun1@huawei.com
commit c552ffb5c93d9d65aaf34f5f001c4e7e8484ced1 upstream.
When using single_open() for opening, single_release() should be used instead of seq_release(), otherwise there is a memory leak.
Fixes: 09ae5d37e093 ("crypto: zip - Add Compression/Decompression statistics") Cc: stable@vger.kernel.org Signed-off-by: Wei Yongjun weiyongjun1@huawei.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/cavium/zip/zip_main.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/crypto/cavium/zip/zip_main.c +++ b/drivers/crypto/cavium/zip/zip_main.c @@ -593,6 +593,7 @@ static const struct file_operations zip_ .owner = THIS_MODULE, .open = zip_stats_open, .read = seq_read, + .release = single_release, };
static int zip_clear_open(struct inode *inode, struct file *file) @@ -604,6 +605,7 @@ static const struct file_operations zip_ .owner = THIS_MODULE, .open = zip_clear_open, .read = seq_read, + .release = single_release, };
static int zip_regs_open(struct inode *inode, struct file *file) @@ -615,6 +617,7 @@ static const struct file_operations zip_ .owner = THIS_MODULE, .open = zip_regs_open, .read = seq_read, + .release = single_release, };
/* Root directory for thunderx_zip debugfs entry */
From: Horia Geantă horia.geanta@nxp.com
commit 51fab3d73054ca5b06b26e20edac0486b052c6f4 upstream.
ERN handler calls the caam/qi frontend "done" callback with a status of -EIO. This is incorrect, since the callback expects a status value meaningful for the crypto engine - hence the cryptic messages like the one below: platform caam_qi: 15: unknown error source
Fix this by providing the callback with:
- the status returned by the crypto engine (fd[status]) in case it contains an error, OR
- a QI "No error" code otherwise; this will trigger the message:
  platform caam_qi: 50000000: Queue Manager Interface: No error
  which is fine, since QMan driver provides details about the cause of failure
Cc: stable@vger.kernel.org # v5.1+ Fixes: 67c2315def06 ("crypto: caam - add Queue Interface (QI) backend support") Signed-off-by: Horia Geantă horia.geanta@nxp.com Reviewed-by: Iuliana Prodan iuliana.prodan@nxp.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/caam/error.c | 1 + drivers/crypto/caam/qi.c | 5 ++++- drivers/crypto/caam/regs.h | 1 + 3 files changed, 6 insertions(+), 1 deletion(-)
--- a/drivers/crypto/caam/error.c +++ b/drivers/crypto/caam/error.c @@ -118,6 +118,7 @@ static const struct { u8 value; const char *error_text; } qi_error_list[] = { + { 0x00, "No error" }, { 0x1F, "Job terminated by FQ or ICID flush" }, { 0x20, "FD format error"}, { 0x21, "FD command format error"}, --- a/drivers/crypto/caam/qi.c +++ b/drivers/crypto/caam/qi.c @@ -163,7 +163,10 @@ static void caam_fq_ern_cb(struct qman_p dma_unmap_single(drv_req->drv_ctx->qidev, qm_fd_addr(fd), sizeof(drv_req->fd_sgt), DMA_BIDIRECTIONAL);
- drv_req->cbk(drv_req, -EIO); + if (fd->status) + drv_req->cbk(drv_req, be32_to_cpu(fd->status)); + else + drv_req->cbk(drv_req, JRSTA_SSRC_QI); }
static struct qman_fq *create_caam_req_fq(struct device *qidev, --- a/drivers/crypto/caam/regs.h +++ b/drivers/crypto/caam/regs.h @@ -641,6 +641,7 @@ struct caam_job_ring { #define JRSTA_SSRC_CCB_ERROR 0x20000000 #define JRSTA_SSRC_JUMP_HALT_USER 0x30000000 #define JRSTA_SSRC_DECO 0x40000000 +#define JRSTA_SSRC_QI 0x50000000 #define JRSTA_SSRC_JRERROR 0x60000000 #define JRSTA_SSRC_JUMP_HALT_CC 0x70000000
From: Horia Geantă horia.geanta@nxp.com
commit 48f89d2a2920166c35b1c0b69917dbb0390ebec7 upstream.
IV transfer from ofifo to class2 (set up at [29][30]) is not guaranteed to be scheduled before the data transfer from ofifo to external memory (set up at [38]):
[29] 10FA0004 ld: ind-nfifo (len=4) imm
[30] 81F00010 <nfifo_entry: ofifo->class2 type=msg len=16>
[31] 14820004 ld: ccb2-datasz len=4 offs=0 imm
[32] 00000010 data:0x00000010
[33] 8210010D operation: cls1-op aes cbc init-final enc
[34] A8080B04 math: (seqin + math0)->vseqout len=4
[35] 28000010 seqfifold: skip len=16
[36] A8080A04 math: (seqin + math0)->vseqin len=4
[37] 2F1E0000 seqfifold: both msg1->2-last2-last1 len=vseqinsz
[38] 69300000 seqfifostr: msg len=vseqoutsz
[39] 5C20000C seqstr: ccb2 ctx len=12 offs=0
If ofifo -> external memory transfer happens first, DECO will hang (issuing a Watchdog Timeout error, if WDOG is enabled) waiting for data availability in ofifo for the ofifo -> c2 ififo transfer.
Make sure IV transfer happens first by waiting for all CAAM internal transfers to end before starting payload transfer.
New descriptor with jump command inserted at [37]:
[..]
[36] A8080A04 math: (seqin + math0)->vseqin len=4
[37] A1000401 jump: jsl1 all-match[!nfifopend] offset=[01] local->[38]
[38] 2F1E0000 seqfifold: both msg1->2-last2-last1 len=vseqinsz
[39] 69300000 seqfifostr: msg len=vseqoutsz
[40] 5C20000C seqstr: ccb2 ctx len=12 offs=0
[Note: the issue is present in the descriptor from the very beginning (cf. Fixes tag). However I've marked it v4.19+ since it's the oldest maintained kernel that the patch applies clean against.]
Cc: stable@vger.kernel.org # v4.19+ Fixes: 1acebad3d8db8 ("crypto: caam - faster aead implementation") Signed-off-by: Horia Geantă horia.geanta@nxp.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/caam/caamalg_desc.c | 9 +++++++++ drivers/crypto/caam/caamalg_desc.h | 2 +- 2 files changed, 10 insertions(+), 1 deletion(-)
--- a/drivers/crypto/caam/caamalg_desc.c +++ b/drivers/crypto/caam/caamalg_desc.c @@ -503,6 +503,7 @@ void cnstr_shdsc_aead_givencap(u32 * con const bool is_qi, int era) { u32 geniv, moveiv; + u32 *wait_cmd;
/* Note: Context registers are saved. */ init_sh_desc_key_aead(desc, cdata, adata, is_rfc3686, nonce, era); @@ -598,6 +599,14 @@ copy_iv:
/* Will read cryptlen */ append_math_add(desc, VARSEQINLEN, SEQINLEN, REG0, CAAM_CMD_SZ); + + /* + * Wait for IV transfer (ofifo -> class2) to finish before starting + * ciphertext transfer (ofifo -> external memory). + */ + wait_cmd = append_jump(desc, JUMP_JSL | JUMP_TEST_ALL | JUMP_COND_NIFP); + set_jump_tgt_here(desc, wait_cmd); + append_seq_fifo_load(desc, 0, FIFOLD_CLASS_BOTH | KEY_VLF | FIFOLD_TYPE_MSG1OUT2 | FIFOLD_TYPE_LASTBOTH); append_seq_fifo_store(desc, 0, FIFOST_TYPE_MESSAGE_DATA | KEY_VLF); --- a/drivers/crypto/caam/caamalg_desc.h +++ b/drivers/crypto/caam/caamalg_desc.h @@ -12,7 +12,7 @@ #define DESC_AEAD_BASE (4 * CAAM_CMD_SZ) #define DESC_AEAD_ENC_LEN (DESC_AEAD_BASE + 11 * CAAM_CMD_SZ) #define DESC_AEAD_DEC_LEN (DESC_AEAD_BASE + 15 * CAAM_CMD_SZ) -#define DESC_AEAD_GIVENC_LEN (DESC_AEAD_ENC_LEN + 7 * CAAM_CMD_SZ) +#define DESC_AEAD_GIVENC_LEN (DESC_AEAD_ENC_LEN + 8 * CAAM_CMD_SZ) #define DESC_QI_AEAD_ENC_LEN (DESC_AEAD_ENC_LEN + 3 * CAAM_CMD_SZ) #define DESC_QI_AEAD_DEC_LEN (DESC_AEAD_DEC_LEN + 3 * CAAM_CMD_SZ) #define DESC_QI_AEAD_GIVENC_LEN (DESC_AEAD_GIVENC_LEN + 3 * CAAM_CMD_SZ)
From: Gilad Ben-Yossef gilad@benyossef.com
commit 76a95bd8f9e10cade9c4c8df93b5c20ff45dc0f5 upstream.
When the ccree driver runs, it checks the state of the Trusted Execution Environment CryptoCell driver before proceeding. We did not account for cases where the TEE side is not ready or not available at all. Fix it by only considering the TEE error state after sync with the TEE side driver.
Signed-off-by: Gilad Ben-Yossef gilad@benyossef.com Fixes: ab8ec9658f5a ("crypto: ccree - add FIPS support") CC: stable@vger.kernel.org # v4.17+ Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/ccree/cc_fips.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
--- a/drivers/crypto/ccree/cc_fips.c +++ b/drivers/crypto/ccree/cc_fips.c @@ -21,7 +21,13 @@ static bool cc_get_tee_fips_status(struc u32 reg;
reg = cc_ioread(drvdata, CC_REG(GPR_HOST)); - return (reg == (CC_FIPS_SYNC_TEE_STATUS | CC_FIPS_SYNC_MODULE_OK)); + /* Did the TEE report status? */ + if (reg & CC_FIPS_SYNC_TEE_STATUS) + /* Yes. Is it OK? */ + return (reg & CC_FIPS_SYNC_MODULE_OK); + + /* No. It's either not in use or will be reported later */ + return true; }
/*
From: Gilad Ben-Yossef gilad@benyossef.com
commit 7a4be6c113c1f721818d1e3722a9015fe393295c upstream.
In case of an AEAD decryption verification error we were using the wrong value to zero out the plaintext buffer, leaving the end of the buffer with the false plaintext.
Signed-off-by: Gilad Ben-Yossef gilad@benyossef.com Fixes: ff27e85a85bb ("crypto: ccree - add AEAD support") CC: stable@vger.kernel.org # v4.17+ Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/ccree/cc_aead.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/crypto/ccree/cc_aead.c +++ b/drivers/crypto/ccree/cc_aead.c @@ -236,7 +236,7 @@ static void cc_aead_complete(struct devi /* In case of payload authentication failure, MUST NOT * revealed the decrypted message --> zero its memory. */ - cc_zero_sgl(areq->dst, areq_ctx->cryptlen); + cc_zero_sgl(areq->dst, areq->cryptlen); err = -EBADMSG; } } else { /*ENCRYPT*/
From: Jiaxun Yang jiaxun.yang@flygoat.com
commit d2f965549006acb865c4638f1f030ebcefdc71f6 upstream.
Recently, binutils split the Loongson-3 Extensions into four ASEs: MMI, CAM, EXT, EXT2. This patch does the same thing in the kernel and exposes them in cpuinfo so applications can probe supported ASEs at runtime.
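As a rough illustration of the runtime probing mentioned above, an application could scan /proc/cpuinfo for one of the new flag names (" loongson-mmi", " loongson-cam", " loongson-ext", " loongson-ext2" in the proc.c hunk). The sketch below is an assumption-laden example: line_has_ase() and cpuinfo_has_ase() are hypothetical helpers, and the "ASEs implemented" line prefix is assumed to be the label MIPS cpuinfo uses for this list, since that label is not shown in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if 'line' is the (assumed) "ASEs implemented" line and
 * lists 'ase' as a whole word, so "loongson-ext" does not match
 * "loongson-ext2".
 */
static bool line_has_ase(const char *line, const char *ase)
{
	static const char key[] = "ASEs implemented";
	size_t len;
	const char *p;

	assert(line != NULL && ase != NULL);

	len = strlen(ase);
	if (len == 0 || strncmp(line, key, sizeof(key) - 1) != 0)
		return false;

	p = strchr(line, ':');
	if (!p)
		return false;
	p++;

	while ((p = strstr(p, ase)) != NULL) {
		/* p[-1] is safe: p never points before the ':' + 1. */
		bool starts = (p[-1] == ' ' || p[-1] == '\t' || p[-1] == ':');
		bool ends = (p[len] == '\0' || p[len] == ' ' || p[len] == '\n');

		if (starts && ends)
			return true;
		p += len;
	}
	return false;
}

/* Scan a cpuinfo-style file; returns false if it cannot be opened. */
static bool cpuinfo_has_ase(const char *path, const char *ase)
{
	char line[1024];
	bool found = false;
	FILE *f = fopen(path, "r");

	if (!f)
		return false;
	while (!found && fgets(line, sizeof(line), f))
		found = line_has_ase(line, ase);
	fclose(f);

	return found;
}
```

A program would typically call cpuinfo_has_ase("/proc/cpuinfo", "loongson-mmi") once at startup and pick a code path based on the result.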
Signed-off-by: Jiaxun Yang jiaxun.yang@flygoat.com Cc: Huacai Chen chenhc@lemote.com Cc: Yunqiang Su ysu@wavecomp.com Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Paul Burton paul.burton@mips.com Cc: linux-mips@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/mips/include/asm/cpu-features.h | 16 ++++++++++++++++ arch/mips/include/asm/cpu.h | 4 ++++ arch/mips/kernel/cpu-probe.c | 6 ++++++ arch/mips/kernel/proc.c | 4 ++++ 4 files changed, 30 insertions(+)
--- a/arch/mips/include/asm/cpu-features.h +++ b/arch/mips/include/asm/cpu-features.h @@ -397,6 +397,22 @@ #define cpu_has_dsp3 __ase(MIPS_ASE_DSP3) #endif
+#ifndef cpu_has_loongson_mmi +#define cpu_has_loongson_mmi __ase(MIPS_ASE_LOONGSON_MMI) +#endif + +#ifndef cpu_has_loongson_cam +#define cpu_has_loongson_cam __ase(MIPS_ASE_LOONGSON_CAM) +#endif + +#ifndef cpu_has_loongson_ext +#define cpu_has_loongson_ext __ase(MIPS_ASE_LOONGSON_EXT) +#endif + +#ifndef cpu_has_loongson_ext2 +#define cpu_has_loongson_ext2 __ase(MIPS_ASE_LOONGSON_EXT2) +#endif + #ifndef cpu_has_mipsmt #define cpu_has_mipsmt __isa_lt_and_ase(6, MIPS_ASE_MIPSMT) #endif --- a/arch/mips/include/asm/cpu.h +++ b/arch/mips/include/asm/cpu.h @@ -433,5 +433,9 @@ enum cpu_type_enum { #define MIPS_ASE_MSA 0x00000100 /* MIPS SIMD Architecture */ #define MIPS_ASE_DSP3 0x00000200 /* Signal Processing ASE Rev 3*/ #define MIPS_ASE_MIPS16E2 0x00000400 /* MIPS16e2 */ +#define MIPS_ASE_LOONGSON_MMI 0x00000800 /* Loongson MultiMedia extensions Instructions */ +#define MIPS_ASE_LOONGSON_CAM 0x00001000 /* Loongson CAM */ +#define MIPS_ASE_LOONGSON_EXT 0x00002000 /* Loongson EXTensions */ +#define MIPS_ASE_LOONGSON_EXT2 0x00004000 /* Loongson EXTensions R2 */
#endif /* _ASM_CPU_H */ --- a/arch/mips/kernel/cpu-probe.c +++ b/arch/mips/kernel/cpu-probe.c @@ -1573,6 +1573,8 @@ static inline void cpu_probe_legacy(stru __cpu_name[cpu] = "ICT Loongson-3"; set_elf_platform(cpu, "loongson3a"); set_isa(c, MIPS_CPU_ISA_M64R1); + c->ases |= (MIPS_ASE_LOONGSON_MMI | MIPS_ASE_LOONGSON_CAM | + MIPS_ASE_LOONGSON_EXT); break; case PRID_REV_LOONGSON3B_R1: case PRID_REV_LOONGSON3B_R2: @@ -1580,6 +1582,8 @@ static inline void cpu_probe_legacy(stru __cpu_name[cpu] = "ICT Loongson-3"; set_elf_platform(cpu, "loongson3b"); set_isa(c, MIPS_CPU_ISA_M64R1); + c->ases |= (MIPS_ASE_LOONGSON_MMI | MIPS_ASE_LOONGSON_CAM | + MIPS_ASE_LOONGSON_EXT); break; }
@@ -1946,6 +1950,8 @@ static inline void cpu_probe_loongson(st decode_configs(c); c->options |= MIPS_CPU_FTLB | MIPS_CPU_TLBINV | MIPS_CPU_LDPTE; c->writecombine = _CACHE_UNCACHED_ACCELERATED; + c->ases |= (MIPS_ASE_LOONGSON_MMI | MIPS_ASE_LOONGSON_CAM | + MIPS_ASE_LOONGSON_EXT | MIPS_ASE_LOONGSON_EXT2); break; default: panic("Unknown Loongson Processor ID!"); --- a/arch/mips/kernel/proc.c +++ b/arch/mips/kernel/proc.c @@ -124,6 +124,10 @@ static int show_cpuinfo(struct seq_file if (cpu_has_eva) seq_printf(m, "%s", " eva"); if (cpu_has_htw) seq_printf(m, "%s", " htw"); if (cpu_has_xpa) seq_printf(m, "%s", " xpa"); + if (cpu_has_loongson_mmi) seq_printf(m, "%s", " loongson-mmi"); + if (cpu_has_loongson_cam) seq_printf(m, "%s", " loongson-cam"); + if (cpu_has_loongson_ext) seq_printf(m, "%s", " loongson-ext"); + if (cpu_has_loongson_ext2) seq_printf(m, "%s", " loongson-ext2"); seq_printf(m, "\n");
if (cpu_has_mmips) {
From: Michael Nosthoff committed@heine.so
commit 99956a9e08251a1234434b492875b1eaff502a12 upstream.
The type flag is stored in the chip->flags field, not in the client->flags field. Because client->flags doesn't use that bit, the TI-specific health function is never used and the driver always falls back to the general one.
Fixes: 76b16f4cdfb8 ("power: supply: sbs-battery: don't assume MANUFACTURER_DATA formats") Cc: stable@vger.kernel.org Signed-off-by: Michael Nosthoff committed@heine.so Reviewed-by: Brian Norris briannorris@chromium.org Reviewed-by: Enric Balletbo i Serra enric.balletbo@collabora.com Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/power/supply/sbs-battery.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/power/supply/sbs-battery.c +++ b/drivers/power/supply/sbs-battery.c @@ -620,7 +620,7 @@ static int sbs_get_property(struct power switch (psp) { case POWER_SUPPLY_PROP_PRESENT: case POWER_SUPPLY_PROP_HEALTH: - if (client->flags & SBS_FLAGS_TI_BQ20Z75) + if (chip->flags & SBS_FLAGS_TI_BQ20Z75) ret = sbs_get_ti_battery_presence_and_health(client, psp, val); else
From: Michael Nosthoff committed@heine.so
commit fe55e770327363304c4111423e6f7ff3c650136d upstream.
When the battery is set to sbs-mode and no gpio detection is enabled, "health" always returns a value, even when the battery is not present. All other fields return "not present". This leads to a scenario where the driver constantly switches between the "present" and "not present" states, which generates a lot of constant traffic on the i2c bus.
This commit changes the response of "health" to an error when the battery is not responding, leading to a consistent "not present" state.
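The decision logic described above can be sketched in isolation. This is a minimal userspace illustration, not the driver code: `presence_and_health`, `PROP_PRESENT`, `PROP_HEALTH` and `HEALTH_UNKNOWN` are hypothetical stand-ins, and the result of the dummy status read is passed in as a plain integer (negative errno on failure).

```c
#include <errno.h>

/* Hypothetical stand-ins for the power-supply property and health enums. */
enum prop { PROP_PRESENT, PROP_HEALTH };
#define HEALTH_UNKNOWN 1

/*
 * read_status is the result of the dummy status read: negative errno if
 * the battery did not answer, non-negative otherwise.
 */
static int presence_and_health(int read_status, enum prop psp, int *intval)
{
	if (read_status < 0) {
		/* Battery not responding. */
		if (psp == PROP_PRESENT) {
			*intval = 0;	/* report "not present" */
			return 0;
		}
		return read_status;	/* "health" now fails instead of answering */
	}

	if (psp == PROP_PRESENT)
		*intval = 1;		/* battery present */
	else
		*intval = HEALTH_UNKNOWN; /* no general health command in SBS */

	return 0;
}
```

With a failed read, a "present" query yields 0 while a "health" query returns the error, so both properties agree that no battery is there.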
Fixes: 76b16f4cdfb8 ("power: supply: sbs-battery: don't assume MANUFACTURER_DATA formats") Cc: stable@vger.kernel.org Signed-off-by: Michael Nosthoff committed@heine.so Reviewed-by: Brian Norris briannorris@chromium.org Tested-by: Brian Norris briannorris@chromium.org Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/power/supply/sbs-battery.c | 25 ++++++++++++++++--------- 1 file changed, 16 insertions(+), 9 deletions(-)
--- a/drivers/power/supply/sbs-battery.c +++ b/drivers/power/supply/sbs-battery.c @@ -314,17 +314,22 @@ static int sbs_get_battery_presence_and_ { int ret;
- if (psp == POWER_SUPPLY_PROP_PRESENT) { - /* Dummy command; if it succeeds, battery is present. */ - ret = sbs_read_word_data(client, sbs_data[REG_STATUS].addr); - if (ret < 0) - val->intval = 0; /* battery disconnected */ - else - val->intval = 1; /* battery present */ - } else { /* POWER_SUPPLY_PROP_HEALTH */ + /* Dummy command; if it succeeds, battery is present. */ + ret = sbs_read_word_data(client, sbs_data[REG_STATUS].addr); + + if (ret < 0) { /* battery not present*/ + if (psp == POWER_SUPPLY_PROP_PRESENT) { + val->intval = 0; + return 0; + } + return ret; + } + + if (psp == POWER_SUPPLY_PROP_PRESENT) + val->intval = 1; /* battery present */ + else /* POWER_SUPPLY_PROP_HEALTH */ /* SBS spec doesn't have a general health command. */ val->intval = POWER_SUPPLY_HEALTH_UNKNOWN; - }
return 0; } @@ -626,6 +631,8 @@ static int sbs_get_property(struct power else ret = sbs_get_battery_presence_and_health(client, psp, val); + + /* this can only be true if no gpio is used */ if (psp == POWER_SUPPLY_PROP_PRESENT) return 0; break;
From: Tom Zanussi zanussi@kernel.org
commit 17f8607a1658a8e70415eef67909f990d13017b5 upstream.
Original changelog from Steve Rostedt (except last sentence which explains the problem, and the Fixes: tag):
I performed a three way histogram with the following commands:
echo 'irq_lat u64 lat pid_t pid' > synthetic_events echo 'wake_lat u64 lat u64 irqlat pid_t pid' >> synthetic_events echo 'hist:keys=common_pid:irqts=common_timestamp.usecs if function == 0xffffffff81200580' > events/timer/hrtimer_start/trigger echo 'hist:keys=common_pid:lat=common_timestamp.usecs-$irqts:onmatch(timer.hrtimer_start).irq_lat($lat,pid) if common_flags & 1' > events/sched/sched_waking/trigger echo 'hist:keys=pid:wakets=common_timestamp.usecs,irqlat=lat' > events/synthetic/irq_lat/trigger echo 'hist:keys=next_pid:lat=common_timestamp.usecs-$wakets,irqlat=$irqlat:onmatch(synthetic.irq_lat).wake_lat($lat,$irqlat,next_pid)' > events/sched/sched_switch/trigger echo 1 > events/synthetic/wake_lat/enable
Basically I wanted to see:
hrtimer_start (calling function tick_sched_timer)
Note:
# grep tick_sched_timer /proc/kallsyms ffffffff81200580 t tick_sched_timer
And save the time of that, and then record sched_waking if it is called in interrupt context and with the same pid as the hrtimer_start, it will record the latency between that and the waking event.
I then look at when the task that is woken is scheduled in, and record the latency between the wakeup and the task running.
At the end, the wake_lat synthetic event will show the wakeup to scheduled latency, as well as the irq latency from hrtimer_start to the wakeup. The problem is that I found this:
<idle>-0 [007] d... 190.485261: wake_lat: lat=27 irqlat=190485230 pid=698 <idle>-0 [005] d... 190.485283: wake_lat: lat=40 irqlat=190485239 pid=10 <idle>-0 [002] d... 190.488327: wake_lat: lat=56 irqlat=190488266 pid=335 <idle>-0 [005] d... 190.489330: wake_lat: lat=64 irqlat=190489262 pid=10 <idle>-0 [003] d... 190.490312: wake_lat: lat=43 irqlat=190490265 pid=77 <idle>-0 [005] d... 190.493322: wake_lat: lat=54 irqlat=190493262 pid=10 <idle>-0 [005] d... 190.497305: wake_lat: lat=35 irqlat=190497267 pid=10 <idle>-0 [005] d... 190.501319: wake_lat: lat=50 irqlat=190501264 pid=10
The irqlat seemed quite large! Investigating this further, if I had enabled the irq_lat synthetic event, I noticed this:
<idle>-0 [002] d.s. 249.429308: irq_lat: lat=164968 pid=335 <idle>-0 [002] d... 249.429369: wake_lat: lat=55 irqlat=249429308 pid=335
Notice that the timestamp of the irq_lat "249.429308" is awfully similar to the reported irqlat variable. In fact, all instances were like this. It appeared that:
irqlat=$irqlat
Wasn't assigning the old $irqlat to the new irqlat variable, but instead was assigning the $irqts to it.
The issue is that assigning the old $irqlat to the new irqlat variable creates a variable reference alias, but the alias creation code forgets to make sure the alias uses the same var_ref_idx to access the reference.
Link: http://lkml.kernel.org/r/1567375321.5282.12.camel@kernel.org
Cc: Linux Trace Devel linux-trace-devel@vger.kernel.org Cc: linux-rt-users linux-rt-users@vger.kernel.org Cc: stable@vger.kernel.org Fixes: 7e8b88a30b085 ("tracing: Add hist trigger support for variable reference aliases") Reported-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Tom Zanussi zanussi@kernel.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/trace/trace_events_hist.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/kernel/trace/trace_events_hist.c +++ b/kernel/trace/trace_events_hist.c @@ -2785,6 +2785,8 @@ static struct hist_field *create_alias(s return NULL; }
+ alias->var_ref_idx = var_ref->var_ref_idx; + return alias; }
From: Kees Cook keescook@chromium.org
commit 314eed30ede02fa925990f535652254b5bad6b65 upstream.
When running on a system with >512MB RAM with a 32-bit kernel built with:
CONFIG_DEBUG_VIRTUAL=y CONFIG_HIGHMEM=y CONFIG_HARDENED_USERCOPY=y
all execve()s will fail: argv is copied into kmap()ed pages, and during usercopy checking the calls that ultimately reach virt_to_page() will trip over "bad" kmap (highmem) pointers because of CONFIG_DEBUG_VIRTUAL=y:
------------[ cut here ]------------ kernel BUG at ../arch/x86/mm/physaddr.c:83! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc8 #6 Hardware name: Dell Inc. Inspiron 1318/0C236D, BIOS A04 01/15/2009 EIP: __phys_addr+0xaf/0x100 ... Call Trace: __check_object_size+0xaf/0x3c0 ? __might_sleep+0x80/0xa0 copy_strings+0x1c2/0x370 copy_strings_kernel+0x2b/0x40 __do_execve_file+0x4ca/0x810 ? kmem_cache_alloc+0x1c7/0x370 do_execve+0x1b/0x20 ...
The check is from arch/x86/mm/physaddr.c:
VIRTUAL_BUG_ON((phys_addr >> PAGE_SHIFT) > max_low_pfn);
Due to the kmap() in fs/exec.c:
kaddr = kmap(kmapped_page); ... if (copy_from_user(kaddr+offset, str, bytes_to_copy)) ...
Now we can fetch the correct page to avoid the pfn check. In both cases, hardened usercopy will need to walk the page-span checker (if enabled) to do sanity checking.
Reported-by: Randy Dunlap rdunlap@infradead.org Tested-by: Randy Dunlap rdunlap@infradead.org Fixes: f5509cc18daa ("mm: Hardened usercopy") Cc: Matthew Wilcox willy@infradead.org Cc: stable@vger.kernel.org Signed-off-by: Kees Cook keescook@chromium.org Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org Link: https://lore.kernel.org/r/201909171056.7F2FFD17@keescook Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/usercopy.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
--- a/mm/usercopy.c +++ b/mm/usercopy.c @@ -11,6 +11,7 @@ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/mm.h> +#include <linux/highmem.h> #include <linux/slab.h> #include <linux/sched.h> #include <linux/sched/task.h> @@ -227,7 +228,12 @@ static inline void check_heap_object(con if (!virt_addr_valid(ptr)) return;
- page = virt_to_head_page(ptr); + /* + * When CONFIG_HIGHMEM=y, kmap_to_page() will give either the + * highmem page or fallback to virt_to_page(). The following + * is effectively a highmem-aware virt_to_head_page(). + */ + page = compound_head(kmap_to_page((void *)ptr));
if (PageSlab(page)) { /* Check slab allocator for flags and size. */
From: Li RongQing lirongqing@baidu.com
commit e430d802d6a3aaf61bd3ed03d9404888a29b9bf9 upstream.
The timer delayed for more than 3 seconds warning was triggered during testing.
Workqueue: events_unbound sched_tick_remote RIP: 0010:sched_tick_remote+0xee/0x100 ... Call Trace: process_one_work+0x18c/0x3a0 worker_thread+0x30/0x380 kthread+0x113/0x130 ret_from_fork+0x22/0x40
The reason is that the code in collect_expired_timers() uses jiffies unprotected:
if (next_event > jiffies) base->clk = jiffies;
As the compiler is allowed to reload the value, base->clk can advance between the check and the store and, in the worst case, advance farther than the next event. That causes the timer expiry to be delayed until the wheel pointer wraps around.
Convert the code to use READ_ONCE()
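The snapshot idea can be shown outside the kernel. The sketch below is a userspace illustration only: `fake_jiffies`, `struct fake_base` and `forward_base` are hypothetical names, and a single read of a volatile variable stands in for READ_ONCE(). Every later comparison and store uses the one local copy, so the value cannot change between the check and the store.

```c
/* Hypothetical stand-in for the jiffies counter, updated asynchronously. */
static volatile unsigned long fake_jiffies;

struct fake_base {
	unsigned long clk;
};

static void forward_base(struct fake_base *base, unsigned long next)
{
	/* One load; all decisions below use this snapshot. */
	unsigned long now = fake_jiffies;

	if ((long)(now - base->clk) > 2) {
		/* Wraparound-safe "next is after now" comparison. */
		if ((long)(next - now) > 0)
			base->clk = now;	/* next timer is still in the future */
		else
			base->clk = next;	/* forward to the next expiry */
	}
}
```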
Fixes: 236968383cf5 ("timers: Optimize collect_expired_timers() for NOHZ") Signed-off-by: Li RongQing lirongqing@baidu.com Signed-off-by: Liang ZhiCheng liangzhicheng@baidu.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1568894687-14499-1-git-send-email-lirongqing@baidu... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/time/timer.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -1593,24 +1593,26 @@ void timer_clear_idle(void) static int collect_expired_timers(struct timer_base *base, struct hlist_head *heads) { + unsigned long now = READ_ONCE(jiffies); + /* * NOHZ optimization. After a long idle sleep we need to forward the * base to current jiffies. Avoid a loop by searching the bitfield for * the next expiring timer. */ - if ((long)(jiffies - base->clk) > 2) { + if ((long)(now - base->clk) > 2) { unsigned long next = __next_timer_interrupt(base);
/* * If the next timer is ahead of time forward to current * jiffies, otherwise forward to the next expiry time: */ - if (time_after(next, jiffies)) { + if (time_after(next, now)) { /* * The call site will increment base->clk and then * terminate the expiry loop immediately. */ - base->clk = jiffies; + base->clk = now; return 0; } base->clk = next;
From: Jon Derrick jonathan.derrick@intel.com
commit e3dffa4f6c3612dea337c9c59191bd418afc941b upstream.
VMD maps child device config spaces to the VMD Config BAR linearly regardless of the starting bus offset. Because of this, the config address decode must ignore starting bus offsets when mapping the BDF to the config space address.
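As a worked illustration of the decode, the helper below computes the offset into the Config BAR the way the fixed code does, with the starting bus number subtracted first. It is a standalone sketch: `vmd_cfg_offset` is a hypothetical name, and the 1 MB per bus and 4 KB per function strides are taken from the shifts used in the patch.

```c
#include <stdint.h>

/*
 * Offset of a config register inside the VMD Config BAR.
 * Bus numbers are rebased to the domain's first bus (busn_start), so the
 * first bus always maps to offset 0 regardless of its absolute number.
 */
static uint64_t vmd_cfg_offset(uint8_t bus, uint8_t busn_start,
			       unsigned int devfn, unsigned int reg)
{
	return ((uint64_t)(bus - busn_start) << 20) +
	       ((uint64_t)devfn << 12) + reg;
}
```

With busn_start = 128, bus 128 maps to offset 0. Without the subtraction it would map to 0x8000000, which is past the end of a BAR that only covers 128 buses.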
Fixes: 2a5a9c9a20f9 ("PCI: vmd: Add offset to bus numbers if necessary") Signed-off-by: Jon Derrick jonathan.derrick@intel.com Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pci/controller/vmd.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-)
--- a/drivers/pci/controller/vmd.c +++ b/drivers/pci/controller/vmd.c @@ -94,6 +94,7 @@ struct vmd_dev { struct resource resources[3]; struct irq_domain *irq_domain; struct pci_bus *bus; + u8 busn_start;
struct dma_map_ops dma_ops; struct dma_domain dma_domain; @@ -440,7 +441,8 @@ static char __iomem *vmd_cfg_addr(struct unsigned int devfn, int reg, int len) { char __iomem *addr = vmd->cfgbar + - (bus->number << 20) + (devfn << 12) + reg; + ((bus->number - vmd->busn_start) << 20) + + (devfn << 12) + reg;
if ((addr - vmd->cfgbar) + len >= resource_size(&vmd->dev->resource[VMD_CFGBAR])) @@ -563,7 +565,7 @@ static int vmd_enable_domain(struct vmd_ unsigned long flags; LIST_HEAD(resources); resource_size_t offset[2] = {0}; - resource_size_t membar2_offset = 0x2000, busn_start = 0; + resource_size_t membar2_offset = 0x2000; struct pci_bus *child;
/* @@ -606,14 +608,14 @@ static int vmd_enable_domain(struct vmd_ pci_read_config_dword(vmd->dev, PCI_REG_VMCONFIG, &vmconfig); if (BUS_RESTRICT_CAP(vmcap) && (BUS_RESTRICT_CFG(vmconfig) == 0x1)) - busn_start = 128; + vmd->busn_start = 128; }
res = &vmd->dev->resource[VMD_CFGBAR]; vmd->resources[0] = (struct resource) { .name = "VMD CFGBAR", - .start = busn_start, - .end = busn_start + (resource_size(res) >> 20) - 1, + .start = vmd->busn_start, + .end = vmd->busn_start + (resource_size(res) >> 20) - 1, .flags = IORESOURCE_BUS | IORESOURCE_PCI_FIXED, };
@@ -681,8 +683,8 @@ static int vmd_enable_domain(struct vmd_ pci_add_resource_offset(&resources, &vmd->resources[1], offset[0]); pci_add_resource_offset(&resources, &vmd->resources[2], offset[1]);
- vmd->bus = pci_create_root_bus(&vmd->dev->dev, busn_start, &vmd_ops, - sd, &resources); + vmd->bus = pci_create_root_bus(&vmd->dev->dev, vmd->busn_start, + &vmd_ops, sd, &resources); if (!vmd->bus) { pci_free_resource_list(&resources); irq_domain_remove(vmd->irq_domain);
From: Dexuan Cui decui@microsoft.com
commit 533ca1feed98b0bf024779a14760694c7cb4d431 upstream.
The slot must be removed before the pci_dev is removed, otherwise a panic can happen due to use-after-free.
Fixes: 15becc2b56c6 ("PCI: hv: Add hv_pci_remove_slots() when we unload the driver") Signed-off-by: Dexuan Cui decui@microsoft.com Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pci/controller/pci-hyperv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -2701,8 +2701,8 @@ static int hv_pci_remove(struct hv_devic /* Remove the bus from PCI's point of view. */ pci_lock_rescan_remove(); pci_stop_root_bus(hbus->pci_bus); - pci_remove_root_bus(hbus->pci_bus); hv_pci_remove_slots(hbus); + pci_remove_root_bus(hbus->pci_bus); pci_unlock_rescan_remove(); hbus->state = hv_pcibus_removed; }
From: Jon Derrick jonathan.derrick@intel.com
commit a1a30170138c9c5157bd514ccd4d76b47060f29b upstream.
The shadow offset scratchpad was moved to 0x2000-0x2010. Update the location to get the correct shadow offset.
Fixes: 6788958e4f3c ("PCI: vmd: Assign membar addresses from shadow registers") Signed-off-by: Jon Derrick jonathan.derrick@intel.com Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pci/controller/vmd.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/drivers/pci/controller/vmd.c +++ b/drivers/pci/controller/vmd.c @@ -31,6 +31,9 @@ #define PCI_REG_VMLOCK 0x70 #define MB2_SHADOW_EN(vmlock) (vmlock & 0x2)
+#define MB2_SHADOW_OFFSET 0x2000 +#define MB2_SHADOW_SIZE 16 + enum vmd_features { /* * Device may contain registers which hint the physical location of the @@ -578,7 +581,7 @@ static int vmd_enable_domain(struct vmd_ u32 vmlock; int ret;
- membar2_offset = 0x2018; + membar2_offset = MB2_SHADOW_OFFSET + MB2_SHADOW_SIZE; ret = pci_read_config_dword(vmd->dev, PCI_REG_VMLOCK, &vmlock); if (ret || vmlock == ~0) return -ENODEV; @@ -590,9 +593,9 @@ static int vmd_enable_domain(struct vmd_ if (!membar2) return -ENOMEM; offset[0] = vmd->dev->resource[VMD_MEMBAR1].start - - readq(membar2 + 0x2008); + readq(membar2 + MB2_SHADOW_OFFSET); offset[1] = vmd->dev->resource[VMD_MEMBAR2].start - - readq(membar2 + 0x2010); + readq(membar2 + MB2_SHADOW_OFFSET + 8); pci_iounmap(vmd->dev, membar2); } }
From: Sumit Saxena sumit.saxena@broadcom.com
commit d2182b2d4b71ff0549a07f414d921525fade707b upstream.
In a Resizable BAR Control Register, bits 13:8 control the size of the BAR. The encoded values of these bits are as follows (see PCIe r5.0, sec 7.8.6.3):
Value BAR size 0 1 MB (2^20 bytes) 1 2 MB (2^21 bytes) 2 4 MB (2^22 bytes) ... 43 8 EB (2^63 bytes)
Previously we incorrectly set the BAR size bits for a 1 MB BAR to 0x1f instead of 0, so devices that support that size, e.g., new megaraid_sas and mpt3sas adapters, fail to initialize during resume from S3 sleep.
Correctly calculate the BAR size bits for Resizable BAR control registers.
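The difference between the two expressions is easiest to see with numbers. The sketch below re-implements ilog2() and order_base_2() in plain C for illustration (`ilog2_u64`, `order_base_2_u64`, `rebar_size_old` and `rebar_size_new` are hypothetical names, and the helpers assume a non-zero power-of-two BAR size) and evaluates the old and new formulas.

```c
#include <stdint.h>

/* floor(log2(v)) for v > 0. */
static int ilog2_u64(uint64_t v)
{
	int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* ceil(log2(v)), with 0 for v <= 1, mirroring the kernel helper's contract. */
static int order_base_2_u64(uint64_t v)
{
	return v > 1 ? ilog2_u64(v - 1) + 1 : 0;
}

/* Old encoding: goes negative for a 1 MB BAR. */
static int rebar_size_old(uint64_t bytes)
{
	return order_base_2_u64((bytes >> 20) | 1) - 1;
}

/* New encoding: 0 for 1 MB, 1 for 2 MB, 2 for 4 MB, ... */
static int rebar_size_new(uint64_t bytes)
{
	return ilog2_u64(bytes) - 20;
}
```

For 1 MB the old formula yields -1, whose low bits are all ones, which matches the bogus size value described above. For 2 MB and larger both formulas agree.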
Link: https://lore.kernel.org/r/20190725192552.24295-1-sumit.saxena@broadcom.com Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203939 Fixes: d3252ace0bc6 ("PCI: Restore resized BAR state on resume") Signed-off-by: Sumit Saxena sumit.saxena@broadcom.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Christian König christian.koenig@amd.com Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pci/pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1443,7 +1443,7 @@ static void pci_restore_rebar_state(stru pci_read_config_dword(pdev, pos + PCI_REBAR_CTRL, &ctrl); bar_idx = ctrl & PCI_REBAR_CTRL_BAR_IDX; res = pdev->resource + bar_idx; - size = order_base_2((resource_size(res) >> 20) | 1) - 1; + size = ilog2(resource_size(res)) - 20; ctrl &= ~PCI_REBAR_CTRL_BAR_SIZE; ctrl |= size << PCI_REBAR_CTRL_BAR_SHIFT; pci_write_config_dword(pdev, pos + PCI_REBAR_CTRL, ctrl);
From: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com
commit 981c107cbb420ee028f8ecd155352cfd6351c246 upstream.
The Python files required by the selftests are not packaged because of the missing assignment to TEST_FILES. Add the assignment.
Cc: stable@vger.kernel.org Fixes: 6ea3dfe1e073 ("selftests: add TPM 2.0 tests") Signed-off-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Reviewed-by: Petr Vorel pvorel@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- tools/testing/selftests/tpm2/Makefile | 1 + 1 file changed, 1 insertion(+)
--- a/tools/testing/selftests/tpm2/Makefile +++ b/tools/testing/selftests/tpm2/Makefile @@ -2,3 +2,4 @@ include ../lib.mk
TEST_PROGS := test_smoke.sh test_space.sh +TEST_FILES := tpm2.py tpm2_tests.py
From: Shuah Khan skhan@linuxfoundation.org
commit 3969e76909d3aa06715997896184ee684f68d164 upstream.
Fix build failure:
undefined reference to `pthread_create' collect2: error: ld returned 1 exit status
Fix CFLAGS to include pthread correctly.
Fixes: 740378dc7834 ("pidfd: add polling selftests") Signed-off-by: Shuah Khan skhan@linuxfoundation.org Reviewed-by: Christian Brauner christian.brauner@ubuntu.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20190924195237.30519-1-skhan@linuxfoundation.org Signed-off-by: Christian Brauner christian.brauner@ubuntu.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- tools/testing/selftests/pidfd/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/pidfd/Makefile +++ b/tools/testing/selftests/pidfd/Makefile @@ -1,5 +1,5 @@ # SPDX-License-Identifier: GPL-2.0-only -CFLAGS += -g -I../../../../usr/include/ -lpthread +CFLAGS += -g -I../../../../usr/include/ -pthread
TEST_GEN_PROGS := pidfd_test pidfd_open_test
From: Rasmus Villemoes linux@rasmusvillemoes.dk
commit 144783a80cd2cbc45c6ce17db649140b65f203dd upstream.
Converting from ms to s requires dividing by 1000, not multiplying. So this is currently taking the smaller of new_timeout and 1.28e8, i.e. effectively new_timeout.
The driver knows what it set max_hw_heartbeat_ms to, so use that value instead of doing a division at run-time.
FWIW, this can easily be tested by booting into a busybox shell and doing "watchdog -t 5 -T 130 /dev/watchdog" - without this patch, the watchdog fires after 130&127 == 2 seconds.
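The arithmetic behind the 130 second example can be sketched as follows. This is an illustration only: `clamp_old`, `clamp_new` and `effective_seconds` are hypothetical helpers, and `effective_seconds` assumes an 8-bit hardware timeout field counting in half-second steps, which is consistent with the WDOG_SEC_TO_COUNT() macro and the 2 second result quoted above but is not taken from the hardware manual.

```c
#define WDT_MAX_TIME 128U	/* seconds, as in IMX2_WDT_MAX_TIME */

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Old clamp: ms * 1000 is far larger than any sane timeout, so no clamp. */
static unsigned int clamp_old(unsigned int new_timeout,
			      unsigned int max_hw_heartbeat_ms)
{
	return min_uint(new_timeout, max_hw_heartbeat_ms * 1000);
}

/* New clamp: limit to the hardware maximum in seconds. */
static unsigned int clamp_new(unsigned int new_timeout)
{
	return min_uint(new_timeout, WDT_MAX_TIME);
}

/* Assumed: 8-bit field holding (2 * seconds - 1); extra bits are lost. */
static unsigned int effective_seconds(unsigned int programmed)
{
	unsigned int field = (programmed * 2 - 1) & 0xff;

	return (field + 1) / 2;
}
```

Under these assumptions an unclamped request of 130 seconds ends up as 2 seconds in hardware, while the clamped value of 128 seconds fits the field exactly.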
Fixes: b07e228eee69 "watchdog: imx2_wdt: Fix set_timeout for big timeout values" Cc: stable@vger.kernel.org # 5.2 plus anything the above got backported to Signed-off-by: Rasmus Villemoes linux@rasmusvillemoes.dk Reviewed-by: Guenter Roeck linux@roeck-us.net Link: https://lore.kernel.org/r/20190812131356.23039-1-linux@rasmusvillemoes.dk Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Wim Van Sebroeck wim@linux-watchdog.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/watchdog/imx2_wdt.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/watchdog/imx2_wdt.c +++ b/drivers/watchdog/imx2_wdt.c @@ -55,7 +55,7 @@
#define IMX2_WDT_WMCR 0x08 /* Misc Register */
-#define IMX2_WDT_MAX_TIME 128 +#define IMX2_WDT_MAX_TIME 128U #define IMX2_WDT_DEFAULT_TIME 60 /* in seconds */
#define WDOG_SEC_TO_COUNT(s) ((s * 2 - 1) << 8) @@ -180,7 +180,7 @@ static int imx2_wdt_set_timeout(struct w { unsigned int actual;
- actual = min(new_timeout, wdog->max_hw_heartbeat_ms * 1000); + actual = min(new_timeout, IMX2_WDT_MAX_TIME); __imx2_wdt_set_timeout(wdog, actual); wdog->timeout = new_timeout; return 0;
From: Srikar Dronamraju srikar@linux.vnet.ibm.com
commit 443f2d5ba13d65ccfd879460f77941875159d154 upstream.
Observe a segmentation fault when 'perf stat' is asked to repeat forever with the interval option.
Without fix:
# perf stat -r 0 -I 5000 -e cycles -a sleep 10 # time counts unit events 5.000211692 3,13,89,82,34,157 cycles 10.000380119 1,53,98,52,22,294 cycles 10.040467280 17,16,79,265 cycles Segmentation fault
This problem is only observed with the forever option (-r 0); limited repeats work fine. Calling print_counters() with ts set to NULL is not correct when interval is set. Hence avoid print_counters(NULL, ...) if interval is set.
With fix:
# perf stat -r 0 -I 5000 -e cycles -a sleep 10 # time counts unit events 5.019866622 3,15,14,43,08,697 cycles 10.039865756 3,15,16,31,95,261 cycles 10.059950628 1,26,05,47,158 cycles 5.009902655 3,14,52,62,33,932 cycles 10.019880228 3,14,52,22,89,154 cycles 10.030543876 66,90,18,333 cycles 5.009848281 3,14,51,98,25,437 cycles 10.029854402 3,15,14,93,04,918 cycles 5.009834177 3,14,51,95,92,316 cycles
Committer notes:
Did the 'git bisect' to find the cset introducing the problem to add the Fixes tag below, and at that time the problem reproduced as:
(gdb) run stat -r0 -I500 sleep 1 <SNIP> Program received signal SIGSEGV, Segmentation fault. print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866 866 sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, csv_sep); (gdb) bt #0 print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866 #1 0x000000000041860a in print_counters (ts=ts@entry=0x0, argc=argc@entry=2, argv=argv@entry=0x7fffffffd640) at builtin-stat.c:938 #2 0x0000000000419a7f in cmd_stat (argc=2, argv=0x7fffffffd640, prefix=<optimized out>) at builtin-stat.c:1411 #3 0x000000000045c65a in run_builtin (p=p@entry=0x6291b8 <commands+216>, argc=argc@entry=5, argv=argv@entry=0x7fffffffd640) at perf.c:370 #4 0x000000000045c893 in handle_internal_command (argc=5, argv=0x7fffffffd640) at perf.c:429 #5 0x000000000045c8f1 in run_argv (argcp=argcp@entry=0x7fffffffd4ac, argv=argv@entry=0x7fffffffd4a0) at perf.c:473 #6 0x000000000045cac9 in main (argc=<optimized out>, argv=<optimized out>) at perf.c:588 (gdb)
Mostly the same as just before this patch:
Program received signal SIGSEGV, Segmentation fault. 0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964 964 sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, config->csv_sep); (gdb) bt #0 0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964 #1 0x0000000000588047 in perf_evlist__print_counters (evlist=0xbc9b90, config=0xa1f2a0 <stat_config>, _target=0xa1f0c0 <target>, ts=0x0, argc=2, argv=0x7fffffffd670) at util/stat-display.c:1172 #2 0x000000000045390f in print_counters (ts=0x0, argc=2, argv=0x7fffffffd670) at builtin-stat.c:656 #3 0x0000000000456bb5 in cmd_stat (argc=2, argv=0x7fffffffd670) at builtin-stat.c:1960 #4 0x00000000004dd2e0 in run_builtin (p=0xa30e00 <commands+288>, argc=5, argv=0x7fffffffd670) at perf.c:310 #5 0x00000000004dd54d in handle_internal_command (argc=5, argv=0x7fffffffd670) at perf.c:362 #6 0x00000000004dd694 in run_argv (argcp=0x7fffffffd4cc, argv=0x7fffffffd4c0) at perf.c:406 #7 0x00000000004dda11 in main (argc=5, argv=0x7fffffffd670) at perf.c:531 (gdb)
Fixes: d4f63a4741a8 ("perf stat: Introduce print_counters function") Signed-off-by: Srikar Dronamraju srikar@linux.vnet.ibm.com Acked-by: Jiri Olsa jolsa@kernel.org Tested-by: Arnaldo Carvalho de Melo acme@redhat.com Tested-by: Ravi Bangoria ravi.bangoria@linux.ibm.com Cc: Namhyung Kim namhyung@kernel.org Cc: Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com Cc: stable@vger.kernel.org # v4.2+ Link: http://lore.kernel.org/lkml/20190904094738.9558-3-srikar@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- tools/perf/builtin-stat.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -1962,7 +1962,7 @@ int cmd_stat(int argc, const char **argv run_idx + 1);
status = run_perf_stat(argc, argv, run_idx); - if (forever && status != -1) { + if (forever && status != -1 && !interval) { print_counters(NULL, argc, argv); perf_stat__reset_stats(); }
From: Maarten Lankhorst maarten.lankhorst@linux.intel.com
commit cffb4c3ea37248c4fc2f4ce747e5c24af88aec76 upstream.
There was an integer wraparound when mode_clock became too high, and we didn't correct for the FEC overhead factor when dividing, with the calculations breaking at HBR3.
As a result our calculated bpp was way too high, and the link width limitation never came into effect.
Print out the resulting bpp calculations as a sanity check, just in case we ever have to debug it later on again.
We also used the wrong factor for FEC. While bspec mentions 2.4%, all the calculations use 1/0.972261, and the same ratio should be applied to data M/N as well, so use it there when FEC is enabled.
This fixes the FIFO underrun we are seeing with FEC enabled.
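The two arithmetic points above (the wraparound and the 1/0.972261 factor) can be checked with a small standalone sketch. The function names are hypothetical and plain C integer types replace the kernel's div_u64()/mul_u32_u32() helpers; the HBR3 link clock of 810000 and the 594000 kHz mode clock are example values chosen for illustration.

```c
#include <stdint.h>

#define FEC_OVERHEAD_FACTOR 972261U	/* value used by the patch */
#define OLD_FEC_FACTOR      976U	/* value removed by the patch */

/* Scale a clock by 1/0.972261 using a 64-bit intermediate product. */
static uint32_t mode_to_fec_clock(uint32_t mode_clock)
{
	return (uint32_t)(((uint64_t)mode_clock * 1000000U) /
			  FEC_OVERHEAD_FACTOR);
}

/* Max bpp the link can carry once FEC overhead is added to the mode clock. */
static uint32_t max_link_bpp(uint32_t link_clock, uint32_t lanes,
			     uint32_t mode_clock)
{
	return (link_clock * lanes * 8) / mode_to_fec_clock(mode_clock);
}

/* The old numerator, computed in 64 bits to show it exceeds 32 bits. */
static uint64_t old_numerator(uint32_t link_clock, uint32_t lanes)
{
	return (uint64_t)link_clock * lanes * 8 * OLD_FEC_FACTOR;
}
```

At HBR3 with four lanes the old numerator is 25,297,920,000, which does not fit in 32 bits, whereas the new numerator of 25,920,000 does.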
Changes since v2: - Handle fec_enable in intel_link_compute_m_n, so only data M/N is adjusted. (Ville) - Fix initial hardware readout for FEC. (Ville) Changes since v3: - Remove bogus fec_to_mode_clock. (Ville) Changes since v4: - Use the correct register for icl. (Ville) - Split hw readout to a separate patch.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Fixes: d9218c8f6cf4 ("drm/i915/dp: Add helpers for Compressed BPP and Slice Count for DSC") Cc: stable@vger.kernel.org # v5.0+ Cc: Manasi Navare manasi.d.navare@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20190925082110.17439-1-maarten... Reviewed-by: Ville Syrjälä ville.syrjala@linux.intel.com (cherry picked from commit ed06efb801bd291e935238d3fba46fa03d098f0e) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/i915/display/intel_display.c | 12 + drivers/gpu/drm/i915/display/intel_display.h | 2 drivers/gpu/drm/i915/display/intel_dp.c | 184 +++++++++++++-------------- drivers/gpu/drm/i915/display/intel_dp.h | 6 drivers/gpu/drm/i915/display/intel_dp_mst.c | 2 5 files changed, 107 insertions(+), 99 deletions(-)
--- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -7132,7 +7132,7 @@ retry: pipe_config->fdi_lanes = lane;
intel_link_compute_m_n(pipe_config->pipe_bpp, lane, fdi_dotclock, - link_bw, &pipe_config->fdi_m_n, false); + link_bw, &pipe_config->fdi_m_n, false, false);
ret = ironlake_check_fdi_lanes(dev, intel_crtc->pipe, pipe_config); if (ret == -EDEADLK) @@ -7379,11 +7379,15 @@ void intel_link_compute_m_n(u16 bits_per_pixel, int nlanes, int pixel_clock, int link_clock, struct intel_link_m_n *m_n, - bool constant_n) + bool constant_n, bool fec_enable) { - m_n->tu = 64; + u32 data_clock = bits_per_pixel * pixel_clock; + + if (fec_enable) + data_clock = intel_dp_mode_to_fec_clock(data_clock);
- compute_m_n(bits_per_pixel * pixel_clock, + m_n->tu = 64; + compute_m_n(data_clock, link_clock * nlanes * 8, &m_n->gmch_m, &m_n->gmch_n, constant_n); --- a/drivers/gpu/drm/i915/display/intel_display.h +++ b/drivers/gpu/drm/i915/display/intel_display.h @@ -351,7 +351,7 @@ struct intel_link_m_n { void intel_link_compute_m_n(u16 bpp, int nlanes, int pixel_clock, int link_clock, struct intel_link_m_n *m_n, - bool constant_n); + bool constant_n, bool fec_enable); bool is_ccs_modifier(u64 modifier); void lpt_disable_clkout_dp(struct drm_i915_private *dev_priv); u32 intel_plane_fb_max_stride(struct drm_i915_private *dev_priv, --- a/drivers/gpu/drm/i915/display/intel_dp.c +++ b/drivers/gpu/drm/i915/display/intel_dp.c @@ -76,8 +76,8 @@ #define DP_DSC_MAX_ENC_THROUGHPUT_0 340000 #define DP_DSC_MAX_ENC_THROUGHPUT_1 400000
-/* DP DSC FEC Overhead factor = (100 - 2.4)/100 */ -#define DP_DSC_FEC_OVERHEAD_FACTOR 976 +/* DP DSC FEC Overhead factor = 1/(0.972261) */ +#define DP_DSC_FEC_OVERHEAD_FACTOR 972261
/* Compliance test status bits */ #define INTEL_DP_RESOLUTION_SHIFT_MASK 0 @@ -526,6 +526,97 @@ int intel_dp_get_link_train_fallback_val return 0; }
+u32 intel_dp_mode_to_fec_clock(u32 mode_clock) +{ + return div_u64(mul_u32_u32(mode_clock, 1000000U), + DP_DSC_FEC_OVERHEAD_FACTOR); +} + +static u16 intel_dp_dsc_get_output_bpp(u32 link_clock, u32 lane_count, + u32 mode_clock, u32 mode_hdisplay) +{ + u32 bits_per_pixel, max_bpp_small_joiner_ram; + int i; + + /* + * Available Link Bandwidth(Kbits/sec) = (NumberOfLanes)* + * (LinkSymbolClock)* 8 * (TimeSlotsPerMTP) + * for SST -> TimeSlotsPerMTP is 1, + * for MST -> TimeSlotsPerMTP has to be calculated + */ + bits_per_pixel = (link_clock * lane_count * 8) / + intel_dp_mode_to_fec_clock(mode_clock); + DRM_DEBUG_KMS("Max link bpp: %u\n", bits_per_pixel); + + /* Small Joiner Check: output bpp <= joiner RAM (bits) / Horiz. width */ + max_bpp_small_joiner_ram = DP_DSC_MAX_SMALL_JOINER_RAM_BUFFER / mode_hdisplay; + DRM_DEBUG_KMS("Max small joiner bpp: %u\n", max_bpp_small_joiner_ram); + + /* + * Greatest allowed DSC BPP = MIN (output BPP from available Link BW + * check, output bpp from small joiner RAM check) + */ + bits_per_pixel = min(bits_per_pixel, max_bpp_small_joiner_ram); + + /* Error out if the max bpp is less than smallest allowed valid bpp */ + if (bits_per_pixel < valid_dsc_bpp[0]) { + DRM_DEBUG_KMS("Unsupported BPP %u, min %u\n", + bits_per_pixel, valid_dsc_bpp[0]); + return 0; + } + + /* Find the nearest match in the array of known BPPs from VESA */ + for (i = 0; i < ARRAY_SIZE(valid_dsc_bpp) - 1; i++) { + if (bits_per_pixel < valid_dsc_bpp[i + 1]) + break; + } + bits_per_pixel = valid_dsc_bpp[i]; + + /* + * Compressed BPP in U6.4 format so multiply by 16, for Gen 11, + * fractional part is 0 + */ + return bits_per_pixel << 4; +} + +static u8 intel_dp_dsc_get_slice_count(struct intel_dp *intel_dp, + int mode_clock, int mode_hdisplay) +{ + u8 min_slice_count, i; + int max_slice_width; + + if (mode_clock <= DP_DSC_PEAK_PIXEL_RATE) + min_slice_count = DIV_ROUND_UP(mode_clock, + DP_DSC_MAX_ENC_THROUGHPUT_0); + else + min_slice_count = DIV_ROUND_UP(mode_clock, + 
DP_DSC_MAX_ENC_THROUGHPUT_1); + + max_slice_width = drm_dp_dsc_sink_max_slice_width(intel_dp->dsc_dpcd); + if (max_slice_width < DP_DSC_MIN_SLICE_WIDTH_VALUE) { + DRM_DEBUG_KMS("Unsupported slice width %d by DP DSC Sink device\n", + max_slice_width); + return 0; + } + /* Also take into account max slice width */ + min_slice_count = min_t(u8, min_slice_count, + DIV_ROUND_UP(mode_hdisplay, + max_slice_width)); + + /* Find the closest match to the valid slice count values */ + for (i = 0; i < ARRAY_SIZE(valid_dsc_slicecount); i++) { + if (valid_dsc_slicecount[i] > + drm_dp_dsc_sink_max_slice_count(intel_dp->dsc_dpcd, + false)) + break; + if (min_slice_count <= valid_dsc_slicecount[i]) + return valid_dsc_slicecount[i]; + } + + DRM_DEBUG_KMS("Unsupported Slice Count %d\n", min_slice_count); + return 0; +} + static enum drm_mode_status intel_dp_mode_valid(struct drm_connector *connector, struct drm_display_mode *mode) @@ -2248,7 +2339,7 @@ intel_dp_compute_config(struct intel_enc adjusted_mode->crtc_clock, pipe_config->port_clock, &pipe_config->dp_m_n, - constant_n); + constant_n, pipe_config->fec_enable);
if (intel_connector->panel.downclock_mode != NULL && dev_priv->drrs.type == SEAMLESS_DRRS_SUPPORT) { @@ -2258,7 +2349,7 @@ intel_dp_compute_config(struct intel_enc intel_connector->panel.downclock_mode->clock, pipe_config->port_clock, &pipe_config->dp_m2_n2, - constant_n); + constant_n, pipe_config->fec_enable); }
if (!HAS_DDI(dev_priv)) @@ -4345,91 +4436,6 @@ intel_dp_get_sink_irq_esi(struct intel_d DP_DPRX_ESI_LEN; }
-u16 intel_dp_dsc_get_output_bpp(int link_clock, u8 lane_count, - int mode_clock, int mode_hdisplay) -{ - u16 bits_per_pixel, max_bpp_small_joiner_ram; - int i; - - /* - * Available Link Bandwidth(Kbits/sec) = (NumberOfLanes)* - * (LinkSymbolClock)* 8 * ((100-FECOverhead)/100)*(TimeSlotsPerMTP) - * FECOverhead = 2.4%, for SST -> TimeSlotsPerMTP is 1, - * for MST -> TimeSlotsPerMTP has to be calculated - */ - bits_per_pixel = (link_clock * lane_count * 8 * - DP_DSC_FEC_OVERHEAD_FACTOR) / - mode_clock; - - /* Small Joiner Check: output bpp <= joiner RAM (bits) / Horiz. width */ - max_bpp_small_joiner_ram = DP_DSC_MAX_SMALL_JOINER_RAM_BUFFER / - mode_hdisplay; - - /* - * Greatest allowed DSC BPP = MIN (output BPP from avaialble Link BW - * check, output bpp from small joiner RAM check) - */ - bits_per_pixel = min(bits_per_pixel, max_bpp_small_joiner_ram); - - /* Error out if the max bpp is less than smallest allowed valid bpp */ - if (bits_per_pixel < valid_dsc_bpp[0]) { - DRM_DEBUG_KMS("Unsupported BPP %d\n", bits_per_pixel); - return 0; - } - - /* Find the nearest match in the array of known BPPs from VESA */ - for (i = 0; i < ARRAY_SIZE(valid_dsc_bpp) - 1; i++) { - if (bits_per_pixel < valid_dsc_bpp[i + 1]) - break; - } - bits_per_pixel = valid_dsc_bpp[i]; - - /* - * Compressed BPP in U6.4 format so multiply by 16, for Gen 11, - * fractional part is 0 - */ - return bits_per_pixel << 4; -} - -u8 intel_dp_dsc_get_slice_count(struct intel_dp *intel_dp, - int mode_clock, - int mode_hdisplay) -{ - u8 min_slice_count, i; - int max_slice_width; - - if (mode_clock <= DP_DSC_PEAK_PIXEL_RATE) - min_slice_count = DIV_ROUND_UP(mode_clock, - DP_DSC_MAX_ENC_THROUGHPUT_0); - else - min_slice_count = DIV_ROUND_UP(mode_clock, - DP_DSC_MAX_ENC_THROUGHPUT_1); - - max_slice_width = drm_dp_dsc_sink_max_slice_width(intel_dp->dsc_dpcd); - if (max_slice_width < DP_DSC_MIN_SLICE_WIDTH_VALUE) { - DRM_DEBUG_KMS("Unsupported slice width %d by DP DSC Sink device\n", - max_slice_width); - 
return 0; - } - /* Also take into account max slice width */ - min_slice_count = min_t(u8, min_slice_count, - DIV_ROUND_UP(mode_hdisplay, - max_slice_width)); - - /* Find the closest match to the valid slice count values */ - for (i = 0; i < ARRAY_SIZE(valid_dsc_slicecount); i++) { - if (valid_dsc_slicecount[i] > - drm_dp_dsc_sink_max_slice_count(intel_dp->dsc_dpcd, - false)) - break; - if (min_slice_count <= valid_dsc_slicecount[i]) - return valid_dsc_slicecount[i]; - } - - DRM_DEBUG_KMS("Unsupported Slice Count %d\n", min_slice_count); - return 0; -} - static void intel_pixel_encoding_setup_vsc(struct intel_dp *intel_dp, const struct intel_crtc_state *crtc_state) --- a/drivers/gpu/drm/i915/display/intel_dp.h +++ b/drivers/gpu/drm/i915/display/intel_dp.h @@ -102,10 +102,6 @@ bool intel_dp_source_supports_hbr2(struc bool intel_dp_source_supports_hbr3(struct intel_dp *intel_dp); bool intel_dp_get_link_status(struct intel_dp *intel_dp, u8 *link_status); -u16 intel_dp_dsc_get_output_bpp(int link_clock, u8 lane_count, - int mode_clock, int mode_hdisplay); -u8 intel_dp_dsc_get_slice_count(struct intel_dp *intel_dp, int mode_clock, - int mode_hdisplay);
bool intel_dp_read_dpcd(struct intel_dp *intel_dp); bool intel_dp_get_colorimetry_status(struct intel_dp *intel_dp); @@ -120,4 +116,6 @@ static inline unsigned int intel_dp_unus return ~((1 << lane_count) - 1) & 0xf; }
+u32 intel_dp_mode_to_fec_clock(u32 mode_clock); + #endif /* __INTEL_DP_H__ */ --- a/drivers/gpu/drm/i915/display/intel_dp_mst.c +++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c @@ -81,7 +81,7 @@ static int intel_dp_mst_compute_link_con adjusted_mode->crtc_clock, crtc_state->port_clock, &crtc_state->dp_m_n, - constant_n); + constant_n, crtc_state->fec_enable); crtc_state->dp_m_n.tu = slots;
return 0;
From: Daniel Vetter daniel.vetter@ffwll.ch
commit f2cbda2dba11de868759cae9c0d2bab5b8411406 upstream.
It's never been wired up. Only userspace that tried to use it (and didn't actually check whether anything works, but hey it builds) is the -modesetting atomic implementation. And we just shut that up.
If there's anyone else then we need to silently accept this flag no matter what, and find a new one. Because once a flag is tainted, it's lost.
Reviewed-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Reviewed-by: Nicholas Kazlauskas nicholas.kazlauskas@amd.com
Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Cc: Michel Dänzer michel@daenzer.net
Cc: Alex Deucher alexdeucher@gmail.com
Cc: Adam Jackson ajax@redhat.com
Cc: Sean Paul sean@poorly.run
Cc: David Airlie airlied@linux.ie
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter daniel.vetter@intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20190903190642.32588-2-daniel....
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/drm_atomic_uapi.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -1301,8 +1301,7 @@ int drm_mode_atomic_ioctl(struct drm_dev
 	if (arg->reserved)
 		return -EINVAL;
 
-	if ((arg->flags & DRM_MODE_PAGE_FLIP_ASYNC) &&
-			!dev->mode_config.async_page_flip)
+	if (arg->flags & DRM_MODE_PAGE_FLIP_ASYNC)
 		return -EINVAL;
 
 	/* can't test and expect an event at the same time. */
From: Daniel Vetter daniel.vetter@ffwll.ch
commit 26b1d3b527e7bf3e24b814d617866ac5199ce68d upstream.
The -modesetting ddx has a totally broken idea of how atomic works:
- doesn't disable old connectors, assuming they get auto-disable like with
  the legacy setcrtc
- assumes ASYNC_FLIP is wired through for the atomic ioctl
- not a single call to TEST_ONLY
Iow the implementation is a 1:1 translation of legacy ioctls to atomic, which is a) broken b) pointless.
We already have bugs in both i915 and amdgpu-DC where this prevents us from enabling neat features.
If anyone ever cares about atomic in X we can easily add a new atomic level (req->value == 2) for X to get back the shiny toys.
Since these broken versions of -modesetting have been shipping, there's really no other way to get out of this bind.
v2:
- add an informational dmesg output (Rob, Ajax)
- reorder after the DRIVER_ATOMIC check to avoid useless noise (Ilia)
- allow req->value == 2 so that X can do another attempt at atomic in the
  future
v3: Be paranoid and insist that 'X' is the first character of the process name (suggested by Rob)
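The resulting check can be modelled in userspace roughly as follows. This is only a sketch: `setclientcap_atomic()` and its parameters are hypothetical stand-ins (the kernel code reads `current->comm` and `req->value` directly), and only the three outcomes described in this patch are modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the DRM_CLIENT_CAP_ATOMIC branch.
 * comm stands in for current->comm, value for req->value, and
 * driver_atomic for the DRIVER_ATOMIC feature check.
 * Returns 0 on success or a negative errno.
 */
int setclientcap_atomic(const char *comm, unsigned long long value,
			int driver_atomic)
{
	assert(comm != NULL);

	if (!driver_atomic)
		return -EOPNOTSUPP;
	/* A process whose name starts with 'X' asking for level 1 is refused. */
	if (comm[0] == 'X' && value == 1)
		return -EOPNOTSUPP;
	/* Level 2 stays valid so that X can opt back in later. */
	if (value > 2)
		return -EINVAL;
	return 0;
}
```

In this model a process named "Xorg" requesting value 1 is refused, the same process requesting value 2 is accepted, and any other process requesting value 1 behaves as before.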
Cc: Ilia Mirkin imirkin@alum.mit.edu
Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Reviewed-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com (v1)
Reviewed-by: Nicholas Kazlauskas nicholas.kazlauskas@amd.com (v1)
Cc: Michel Dänzer michel@daenzer.net
Cc: Alex Deucher alexdeucher@gmail.com
Cc: Adam Jackson ajax@redhat.com
Acked-by: Adam Jackson ajax@redhat.com
Cc: Sean Paul sean@poorly.run
Cc: David Airlie airlied@linux.ie
Cc: Rob Clark robdclark@gmail.com
Acked-by: Rob Clark robdclark@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter daniel.vetter@intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20190905185318.31363-1-daniel....
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/drm_ioctl.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -336,7 +336,12 @@ drm_setclientcap(struct drm_device *dev,
 	case DRM_CLIENT_CAP_ATOMIC:
 		if (!drm_core_check_feature(dev, DRIVER_ATOMIC))
 			return -EOPNOTSUPP;
-		if (req->value > 1)
+		/* The modesetting DDX has a totally broken idea of atomic. */
+		if (current->comm[0] == 'X' && req->value == 1) {
+			pr_info("broken atomic modeset userspace detected, disabling atomic\n");
+			return -EOPNOTSUPP;
+		}
+		if (req->value > 2)
 			return -EINVAL;
 		file_priv->atomic = req->value;
 		file_priv->universal_planes = req->value;
From: Anders Roxell anders.roxell@linaro.org
commit 28ba1b1da49a20ba8fb767d6ddd7c521ec79a119 upstream.
Now that -Wimplicit-fallthrough is passed to GCC by default, the following warnings show up:
../drivers/gpu/drm/arm/malidp_hw.c: In function ‘malidp_format_get_bpp’:
../drivers/gpu/drm/arm/malidp_hw.c:387:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
    bpp = 30;
    ~~~~^~~~
../drivers/gpu/drm/arm/malidp_hw.c:388:3: note: here
   case DRM_FORMAT_YUV420_10BIT:
   ^~~~
../drivers/gpu/drm/arm/malidp_hw.c: In function ‘malidp_se_irq’:
../drivers/gpu/drm/arm/malidp_hw.c:1311:4: warning: this statement may fall through [-Wimplicit-fallthrough=]
    drm_writeback_signal_completion(&malidp->mw_connector, 0);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/arm/malidp_hw.c:1313:3: note: here
   case MW_START:
   ^~~~
Rework to add a 'break;' in a case that didn't have it so that the compiler doesn't warn about fall-through.
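The effect of the missing 'break;' can be illustrated with a small standalone sketch. The enum values and the default case are assumptions made for the example (the driver switches on DRM_FORMAT_* fourcc codes); only the two cases touched by this patch mirror the real function.

```c
#include <assert.h>

/* Hypothetical format codes; the real driver uses DRM_FORMAT_* fourccs. */
enum fmt { FMT_VUY101010, FMT_YUV420_10BIT, FMT_OTHER };

/*
 * Fixed shape of the switch. Without the break after "bpp = 30",
 * FMT_VUY101010 falls into the next case and ends up with bpp = 15.
 */
int format_get_bpp(enum fmt fmt)
{
	int bpp;

	switch (fmt) {
	case FMT_VUY101010:
		bpp = 30;
		break;
	case FMT_YUV420_10BIT:
		bpp = 15;
		break;
	default:
		/* Assumed fallback for the sketch only. */
		bpp = 0;
		break;
	}
	return bpp;
}

/* Small self-check of the two cases discussed above. */
void format_self_check(void)
{
	assert(format_get_bpp(FMT_VUY101010) == 30);
	assert(format_get_bpp(FMT_YUV420_10BIT) == 15);
}
```

So in the first hunk the warning pointed at a real bug rather than a missing annotation, while the second hunk only adjusts the comment on an intentional fall-through.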
Cc: stable@vger.kernel.org # v5.2+
Fixes: b8207562abdd ("drm/arm/malidp: Specified the rotation memory requirements for AFBC YUV formats")
Acked-by: Liviu Dudau liviu.dudau@arm.com
Signed-off-by: Anders Roxell anders.roxell@linaro.org
Signed-off-by: Liviu Dudau Liviu.Dudau@arm.com
Link: https://patchwork.freedesktop.org/patch/msgid/20190730153056.3606-1-anders.r...
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/arm/malidp_hw.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/arm/malidp_hw.c
+++ b/drivers/gpu/drm/arm/malidp_hw.c
@@ -385,6 +385,7 @@ int malidp_format_get_bpp(u32 fmt)
 		switch (fmt) {
 		case DRM_FORMAT_VUY101010:
 			bpp = 30;
+			break;
 		case DRM_FORMAT_YUV420_10BIT:
 			bpp = 15;
 			break;
@@ -1309,7 +1310,7 @@ static irqreturn_t malidp_se_irq(int irq
 			break;
 		case MW_RESTART:
 			drm_writeback_signal_completion(&malidp->mw_connector, 0);
-			/* fall through to a new start */
+			/* fall through - to a new start */
 		case MW_START:
 			/* writeback started, need to emulate one-shot mode */
 			hw->disable_memwrite(hwdev);
From: Tomi Valkeinen tomi.valkeinen@ti.com
commit e2c4ed148cf3ec8669a1d90dc66966028e5fad70 upstream.
The OMAP36xx and AM/DM37x TRMs say that the maximum divider for DSS fclk (in CM_CLKSEL_DSS) is 32. Experimentation shows that this is not correct, and using a divider of 32 breaks DSS with a flood of underflows and sync lost errors. Dividers up to 31 seem to work fine.
There is another patch to the DT files to limit the divider correctly, but as the DSS driver also needs to know the maximum divider to be able to iteratively find good rates, we also need to do the fix in the DSS driver.
Signed-off-by: Tomi Valkeinen tomi.valkeinen@ti.com
Cc: Adam Ford aford173@gmail.com
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20191002122542.8449-1-tomi.val...
Tested-by: Adam Ford aford173@gmail.com
Reviewed-by: Jyri Sarha jsarha@ti.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/omapdrm/dss/dss.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/gpu/drm/omapdrm/dss/dss.c
+++ b/drivers/gpu/drm/omapdrm/dss/dss.c
@@ -1090,7 +1090,7 @@ static const struct dss_features omap34x

 static const struct dss_features omap3630_dss_feats = {
 	.model = DSS_MODEL_OMAP3,
-	.fck_div_max = 32,
+	.fck_div_max = 31,
 	.fck_freq_max = 173000000,
 	.dss_fck_multiplier = 1,
 	.parent_clk_name = "dpll4_ck",
From: Sean Paul seanpaul@chromium.org
commit 5fb9b797d5ccf311ae4aba69e86080d47668b5f7 upstream.
clk_get_parent returns an error pointer upon failure, not NULL. So the checks as they exist won't catch a failure. This patch changes the checks and the return values to properly handle an error pointer.
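Why a NULL test misses an error pointer can be shown with a userspace sketch of the ERR_PTR convention. The helpers below imitate the kernel macros of the same names, and `fake_clk_get_parent()` and `lookup_parent()` are hypothetical stand-ins invented for the example, not driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for clk_get_parent(): fails with an error pointer. */
void *fake_clk_get_parent(int fail)
{
	static int parent_clk;

	return fail ? ERR_PTR(-ENOENT) : (void *)&parent_clk;
}

/* Returns 0 on success or the negative errno carried by the pointer. */
int lookup_parent(int fail)
{
	void *src = fake_clk_get_parent(fail);

	/* The failure value is non-NULL, so "if (!src)" would never fire. */
	assert(src != NULL);
	if (IS_ERR(src))
		return (int)PTR_ERR(src);
	return 0;
}
```

With the IS_ERR()/PTR_ERR() pair the caller also propagates the actual error code instead of a fixed -ENODEV.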
Fixes: c4d8cfe516dc ("drm/msm/dsi: add implementation for helper functions")
Cc: Sibi Sankar sibis@codeaurora.org
Cc: Sean Paul seanpaul@chromium.org
Cc: Rob Clark robdclark@chromium.org
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Sean Paul seanpaul@chromium.org
Signed-off-by: Rob Clark robdclark@chromium.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/msm/dsi/dsi_host.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/gpu/drm/msm/dsi/dsi_host.c
+++ b/drivers/gpu/drm/msm/dsi/dsi_host.c
@@ -421,15 +421,15 @@ static int dsi_clk_init(struct msm_dsi_h
 	}

 	msm_host->byte_clk_src = clk_get_parent(msm_host->byte_clk);
-	if (!msm_host->byte_clk_src) {
-		ret = -ENODEV;
+	if (IS_ERR(msm_host->byte_clk_src)) {
+		ret = PTR_ERR(msm_host->byte_clk_src);
 		pr_err("%s: can't find byte_clk clock. ret=%d\n", __func__, ret);
 		goto exit;
 	}

 	msm_host->pixel_clk_src = clk_get_parent(msm_host->pixel_clk);
-	if (!msm_host->pixel_clk_src) {
-		ret = -ENODEV;
+	if (IS_ERR(msm_host->pixel_clk_src)) {
+		ret = PTR_ERR(msm_host->pixel_clk_src);
 		pr_err("%s: can't find pixel_clk clock. ret=%d\n", __func__, ret);
 		goto exit;
 	}
From: Lyude Paul lyude@redhat.com
commit 698c1aa9f83b618de79e9e5e19a58f70a4a6ae0f upstream.
On the ThinkPad P71, we have one eDP connector exposed along with 5 DP connectors, resulting in a total of 11 TMDS encoders. Since the GPU on this system is also capable of MST, we create an additional 4 fake MST encoders for each DP port. Unfortunately, we also do this for the eDP port as well, resulting in:
  1 eDP port: +1 TMDS encoder
              +4 DPMST encoders
  5 DP ports: +2 TMDS encoders
              +4 DPMST encoders
              *5 ports
              == 35 encoders
Which breaks things, since DRM has a hard coded limit of 32 encoders. So, fix this by not creating MSTMs for any eDP connectors. This brings us down to 31 encoders, although we can do better.
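The arithmetic can be checked with a short sketch. The function and macro names are made up for the example; the per-port figures (1 TMDS encoder per eDP port, 2 per DP port, 4 fake MST encoders per MST-capable port) and the limit of 32 are the ones quoted in this commit message.

```c
#include <assert.h>

#define ENCODER_LIMIT	32	/* hard-coded DRM limit quoted above */
#define MST_PER_PORT	4	/* fake MST encoders created per port */

/*
 * Count encoders for a board with the given number of eDP and DP ports.
 * mst_on_edp selects the old behaviour (MST encoders on eDP as well).
 */
int count_encoders(int edp_ports, int dp_ports, int mst_on_edp)
{
	int edp, dp;

	assert(edp_ports >= 0 && dp_ports >= 0);

	edp = edp_ports * (1 + (mst_on_edp ? MST_PER_PORT : 0));
	dp = dp_ports * (2 + MST_PER_PORT);
	return edp + dp;
}
```

For the P71 layout, `count_encoders(1, 5, 1)` gives 35, which exceeds the limit, and `count_encoders(1, 5, 0)` gives 31, which fits.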
This fixes driver probing for nouveau on the ThinkPad P71.
Signed-off-by: Lyude Paul lyude@redhat.com
Cc: stable@vger.kernel.org
Signed-off-by: Ben Skeggs bskeggs@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/nouveau/dispnv50/disp.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -1603,7 +1603,8 @@ nv50_sor_create(struct drm_connector *co
 		nv_encoder->aux = aux;
 	}

-	if ((data = nvbios_dp_table(bios, &ver, &hdr, &cnt, &len)) &&
+	if (nv_connector->type != DCB_CONNECTOR_eDP &&
+	    (data = nvbios_dp_table(bios, &ver, &hdr, &cnt, &len)) &&
 	    ver >= 0x40 && (nvbios_rd08(bios, data + 0x08) & 0x04)) {
 		ret = nv50_mstm_new(nv_encoder, &nv_connector->aux, 16,
 				    nv_connector->base.base.id,
From: Kevin Wang kevin1.wang@amd.com
commit e0e4a2ce7a059d051c66cd7c94314fef3cd91aea upstream.
v2: change period from 10ms to 100ms (typo error)
Updating the metrics table too frequently causes SMU firmware errors, so change the metrics table update period from 1ms to 100ms (navi10, 12, 14).
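The difference between the old and new period expressions can be sketched in plain C. `ms_to_ticks()` is a simplified model of msecs_to_jiffies() that rounds up to whole ticks (the kernel helper has several configuration-dependent implementations), and the HZ values used below are illustrative assumptions.

```c
#include <assert.h>

/* Simplified model of msecs_to_jiffies(): round up to whole ticks. */
unsigned long ms_to_ticks(unsigned int ms, unsigned int hz)
{
	assert(hz > 0);
	return ((unsigned long)ms * hz + 999) / 1000;
}

/* The old expression, HZ / 1000, evaluated in integer arithmetic. */
unsigned long old_period_ticks(unsigned int hz)
{
	return hz / 1000;
}
```

In this model the old expression is a single tick at HZ=1000 and truncates to zero ticks for any HZ below 1000, so the cached table expires almost immediately, whereas a 100 ms period is 25 ticks at HZ=250 and 100 ticks at HZ=1000.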
Signed-off-by: Kevin Wang kevin1.wang@amd.com
Reviewed-by: Kenneth Feng kenneth.feng@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
Cc: stable@vger.kernel.org # 5.3.x
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/amd/powerplay/navi10_ppt.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/gpu/drm/amd/powerplay/navi10_ppt.c
+++ b/drivers/gpu/drm/amd/powerplay/navi10_ppt.c
@@ -532,7 +532,7 @@ static int navi10_get_metrics_table(stru
 	struct smu_table_context *smu_table= &smu->smu_table;
 	int ret = 0;

-	if (!smu_table->metrics_time || time_after(jiffies, smu_table->metrics_time + HZ / 1000)) {
+	if (!smu_table->metrics_time || time_after(jiffies, smu_table->metrics_time + msecs_to_jiffies(100))) {
 		ret = smu_update_table(smu, SMU_TABLE_SMU_METRICS, 0,
 				(void *)smu_table->metrics_table, false);
 		if (ret) {
From: Xiaolin Zhang xiaolin.zhang@intel.com
commit 0a3242bdb47713e09cb004a0ba4947d3edf82d8a upstream.
When creating a vGPU workload, the guest context head pointer should be updated correctly by comparing with the existing workloads in the guest workload queue, including the currently running context.
In some situations there is a running context A, and then two new vGPU workloads are received, for context B and context A. In the new workload for context A, its head pointer should be updated with the running context A's tail.
v2: walk through the guest workload list backward.
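The backward walk can be sketched over a plain array instead of a kernel list. The struct and function below are hypothetical simplifications for illustration; only the matching rule (same context id and same lrca) and the use of the matching entry's ring-buffer tail come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced view of a queued workload. */
struct workload {
	unsigned int ctx_id;
	unsigned int lrca;
	unsigned long rb_tail;
};

/*
 * queue[0] is the oldest entry, queue[n - 1] the newest. Walk backward
 * and return the tail of the most recent workload of the same context;
 * fall back to the guest-provided head if there is none.
 */
unsigned long pick_head(const struct workload *queue, size_t n,
			unsigned int ctx_id, unsigned int lrca,
			unsigned long guest_head)
{
	assert(queue != NULL || n == 0);

	for (size_t i = n; i-- > 0; ) {
		if (queue[i].ctx_id == ctx_id && queue[i].lrca == lrca)
			return queue[i].rb_tail;
	}
	return guest_head;
}
```

With a queue holding A then B, a new workload for A now picks up A's tail. The previous code compared only against the last entry (B) and so kept the possibly stale guest head.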
Cc: stable@vger.kernel.org
Signed-off-by: Xiaolin Zhang xiaolin.zhang@intel.com
Reviewed-by: Zhenyu Wang zhenyuw@linux.intel.com
Signed-off-by: Zhenyu Wang zhenyuw@linux.intel.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/i915/gvt/scheduler.c | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-)
--- a/drivers/gpu/drm/i915/gvt/scheduler.c +++ b/drivers/gpu/drm/i915/gvt/scheduler.c @@ -1424,9 +1424,6 @@ static int prepare_mm(struct intel_vgpu_ #define same_context(a, b) (((a)->context_id == (b)->context_id) && \ ((a)->lrca == (b)->lrca))
-#define get_last_workload(q) \ - (list_empty(q) ? NULL : container_of(q->prev, \ - struct intel_vgpu_workload, list)) /** * intel_vgpu_create_workload - create a vGPU workload * @vgpu: a vGPU @@ -1446,7 +1443,7 @@ intel_vgpu_create_workload(struct intel_ { struct intel_vgpu_submission *s = &vgpu->submission; struct list_head *q = workload_q_head(vgpu, ring_id); - struct intel_vgpu_workload *last_workload = get_last_workload(q); + struct intel_vgpu_workload *last_workload = NULL; struct intel_vgpu_workload *workload = NULL; struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv; u64 ring_context_gpa; @@ -1472,15 +1469,20 @@ intel_vgpu_create_workload(struct intel_ head &= RB_HEAD_OFF_MASK; tail &= RB_TAIL_OFF_MASK;
- if (last_workload && same_context(&last_workload->ctx_desc, desc)) { - gvt_dbg_el("ring id %d cur workload == last\n", ring_id); - gvt_dbg_el("ctx head %x real head %lx\n", head, - last_workload->rb_tail); - /* - * cannot use guest context head pointer here, - * as it might not be updated at this time - */ - head = last_workload->rb_tail; + list_for_each_entry_reverse(last_workload, q, list) { + + if (same_context(&last_workload->ctx_desc, desc)) { + gvt_dbg_el("ring id %d cur workload == last\n", + ring_id); + gvt_dbg_el("ctx head %x real head %lx\n", head, + last_workload->rb_tail); + /* + * cannot use guest context head pointer here, + * as it might not be updated at this time + */ + head = last_workload->rb_tail; + break; + } }
gvt_dbg_el("ring id %d begin a new workload\n", ring_id);
From: Chris Wilson chris@chris-wilson.co.uk
commit cb6d7c7dc7ff8cace666ddec66334117a6068ce2 upstream.
set_page_dirty says:
For pages with a mapping this should be done under the page lock for the benefit of asynchronous memory errors who prefer a consistent dirty state. This rule can be broken in some special cases, but should be better not to.
Under those rules, it is only safe for us to use the plain set_page_dirty calls for shmemfs/anonymous memory. Userptr may be used with real mappings and so needs to use the locked version (set_page_dirty_lock).
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203317
Fixes: 5cc9ed4b9a7a ("drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl")
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk
Cc: Tvrtko Ursulin tvrtko.ursulin@intel.com
Cc: stable@vger.kernel.org
Reviewed-by: Tvrtko Ursulin tvrtko.ursulin@intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20190708140327.26825-1-chris@c...
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -664,7 +664,15 @@ i915_gem_userptr_put_pages(struct drm_i9
for_each_sgt_page(page, sgt_iter, pages) { if (obj->mm.dirty) - set_page_dirty(page); + /* + * As this may not be anonymous memory (e.g. shmem) + * but exist on a real mapping, we have to lock + * the page in order to dirty it -- holding + * the page reference is not sufficient to + * prevent the inode from being truncated. + * Play safe and take the lock. + */ + set_page_dirty_lock(page);
mark_page_accessed(page); put_page(page);
From: Xiaolin Zhang xiaolin.zhang@intel.com
commit 9e77f5001b9833a6bdd3940df245053c2212a32b upstream.
vGPU ppgtt notification is split into two steps: the first step updates PVINFO's pdp registers, and the second writes PVINFO's g2v_notify register with an action code to trigger the ppgtt notification on the GVT side.
Currently these steps are not atomic because nothing protects them, so it is easy to hit a race condition during MTBF, stress and IGT tests, causing a GPU hang.
The solution is to add a lock to make the vGPU ppgtt notification an atomic operation.
Cc: stable@vger.kernel.org
Signed-off-by: Xiaolin Zhang xiaolin.zhang@intel.com
Acked-by: Chris Wilson chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/1566543451-13955-1-git-send-em...
(cherry picked from commit 52988009843160c5b366b4082ed6df48041c655c)
Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/i915/i915_drv.h | 1 + drivers/gpu/drm/i915/i915_gem_gtt.c | 12 +++++++----- drivers/gpu/drm/i915/i915_vgpu.c | 1 + 3 files changed, 9 insertions(+), 5 deletions(-)
--- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1073,6 +1073,7 @@ struct i915_frontbuffer_tracking { };
struct i915_virtual_gpu { + struct mutex lock; /* serialises sending of g2v_notify command pkts */ bool active; u32 caps; }; --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -1248,14 +1248,15 @@ free_scratch_page: return ret; }
-static int gen8_ppgtt_notify_vgt(struct i915_ppgtt *ppgtt, bool create) +static void gen8_ppgtt_notify_vgt(struct i915_ppgtt *ppgtt, bool create) { - struct i915_address_space *vm = &ppgtt->vm; - struct drm_i915_private *dev_priv = vm->i915; + struct drm_i915_private *dev_priv = ppgtt->vm.i915; enum vgt_g2v_type msg; int i;
- if (i915_vm_is_4lvl(vm)) { + mutex_lock(&dev_priv->vgpu.lock); + + if (i915_vm_is_4lvl(&ppgtt->vm)) { const u64 daddr = px_dma(ppgtt->pd);
I915_WRITE(vgtif_reg(pdp[0].lo), lower_32_bits(daddr)); @@ -1275,9 +1276,10 @@ static int gen8_ppgtt_notify_vgt(struct VGT_G2V_PPGTT_L3_PAGE_TABLE_DESTROY); }
+ /* g2v_notify atomically (via hv trap) consumes the message packet. */ I915_WRITE(vgtif_reg(g2v_notify), msg);
- return 0; + mutex_unlock(&dev_priv->vgpu.lock); }
static void gen8_free_scratch(struct i915_address_space *vm) --- a/drivers/gpu/drm/i915/i915_vgpu.c +++ b/drivers/gpu/drm/i915/i915_vgpu.c @@ -79,6 +79,7 @@ void i915_check_vgpu(struct drm_i915_pri dev_priv->vgpu.caps = __raw_uncore_read32(uncore, vgtif_reg(vgt_caps));
dev_priv->vgpu.active = true; + mutex_init(&dev_priv->vgpu.lock); DRM_INFO("Virtual GPU for Intel GVT-g detected.\n"); }
From: Johannes Berg johannes.berg@intel.com
commit d8dec42b5c2d2b273bc30b0e073cfbe832d69902 upstream.
Drivers typically expect this, as it's the case for almost all cases where this is called (i.e. from the TX path). Also, the code in mac80211 itself (if the driver calls ieee80211_tx_dequeue()) expects this as it uses this_cpu_ptr() without additional protection.
This should fix various reports of the problem: https://bugzilla.kernel.org/show_bug.cgi?id=204127 https://lore.kernel.org/linux-wireless/CAN5HydrWb3o_FE6A1XDnP1E+xS66d5kiEuhH... https://lore.kernel.org/lkml/nycvar.YFH.7.76.1909111238470.473@cbobk.fhfr.pm...
Cc: stable@vger.kernel.org
Reported-and-tested-by: Jiri Kosina jkosina@suse.cz
Reported-by: Aaron Hill aa1ronham@gmail.com
Reported-by: Lukas Redlinger rel+kernel@agilox.net
Reported-by: Oleksii Shevchuk alxchk@gmail.com
Fixes: 21a5d4c3a45c ("mac80211: add stop/start logic for software TXQs")
Link: https://lore.kernel.org/r/1569928763-I3e8838c5ecad878e59d4a94eb069a90f664146...
Reviewed-by: Toke Høiland-Jørgensen toke@redhat.com
Signed-off-by: Johannes Berg johannes.berg@intel.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/mac80211/util.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
--- a/net/mac80211/util.c +++ b/net/mac80211/util.c @@ -247,7 +247,8 @@ static void __ieee80211_wake_txqs(struct struct sta_info *sta; int i;
- spin_lock_bh(&fq->lock); + local_bh_disable(); + spin_lock(&fq->lock);
if (sdata->vif.type == NL80211_IFTYPE_AP) ps = &sdata->bss->ps; @@ -273,9 +274,9 @@ static void __ieee80211_wake_txqs(struct &txqi->flags)) continue;
- spin_unlock_bh(&fq->lock); + spin_unlock(&fq->lock); drv_wake_tx_queue(local, txqi); - spin_lock_bh(&fq->lock); + spin_lock(&fq->lock); } }
@@ -288,12 +289,14 @@ static void __ieee80211_wake_txqs(struct (ps && atomic_read(&ps->num_sta_ps)) || ac != vif->txq->ac) goto out;
- spin_unlock_bh(&fq->lock); + spin_unlock(&fq->lock);
drv_wake_tx_queue(local, txqi); + local_bh_enable(); return; out: - spin_unlock_bh(&fq->lock); + spin_unlock(&fq->lock); + local_bh_enable(); }
static void
From: Nicolin Chen nicoleotsuka@gmail.com
commit b960bc448a252428bacca271f3416a8bda3b599b upstream.
The SDHCI controller on Tegra186 supports 40-bit addressing, which is usually enough to address all of system memory. However, if the SDHCI controller is behind an IOMMU, the address space can go beyond. This happens on Tegra186 and later where the ARM SMMU has an input address space of 48 bits. If the DMA API is backed by this ARM SMMU, the top-down IOVA allocator will cause IOV addresses to be returned that the SDHCI controller cannot access.
Unfortunately, prior to the introduction of the ->set_dma_mask() host operation, the SDHCI core would set either a 64-bit DMA mask if the controller claimed to support 64-bit addressing, or a 32-bit DMA mask otherwise.
Since the full 64 bits cannot be addressed on Tegra, this had to be worked around in commit 68481a7e1c84 ("mmc: tegra: Mark 64 bit dma broken on Tegra186") by setting the SDHCI_QUIRK2_BROKEN_64_BIT_DMA quirk, which effectively restricts the DMA mask to 32 bits.
One disadvantage of this is that dma_map_*() APIs will now try to use the swiotlb to bounce DMA to addresses beyond the controller's DMA mask. This in turn causes degraded performance and can lead to situations where the swiotlb buffer is exhausted, which in turn causes DMA transfers to fail.
With the recent introduction of the ->set_dma_mask() host operation, this can now be properly fixed. For each generation of Tegra, the exact supported DMA mask can be configured. This kills two birds with one stone: it avoids the use of bounce buffers because system memory never exceeds the addressable memory range of the SDHCI controllers on these devices, and at the same time when an IOMMU is involved, it prevents IOV addresses from being allocated beyond the addressable range of the controllers.
Since the DMA mask is now properly handled, the 64-bit DMA quirk can be removed.
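What the per-SoC masks mean for addressability can be sketched as follows. `dma_bit_mask()` has the same shape as the kernel's DMA_BIT_MASK() macro, while `dma_addressable()` and the example addresses are assumptions made for illustration and are not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK(): the low n bits set. */
uint64_t dma_bit_mask(unsigned int n)
{
	assert(n >= 1 && n <= 64);
	return n == 64 ? ~UINT64_C(0) : (UINT64_C(1) << n) - 1;
}

/* Nonzero if the buffer [addr, addr + len) lies entirely under mask. */
int dma_addressable(uint64_t addr, uint64_t len, uint64_t mask)
{
	assert(len > 0);
	return addr <= mask && len - 1 <= mask - addr;
}
```

An address near the top of a 48-bit IOVA space, such as 0x0000ffff00000000, is not addressable under a 40-bit mask, which is why the allocator has to be told the real limit. A buffer just above 4 GiB is reachable under a 40-bit mask but not under the 32-bit mask that the old quirk implied, which is where the bounce buffering came from.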
Signed-off-by: Nicolin Chen nicoleotsuka@gmail.com
[treding@nvidia.com: provide more background in commit message]
Tested-by: Nicolin Chen nicoleotsuka@gmail.com
Acked-by: Adrian Hunter adrian.hunter@intel.com
Signed-off-by: Thierry Reding treding@nvidia.com
Cc: stable@vger.kernel.org # v4.15 +
Signed-off-by: Ulf Hansson ulf.hansson@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mmc/host/sdhci-tegra.c | 48 +++++++++++++++++++++++------------------ 1 file changed, 28 insertions(+), 20 deletions(-)
--- a/drivers/mmc/host/sdhci-tegra.c +++ b/drivers/mmc/host/sdhci-tegra.c @@ -4,6 +4,7 @@ */
#include <linux/delay.h> +#include <linux/dma-mapping.h> #include <linux/err.h> #include <linux/module.h> #include <linux/init.h> @@ -104,6 +105,7 @@
struct sdhci_tegra_soc_data { const struct sdhci_pltfm_data *pdata; + u64 dma_mask; u32 nvquirks; u8 min_tap_delay; u8 max_tap_delay; @@ -1233,11 +1235,25 @@ static const struct cqhci_host_ops sdhci .update_dcmd_desc = sdhci_tegra_update_dcmd_desc, };
+static int tegra_sdhci_set_dma_mask(struct sdhci_host *host) +{ + struct sdhci_pltfm_host *platform = sdhci_priv(host); + struct sdhci_tegra *tegra = sdhci_pltfm_priv(platform); + const struct sdhci_tegra_soc_data *soc = tegra->soc_data; + struct device *dev = mmc_dev(host->mmc); + + if (soc->dma_mask) + return dma_set_mask_and_coherent(dev, soc->dma_mask); + + return 0; +} + static const struct sdhci_ops tegra_sdhci_ops = { .get_ro = tegra_sdhci_get_ro, .read_w = tegra_sdhci_readw, .write_l = tegra_sdhci_writel, .set_clock = tegra_sdhci_set_clock, + .set_dma_mask = tegra_sdhci_set_dma_mask, .set_bus_width = sdhci_set_bus_width, .reset = tegra_sdhci_reset, .platform_execute_tuning = tegra_sdhci_execute_tuning, @@ -1257,6 +1273,7 @@ static const struct sdhci_pltfm_data sdh
static const struct sdhci_tegra_soc_data soc_data_tegra20 = { .pdata = &sdhci_tegra20_pdata, + .dma_mask = DMA_BIT_MASK(32), .nvquirks = NVQUIRK_FORCE_SDHCI_SPEC_200 | NVQUIRK_ENABLE_BLOCK_GAP_DET, }; @@ -1283,6 +1300,7 @@ static const struct sdhci_pltfm_data sdh
 static const struct sdhci_tegra_soc_data soc_data_tegra30 = {
 	.pdata = &sdhci_tegra30_pdata,
+	.dma_mask = DMA_BIT_MASK(32),
 	.nvquirks = NVQUIRK_ENABLE_SDHCI_SPEC_300 |
 		    NVQUIRK_ENABLE_SDR50 |
 		    NVQUIRK_ENABLE_SDR104 |
@@ -1295,6 +1313,7 @@ static const struct sdhci_ops tegra114_s
 	.write_w = tegra_sdhci_writew,
 	.write_l = tegra_sdhci_writel,
 	.set_clock = tegra_sdhci_set_clock,
+	.set_dma_mask = tegra_sdhci_set_dma_mask,
 	.set_bus_width = sdhci_set_bus_width,
 	.reset = tegra_sdhci_reset,
 	.platform_execute_tuning = tegra_sdhci_execute_tuning,
@@ -1316,6 +1335,7 @@ static const struct sdhci_pltfm_data sdh
 
 static const struct sdhci_tegra_soc_data soc_data_tegra114 = {
 	.pdata = &sdhci_tegra114_pdata,
+	.dma_mask = DMA_BIT_MASK(32),
 };
 
 static const struct sdhci_pltfm_data sdhci_tegra124_pdata = {
@@ -1325,22 +1345,13 @@ static const struct sdhci_pltfm_data sdh
 		  SDHCI_QUIRK_NO_HISPD_BIT |
 		  SDHCI_QUIRK_BROKEN_ADMA_ZEROLEN_DESC |
 		  SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN,
-	.quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN |
-		   /*
-		    * The TRM states that the SD/MMC controller found on
-		    * Tegra124 can address 34 bits (the maximum supported by
-		    * the Tegra memory controller), but tests show that DMA
-		    * to or from above 4 GiB doesn't work. This is possibly
-		    * caused by missing programming, though it's not obvious
-		    * what sequence is required. Mark 64-bit DMA broken for
-		    * now to fix this for existing users (e.g. Nyan boards).
-		    */
-		   SDHCI_QUIRK2_BROKEN_64_BIT_DMA,
+	.quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN,
 	.ops = &tegra114_sdhci_ops,
 };
 
 static const struct sdhci_tegra_soc_data soc_data_tegra124 = {
 	.pdata = &sdhci_tegra124_pdata,
+	.dma_mask = DMA_BIT_MASK(34),
 };
 
 static const struct sdhci_ops tegra210_sdhci_ops = {
@@ -1349,6 +1360,7 @@ static const struct sdhci_ops tegra210_s
 	.write_w = tegra210_sdhci_writew,
 	.write_l = tegra_sdhci_writel,
 	.set_clock = tegra_sdhci_set_clock,
+	.set_dma_mask = tegra_sdhci_set_dma_mask,
 	.set_bus_width = sdhci_set_bus_width,
 	.reset = tegra_sdhci_reset,
 	.set_uhs_signaling = tegra_sdhci_set_uhs_signaling,
@@ -1369,6 +1381,7 @@ static const struct sdhci_pltfm_data sdh
 
 static const struct sdhci_tegra_soc_data soc_data_tegra210 = {
 	.pdata = &sdhci_tegra210_pdata,
+	.dma_mask = DMA_BIT_MASK(34),
 	.nvquirks = NVQUIRK_NEEDS_PAD_CONTROL |
 		    NVQUIRK_HAS_PADCALIB |
 		    NVQUIRK_DIS_CARD_CLK_CONFIG_TAP |
@@ -1383,6 +1396,7 @@ static const struct sdhci_ops tegra186_s
 	.read_w = tegra_sdhci_readw,
 	.write_l = tegra_sdhci_writel,
 	.set_clock = tegra_sdhci_set_clock,
+	.set_dma_mask = tegra_sdhci_set_dma_mask,
 	.set_bus_width = sdhci_set_bus_width,
 	.reset = tegra_sdhci_reset,
 	.set_uhs_signaling = tegra_sdhci_set_uhs_signaling,
@@ -1398,20 +1412,13 @@ static const struct sdhci_pltfm_data sdh
 		  SDHCI_QUIRK_NO_HISPD_BIT |
 		  SDHCI_QUIRK_BROKEN_ADMA_ZEROLEN_DESC |
 		  SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN,
-	.quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN |
-		   /* SDHCI controllers on Tegra186 support 40-bit addressing.
-		    * IOVA addresses are 48-bit wide on Tegra186.
-		    * With 64-bit dma mask used for SDHCI, accesses can
-		    * be broken. Disable 64-bit dma, which would fall back
-		    * to 32-bit dma mask. Ideally 40-bit dma mask would work,
-		    * But it is not supported as of now.
-		    */
-		   SDHCI_QUIRK2_BROKEN_64_BIT_DMA,
+	.quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN,
 	.ops = &tegra186_sdhci_ops,
 };
 
 static const struct sdhci_tegra_soc_data soc_data_tegra186 = {
 	.pdata = &sdhci_tegra186_pdata,
+	.dma_mask = DMA_BIT_MASK(40),
 	.nvquirks = NVQUIRK_NEEDS_PAD_CONTROL |
 		    NVQUIRK_HAS_PADCALIB |
 		    NVQUIRK_DIS_CARD_CLK_CONFIG_TAP |
@@ -1424,6 +1431,7 @@ static const struct sdhci_tegra_soc_data
 
 static const struct sdhci_tegra_soc_data soc_data_tegra194 = {
 	.pdata = &sdhci_tegra186_pdata,
+	.dma_mask = DMA_BIT_MASK(39),
 	.nvquirks = NVQUIRK_NEEDS_PAD_CONTROL |
 		    NVQUIRK_HAS_PADCALIB |
 		    NVQUIRK_DIS_CARD_CLK_CONFIG_TAP |
From: Russell King rmk+kernel@armlinux.org.uk
commit d1c536e3177390da43d99f20143b810c35433d1f upstream.
ADMA errors are potentially data corrupting events; although we print the register state, we do not usefully print the ADMA descriptors. Worse than that, we print them by referencing their virtual address which is meaningless when the register state gives us the DMA address of the failing descriptor.
Print the ADMA descriptors giving their DMA addresses rather than their virtual addresses, and print them using SDHCI_DUMP() rather than DBG().
We also do not show the correct value of the interrupt status register; the register dump shows the current value, after we have cleared the pending interrupts we are going to service. What is more useful is to print the interrupts that _were_ pending at the time the ADMA error was encountered. Fix that too.
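To illustrate why the DMA address is the useful key, here is a minimal userspace sketch, not the driver code. The struct demo_adma2_desc layout, the demo_dump_adma() helper and the DEMO_ADMA2_END value are assumptions made up for the example; only the output format follows the patch. Each descriptor is printed with the bus address it occupies, so a line can be matched directly against the "ADMA Ptr" value in a register dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_ADMA2_END 0x2 /* assumed END attribute bit, for illustration */

/* Hypothetical, simplified 32-bit descriptor (host-endian for brevity). */
struct demo_adma2_desc {
	uint16_t cmd;
	uint16_t len;
	uint32_t addr;
};

/*
 * Walk the table and print each descriptor keyed by its DMA (bus)
 * address instead of its CPU virtual address.
 * Returns the DMA address of the last descriptor visited.
 */
static uint64_t demo_dump_adma(const struct demo_adma2_desc *table,
			       size_t max, uint64_t dma_base)
{
	uint64_t last = dma_base;
	size_t i;

	assert(max > 0);

	for (i = 0; i < max; i++) {
		last = dma_base + i * sizeof(table[0]);
		printf("%08llx: DMA 0x%08x, LEN 0x%04x, Attr=0x%02x\n",
		       (unsigned long long)last,
		       (unsigned int)table[i].addr,
		       (unsigned int)table[i].len,
		       (unsigned int)table[i].cmd);

		if (table[i].cmd & DEMO_ADMA2_END)
			break;
	}

	return last;
}
```

With a table whose second entry carries the END bit, the walk stops there and the returned address is the table base plus one descriptor size.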
Signed-off-by: Russell King rmk+kernel@armlinux.org.uk
Acked-by: Adrian Hunter adrian.hunter@intel.com
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson ulf.hansson@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/mmc/host/sdhci.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -2857,6 +2857,7 @@ static void sdhci_cmd_irq(struct sdhci_h
 static void sdhci_adma_show_error(struct sdhci_host *host)
 {
 	void *desc = host->adma_table;
+	dma_addr_t dma = host->adma_addr;
 
 	sdhci_dumpregs(host);
 
@@ -2864,18 +2865,21 @@ static void sdhci_adma_show_error(struct
 		struct sdhci_adma2_64_desc *dma_desc = desc;
 
 		if (host->flags & SDHCI_USE_64_BIT_DMA)
-			DBG("%p: DMA 0x%08x%08x, LEN 0x%04x, Attr=0x%02x\n",
-			    desc, le32_to_cpu(dma_desc->addr_hi),
+			SDHCI_DUMP("%08llx: DMA 0x%08x%08x, LEN 0x%04x, Attr=0x%02x\n",
+			    (unsigned long long)dma,
+			    le32_to_cpu(dma_desc->addr_hi),
 			    le32_to_cpu(dma_desc->addr_lo),
 			    le16_to_cpu(dma_desc->len),
 			    le16_to_cpu(dma_desc->cmd));
 		else
-			DBG("%p: DMA 0x%08x, LEN 0x%04x, Attr=0x%02x\n",
-			    desc, le32_to_cpu(dma_desc->addr_lo),
+			SDHCI_DUMP("%08llx: DMA 0x%08x, LEN 0x%04x, Attr=0x%02x\n",
+			    (unsigned long long)dma,
+			    le32_to_cpu(dma_desc->addr_lo),
 			    le16_to_cpu(dma_desc->len),
 			    le16_to_cpu(dma_desc->cmd));
 
 		desc += host->desc_sz;
+		dma += host->desc_sz;
 
 		if (dma_desc->cmd & cpu_to_le16(ADMA2_END))
 			break;
@@ -2951,7 +2955,8 @@ static void sdhci_data_irq(struct sdhci_
 				!= MMC_BUS_TEST_R)
 			host->data->error = -EILSEQ;
 	else if (intmask & SDHCI_INT_ADMA_ERROR) {
-		pr_err("%s: ADMA error\n", mmc_hostname(host->mmc));
+		pr_err("%s: ADMA error: 0x%08x\n", mmc_hostname(host->mmc),
+		       intmask);
 		sdhci_adma_show_error(host);
 		host->data->error = -EIO;
 		if (host->ops->adma_workaround)
From: Russell King rmk+kernel@armlinux.org.uk
commit 121bd08b029e03404c451bb237729cdff76eafed upstream.
We must not unconditionally set the DMA snoop bit; if the DMA API is assuming that the device is not DMA coherent, and the device snoops the CPU caches, the device can see stale cache lines brought in by speculative prefetch.
This leads to the device seeing stale data, potentially resulting in corrupted data transfers. Commonly, this results in a descriptor fetch error such as:
mmc0: ADMA error
mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00002202
mmc0: sdhci: Blk size: 0x00000008 | Blk cnt: 0x00000001
mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x00000013
mmc0: sdhci: Present: 0x01f50008 | Host ctl: 0x00000038
mmc0: sdhci: Power: 0x00000003 | Blk gap: 0x00000000
mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x000040d8
mmc0: sdhci: Timeout: 0x00000003 | Int stat: 0x00000001
mmc0: sdhci: Int enab: 0x037f108f | Sig enab: 0x037f108b
mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00002202
mmc0: sdhci: Caps: 0x35fa0000 | Caps_1: 0x0000af00
mmc0: sdhci: Cmd: 0x0000333a | Max curr: 0x00000000
mmc0: sdhci: Resp[0]: 0x00000920 | Resp[1]: 0x001d8a33
mmc0: sdhci: Resp[2]: 0x325b5900 | Resp[3]: 0x3f400e00
mmc0: sdhci: Host ctl2: 0x00000000
mmc0: sdhci: ADMA Err: 0x00000009 | ADMA Ptr: 0x000000236d43820c
mmc0: sdhci: ============================================
mmc0: error -5 whilst initialising SD card
but can lead to other errors, and potentially direct the SDHCI controller to read/write data to other memory locations (e.g. if a valid descriptor is visible to the device in a stale cache line.)
Fix this by ensuring that the DMA snoop bit corresponds with the behaviour of the DMA API. Since the driver currently only supports DT, use of_dma_is_coherent(). Note that device_get_dma_attr() can not be used as that risks re-introducing this bug if/when the driver is converted to ACPI.
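The rule described above (the snoop bit follows whatever the DMA API assumes about coherency) can be sketched in plain C. This is a userspace illustration only: demo_dma_sysctl() and the DEMO_DMA_SNOOP bit position are assumptions for the example, and the dma_coherent flag stands in for the result of of_dma_is_coherent().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_DMA_SNOOP 0x00000040u /* illustrative bit position */

/*
 * Return the new control register value: set the snoop bit only when
 * the device is treated as DMA coherent, and clear it otherwise so a
 * previously set bit does not linger.
 */
static uint32_t demo_dma_sysctl(uint32_t value, bool dma_coherent)
{
	if (dma_coherent)
		value |= DEMO_DMA_SNOOP;
	else
		value &= ~DEMO_DMA_SNOOP;

	return value;
}
```

The explicit clear in the else branch matters as much as the set: a bootloader or earlier driver instance may have left the bit enabled.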
Signed-off-by: Russell King rmk+kernel@armlinux.org.uk
Acked-by: Adrian Hunter adrian.hunter@intel.com
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson ulf.hansson@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/mmc/host/sdhci-of-esdhc.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/drivers/mmc/host/sdhci-of-esdhc.c
+++ b/drivers/mmc/host/sdhci-of-esdhc.c
@@ -495,7 +495,12 @@ static int esdhc_of_enable_dma(struct sd
 		dma_set_mask_and_coherent(dev, DMA_BIT_MASK(40));
 
 	value = sdhci_readl(host, ESDHC_DMA_SYSCTL);
-	value |= ESDHC_DMA_SNOOP;
+
+	if (of_dma_is_coherent(dev->of_node))
+		value |= ESDHC_DMA_SNOOP;
+	else
+		value &= ~ESDHC_DMA_SNOOP;
+
 	sdhci_writel(host, value, ESDHC_DMA_SYSCTL);
 	return 0;
 }
Hi Greg,
On 5th October, Christian Zigotzky chzigotzky@xenosoft.de reported a problem with this on PowerPC (at a guess, it looks like there's a PowerPC user of this where the DT does not mark the device as dma-coherent, but the hardware requires it to be DMA coherent.)
However, despite sending a reply to him within minutes of his email arriving, I've heard nothing since, so there's been no progress on working out what is really going on.
Given that the reporter hasn't responded to my reply, I'm not sure what we should be doing with this... maybe the reporter has solved his problem, maybe he was using an incorrect DT, we just don't know.
On Thu, Oct 10, 2019 at 09:49:12AM +0100, Russell King - ARM Linux admin wrote:
> Hi Greg,
>
> On 5th October, Christian Zigotzky chzigotzky@xenosoft.de reported a problem with this on PowerPC (at a guess, it looks like there's a PowerPC user of this where the DT does not mark the device as dma-coherent, but the hardware requires it to be DMA coherent.)
>
> However, despite sending a reply to him within minutes of his email arriving, I've heard nothing since, so there's been no progress on working out what is really going on.
>
> Given that the reporter hasn't responded to my reply, I'm not sure what we should be doing with this... maybe the reporter has solved his problem, maybe he was using an incorrect DT, we just don't know.
Let's just leave this in, and if this did cause a problem, whatever fix is made will be sent to Linus and we can then take that fix into stable as well.
thanks,
greg k-h
From: Adrian Hunter adrian.hunter@intel.com
commit 4ee7dde4c777f14cb0f98dd201491bf6cc15899b upstream.
Add host operation ->set_dma_mask() so that drivers can define their own DMA masks.
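As a rough illustration of the optional-callback pattern this adds, here is a self-contained userspace sketch. All demo_* names and the DEMO_BIT_MASK() macro are hypothetical; in particular, the default helper below simply picks a 32-bit mask and is not a model of what sdhci_set_dma_mask() does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

struct demo_host;

struct demo_ops {
	/* Optional: drivers that need a specific mask provide this. */
	int (*set_dma_mask)(struct demo_host *host);
};

struct demo_host {
	const struct demo_ops *ops;
	uint64_t dma_mask;
};

/* Hypothetical core default, used when the driver has no override. */
static int demo_default_set_dma_mask(struct demo_host *host)
{
	host->dma_mask = DEMO_BIT_MASK(32);
	return 0;
}

/* Driver override, e.g. for a controller that can address 40 bits. */
static int demo_set_dma_mask_40(struct demo_host *host)
{
	host->dma_mask = DEMO_BIT_MASK(40);
	return 0;
}

/* Core setup path: prefer the driver's callback, else fall back. */
static int demo_setup_dma(struct demo_host *host)
{
	if (host->ops && host->ops->set_dma_mask)
		return host->ops->set_dma_mask(host);

	return demo_default_set_dma_mask(host);
}
```

Drivers that leave the callback NULL keep the existing behaviour, which is what makes a change of this shape safe to backport.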
Signed-off-by: Adrian Hunter adrian.hunter@intel.com
Tested-by: Nicolin Chen nicoleotsuka@gmail.com
Signed-off-by: Thierry Reding treding@nvidia.com
Cc: stable@vger.kernel.org # v4.15 +
Signed-off-by: Ulf Hansson ulf.hansson@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/mmc/host/sdhci.c | 12 ++++--------
 drivers/mmc/host/sdhci.h | 1 +
 2 files changed, 5 insertions(+), 8 deletions(-)

--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -3763,18 +3763,14 @@ int sdhci_setup_host(struct sdhci_host *
 		host->flags &= ~SDHCI_USE_ADMA;
 	}
 
-	/*
-	 * It is assumed that a 64-bit capable device has set a 64-bit DMA mask
-	 * and *must* do 64-bit DMA. A driver has the opportunity to change
-	 * that during the first call to ->enable_dma(). Similarly
-	 * SDHCI_QUIRK2_BROKEN_64_BIT_DMA must be left to the drivers to
-	 * implement.
-	 */
 	if (sdhci_can_64bit_dma(host))
 		host->flags |= SDHCI_USE_64_BIT_DMA;
 
 	if (host->flags & (SDHCI_USE_SDMA | SDHCI_USE_ADMA)) {
-		ret = sdhci_set_dma_mask(host);
+		if (host->ops->set_dma_mask)
+			ret = host->ops->set_dma_mask(host);
+		else
+			ret = sdhci_set_dma_mask(host);
 
 		if (!ret && host->ops->enable_dma)
 			ret = host->ops->enable_dma(host);
--- a/drivers/mmc/host/sdhci.h
+++ b/drivers/mmc/host/sdhci.h
@@ -622,6 +622,7 @@ struct sdhci_ops {
 
 	u32 (*irq)(struct sdhci_host *host, u32 intmask);
 
+	int (*set_dma_mask)(struct sdhci_host *host);
 	int (*enable_dma)(struct sdhci_host *host);
 	unsigned int (*get_max_clock)(struct sdhci_host *host);
 	unsigned int (*get_min_clock)(struct sdhci_host *host);
From: Wanpeng Li wanpengli@tencent.com
commit 89340d0935c9296c7b8222b6eab30e67cb57ab82 upstream.
This patch reverts commit 75437bb304b20 (locking/pvqspinlock: Don't wait if vCPU is preempted). That commit caused a large performance regression in over-subscription scenarios.
The test was run on a Xeon Skylake box, 2 sockets, 40 cores, 80 threads, with three VMs of 80 vCPUs each. The score of ebizzy -M is reduced from 13000-14000 records/s to 1700-1800 records/s:
Host                             Guest      score

vanilla w/o kvm optimizations    upstream   1700-1800 records/s
vanilla w/o kvm optimizations    revert     13000-14000 records/s
vanilla w/  kvm optimizations    upstream   4500-5000 records/s
vanilla w/  kvm optimizations    revert     14000-15500 records/s
Exiting through the aggressive wait-early mechanism can result in a premature yield and extra scheduling latency.

Actually, only 6% of wait_early events are caused by vcpu_is_preempted() being true. However, when one vCPU voluntarily releases its vCPU, all the subsequent waiters in the queue will do the same, and the cascading effect leads to bad performance.
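The difference between the two predicates can be shown with a small userspace sketch. The enum and both demo_* functions are illustrative stand-ins, not the kernel code; the periodic loop-mask check in pv_wait_early() is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_vcpu_state { DEMO_VCPU_RUNNING, DEMO_VCPU_HALTED };

/* Behaviour being reverted: also give up when the previous vCPU is preempted. */
static bool demo_wait_early_with_preempt_check(enum demo_vcpu_state prev,
					       bool prev_preempted)
{
	return prev != DEMO_VCPU_RUNNING || prev_preempted;
}

/* Behaviour after the revert: only the queue node state matters. */
static bool demo_wait_early_reverted(enum demo_vcpu_state prev)
{
	return prev != DEMO_VCPU_RUNNING;
}
```

With the extra check, a waiter behind a preempted but still "running" node stops spinning early; once it does, its own node state changes and every waiter behind it follows, which is the cascade described above.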
kvm optimizations:
[1] commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts)
[2] commit 266e85a5ec9 (KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption)

Tested-by: loobinliu@tencent.com
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Cc: Ingo Molnar mingo@kernel.org
Cc: Waiman Long longman@redhat.com
Cc: Paolo Bonzini pbonzini@redhat.com
Cc: Radim Krčmář rkrcmar@redhat.com
Cc: loobinliu@tencent.com
Cc: stable@vger.kernel.org
Fixes: 75437bb304b20 (locking/pvqspinlock: Don't wait if vCPU is preempted)
Signed-off-by: Wanpeng Li wanpengli@tencent.com
Signed-off-by: Paolo Bonzini pbonzini@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 kernel/locking/qspinlock_paravirt.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -269,7 +269,7 @@ pv_wait_early(struct pv_node *prev, int
 	if ((loop & PV_PREV_CHECK_MASK) != 0)
 		return false;
 
-	return READ_ONCE(prev->state) != vcpu_running || vcpu_is_preempted(prev->cpu);
+	return READ_ONCE(prev->state) != vcpu_running;
 }
 
 /*
From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
commit cf387d9644d8c78721cf9b77af9f67bb5b04da16 upstream.
With a PFN_MODE_PMEM namespace, the memmap area is allocated from the device area. Some architectures map the memmap area with a large page size. On architectures like ppc64, a 16MB page used for the memmap mapping can map 262144 pfns, which covers a namespace size of 16G.
When populating memmap region with 16MB page from the device area, make sure the allocated space is not used to map resources outside this namespace. Such usage of device area will prevent a namespace destroy.
Add the resource end pfn to the altmap and use it to check whether the memmap area allocation can map pfns outside the namespace. On ppc64, in such a case we fall back to allocation from memory.
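The numbers and the boundary check can be worked through in a short userspace sketch. It assumes a 64-byte struct page and 64K base pages (the values that make 16MB correspond to 262144 pfns and 16G); the demo_* helpers are hypothetical and take plain pfn values rather than kernel structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_STRUCT_PAGE_SIZE 64ULL         /* assumed sizeof(struct page) */
#define DEMO_BASE_PAGE_SIZE   (64ULL << 10) /* assumed 64K base pages */

/* How many pfns one memmap mapping of the given size describes. */
static uint64_t demo_pfns_per_mapping(uint64_t mapping_size)
{
	return mapping_size / DEMO_STRUCT_PAGE_SIZE;
}

/*
 * True if a memmap block starting at start_pfn and describing nr_pfn
 * pages would reach outside [base_pfn, end_pfn], in which case the
 * block should not be carved out of the device area.
 */
static bool demo_cross_boundary(uint64_t start_pfn, uint64_t nr_pfn,
				uint64_t base_pfn, uint64_t end_pfn)
{
	if (start_pfn + nr_pfn > end_pfn)
		return true;

	if (start_pfn < base_pfn)
		return true;

	return false;
}
```

Under these assumptions a single 16MB memmap page describes 262144 pfns, i.e. 16 GiB of namespace, so any namespace smaller than that would trip the first check.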
This fixes the kernel crash reported below:
[ 132.034989] WARNING: CPU: 13 PID: 13719 at mm/memremap.c:133 devm_memremap_pages_release+0x2d8/0x2e0
[ 133.464754] BUG: Unable to handle kernel data access at 0xc00c00010b204000
[ 133.464760] Faulting instruction address: 0xc00000000007580c
[ 133.464766] Oops: Kernel access of bad area, sig: 11 [#1]
[ 133.464771] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
.....
[ 133.464901] NIP [c00000000007580c] vmemmap_free+0x2ac/0x3d0
[ 133.464906] LR [c0000000000757f8] vmemmap_free+0x298/0x3d0
[ 133.464910] Call Trace:
[ 133.464914] [c000007cbfd0f7b0] [c0000000000757f8] vmemmap_free+0x298/0x3d0 (unreliable)
[ 133.464921] [c000007cbfd0f8d0] [c000000000370a44] section_deactivate+0x1a4/0x240
[ 133.464928] [c000007cbfd0f980] [c000000000386270] __remove_pages+0x3a0/0x590
[ 133.464935] [c000007cbfd0fa50] [c000000000074158] arch_remove_memory+0x88/0x160
[ 133.464942] [c000007cbfd0fae0] [c0000000003be8c0] devm_memremap_pages_release+0x150/0x2e0
[ 133.464949] [c000007cbfd0fb70] [c000000000738ea0] devm_action_release+0x30/0x50
[ 133.464955] [c000007cbfd0fb90] [c00000000073a5a4] release_nodes+0x344/0x400
[ 133.464961] [c000007cbfd0fc40] [c00000000073378c] device_release_driver_internal+0x15c/0x250
[ 133.464968] [c000007cbfd0fc80] [c00000000072fd14] unbind_store+0x104/0x110
[ 133.464973] [c000007cbfd0fcd0] [c00000000072ee24] drv_attr_store+0x44/0x70
[ 133.464981] [c000007cbfd0fcf0] [c0000000004a32bc] sysfs_kf_write+0x6c/0xa0
[ 133.464987] [c000007cbfd0fd10] [c0000000004a1dfc] kernfs_fop_write+0x17c/0x250
[ 133.464993] [c000007cbfd0fd60] [c0000000003c348c] __vfs_write+0x3c/0x70
[ 133.464999] [c000007cbfd0fd80] [c0000000003c75d0] vfs_write+0xd0/0x250
djbw: Aneesh notes that this crash can likely be triggered in any kernel that supports 'papr_scm', so flagging that commit for -stable consideration.
Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions")
Cc: stable@vger.kernel.org
Reported-by: Sachin Sant sachinp@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
Reviewed-by: Pankaj Gupta pagupta@redhat.com
Tested-by: Santosh Sivaraj santosh@fossix.org
Reviewed-by: Johannes Thumshirn jthumshirn@suse.de
Link: https://lore.kernel.org/r/20190910062826.10041-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams dan.j.williams@intel.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 arch/powerpc/mm/init_64.c | 17 ++++++++++++++++-
 drivers/nvdimm/pfn_devs.c | 2 ++
 include/linux/memremap.h | 1 +
 3 files changed, 19 insertions(+), 1 deletion(-)

--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -172,6 +172,21 @@ static __meminit void vmemmap_list_popul
 	vmemmap_list = vmem_back;
 }
 
+static bool altmap_cross_boundary(struct vmem_altmap *altmap, unsigned long start,
+				unsigned long page_size)
+{
+	unsigned long nr_pfn = page_size / sizeof(struct page);
+	unsigned long start_pfn = page_to_pfn((struct page *)start);
+
+	if ((start_pfn + nr_pfn) > altmap->end_pfn)
+		return true;
+
+	if (start_pfn < altmap->base_pfn)
+		return true;
+
+	return false;
+}
+
 int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 		struct vmem_altmap *altmap)
 {
@@ -194,7 +209,7 @@ int __meminit vmemmap_populate(unsigned
 		 * fail due to alignment issues when using 16MB hugepages, so
 		 * fall back to system memory if the altmap allocation fail.
 		 */
-		if (altmap) {
+		if (altmap && !altmap_cross_boundary(altmap, start, page_size)) {
 			p = altmap_alloc_block_buf(page_size, altmap);
 			if (!p)
 				pr_debug("altmap block allocation failed, falling back to system memory");
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -618,9 +618,11 @@ static int __nvdimm_setup_pfn(struct nd_
 	struct nd_namespace_common *ndns = nd_pfn->ndns;
 	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
 	resource_size_t base = nsio->res.start + start_pad;
+	resource_size_t end = nsio->res.end - end_trunc;
 	struct vmem_altmap __altmap = {
 		.base_pfn = init_altmap_base(base),
 		.reserve = init_altmap_reserve(base),
+		.end_pfn = PHYS_PFN(end),
 	};
 
 	memcpy(res, &nsio->res, sizeof(*res));
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -17,6 +17,7 @@ struct device;
  */
 struct vmem_altmap {
 	const unsigned long base_pfn;
+	const unsigned long end_pfn;
 	const unsigned long reserve;
 	unsigned long free;
 	unsigned long align;
From: Seth Forshee seth.forshee@canonical.com
commit 61129dd29f7962f278b618a2a3e8fdb986a66dc8 upstream.
The addition of struct clone_args to uapi/linux/sched.h is not protected by __ASSEMBLY__ guards, causing a failure to build from source for glibc on RISC-V. Add the guards to fix this.
Fixes: 7f192e3cd316 ("fork: add clone3")
Signed-off-by: Seth Forshee seth.forshee@canonical.com
Cc: stable@vger.kernel.org
Acked-by: Ingo Molnar mingo@kernel.org
Link: https://lore.kernel.org/r/20190917071853.12385-1-seth.forshee@canonical.com
Signed-off-by: Christian Brauner christian.brauner@ubuntu.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 include/uapi/linux/sched.h | 2 ++
 1 file changed, 2 insertions(+)

--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -33,6 +33,7 @@
 #define CLONE_NEWNET 0x40000000 /* New network namespace */
 #define CLONE_IO 0x80000000 /* Clone io context */
 
+#ifndef __ASSEMBLY__
 /*
  * Arguments for the clone3 syscall
  */
@@ -46,6 +47,7 @@ struct clone_args {
 	__aligned_u64 stack_size;
 	__aligned_u64 tls;
 };
+#endif
 
 /*
  * Scheduling policies
From: H. Nikolaus Schaller hns@goldelico.com
commit f1f028ff89cb0d37db299d48e7b2ce19be040d52 upstream.
Commit 6953c57ab172 ("gpio: of: Handle SPI chipselect legacy bindings") introduced logic to centrally handle the legacy spi-cs-high property in combination with cs-gpios. This assumes that the polarity of the CS has to be inverted if spi-cs-high is missing, even and especially if non-legacy GPIO_ACTIVE_HIGH is specified.

The DTS for the GTA04 was originally introduced under the assumption that there is no need for spi-cs-high if the gpio is defined with proper polarity GPIO_ACTIVE_HIGH.
This was not a problem until gpiolib changed the interpretation of GPIO_ACTIVE_HIGH and missing spi-cs-high.
The effect is that the missing spi-cs-high is now interpreted as CS being low (despite GPIO_ACTIVE_HIGH) which turns off the SPI interface when the panel is to be programmed by the panel driver.
Therefore, we have to add the redundant and legacy spi-cs-high property to properly activate CS.
Cc: stable@vger.kernel.org
Signed-off-by: H. Nikolaus Schaller hns@goldelico.com
Signed-off-by: Tony Lindgren tony@atomide.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 arch/arm/boot/dts/omap3-gta04.dtsi | 1 +
 1 file changed, 1 insertion(+)

--- a/arch/arm/boot/dts/omap3-gta04.dtsi
+++ b/arch/arm/boot/dts/omap3-gta04.dtsi
@@ -120,6 +120,7 @@
 			spi-max-frequency = <100000>;
 			spi-cpol;
 			spi-cpha;
+			spi-cs-high;
 
 			backlight= <&backlight>;
 			label = "lcd";
From: David Hildenbrand david@redhat.com
commit c5ad81eb029570c5ca5859539b0679f07a776d25 upstream.
We are missing a __SetPageOffline(), which is why we can get !PageOffline() pages onto the balloon list, where alloc_xenballooned_pages() will complain:
page:ffffea0003e7ffc0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0xffffe00001000(reserved)
raw: 000ffffe00001000 dead000000000100 dead000000000200 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: VM_BUG_ON_PAGE(!PageOffline(page))
------------[ cut here ]------------
kernel BUG at include/linux/page-flags.h:744!
invalid opcode: 0000 [#1] SMP NOPTI
Reported-by: Marek Marczykowski-Górecki marmarek@invisiblethingslab.com
Tested-by: Marek Marczykowski-Górecki marmarek@invisiblethingslab.com
Fixes: 77c4adf6a6df ("xen/balloon: mark inflated pages PG_offline")
Cc: stable@vger.kernel.org # v5.1+
Cc: Boris Ostrovsky boris.ostrovsky@oracle.com
Cc: Juergen Gross jgross@suse.com
Cc: Stefano Stabellini sstabellini@kernel.org
Signed-off-by: David Hildenbrand david@redhat.com
Reviewed-by: Boris Ostrovsky boris.ostrovsky@oracle.com
Signed-off-by: Boris Ostrovsky boris.ostrovsky@oracle.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/xen/balloon.c | 1 +
 1 file changed, 1 insertion(+)

--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -688,6 +688,7 @@ static void __init balloon_add_region(un
 		/* totalram_pages and totalhigh_pages do not
 		   include the boot-time balloon extension, so
 		   don't subtract from it. */
+		__SetPageOffline(page);
 		__balloon_append(page);
 	}
From: Juergen Gross jgross@suse.com
commit a8fabb38525c51a094607768bac3ba46b3f4a9d5 upstream.
In case a user process using xenbus has open transactions and is killed e.g. via ctrl-C the following cleanup of the allocated resources might result in a deadlock due to trying to end a transaction in the xenbus worker thread:
[ 2551.474706] INFO: task xenbus:37 blocked for more than 120 seconds.
[ 2551.492215] Tainted: P OE 5.0.0-29-generic #5
[ 2551.510263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2551.528585] xenbus D 0 37 2 0x80000080
[ 2551.528590] Call Trace:
[ 2551.528603] __schedule+0x2c0/0x870
[ 2551.528606] ? _cond_resched+0x19/0x40
[ 2551.528632] schedule+0x2c/0x70
[ 2551.528637] xs_talkv+0x1ec/0x2b0
[ 2551.528642] ? wait_woken+0x80/0x80
[ 2551.528645] xs_single+0x53/0x80
[ 2551.528648] xenbus_transaction_end+0x3b/0x70
[ 2551.528651] xenbus_file_free+0x5a/0x160
[ 2551.528654] xenbus_dev_queue_reply+0xc4/0x220
[ 2551.528657] xenbus_thread+0x7de/0x880
[ 2551.528660] ? wait_woken+0x80/0x80
[ 2551.528665] kthread+0x121/0x140
[ 2551.528667] ? xb_read+0x1d0/0x1d0
[ 2551.528670] ? kthread_park+0x90/0x90
[ 2551.528673] ret_from_fork+0x35/0x40
Fix this by doing the cleanup via a workqueue instead.
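The shape of the fix, where the release path only queues the cleanup and another context runs it later, can be modelled in userspace. This is a single-threaded toy, not the kernel workqueue API: demo_schedule_work(), demo_run_pending() and the demo_* structures are all invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

struct demo_work {
	void (*fn)(struct demo_work *work);
	struct demo_work *next;
};

static struct demo_work *demo_pending; /* toy queue of deferred work */

#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_file {
	int cleaned;
	struct demo_work work;
};

/* Queue the work item; it does not run in the caller's context. */
static void demo_schedule_work(struct demo_work *work)
{
	work->next = demo_pending;
	demo_pending = work;
}

/* Stand-in for the worker that later runs everything queued so far. */
static void demo_run_pending(void)
{
	while (demo_pending) {
		struct demo_work *work = demo_pending;

		demo_pending = work->next;
		work->fn(work);
	}
}

/* The actual cleanup, which may block, runs only from the worker. */
static void demo_cleanup(struct demo_work *work)
{
	struct demo_file *f = demo_container_of(work, struct demo_file, work);

	f->cleaned = 1;
}

/* Release path: safe to call from a thread that must not block. */
static void demo_file_release(struct demo_file *f)
{
	f->work.fn = demo_cleanup;
	demo_schedule_work(&f->work);
}
```

The release call returns immediately with nothing cleaned up yet; the blocking part happens only when the worker drains the queue, so the thread that triggered the release is never the one waiting.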
Reported-by: James Dingwall james@dingwall.me.uk
Fixes: fd8aa9095a95c ("xen: optimize xenbus driver for multiple concurrent xenstore accesses")
Cc: stable@vger.kernel.org # 4.11
Signed-off-by: Juergen Gross jgross@suse.com
Reviewed-by: Boris Ostrovsky boris.ostrovsky@oracle.com
Signed-off-by: Boris Ostrovsky boris.ostrovsky@oracle.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/xen/xenbus/xenbus_dev_frontend.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

--- a/drivers/xen/xenbus/xenbus_dev_frontend.c
+++ b/drivers/xen/xenbus/xenbus_dev_frontend.c
@@ -55,6 +55,7 @@
 #include <linux/string.h>
 #include <linux/slab.h>
 #include <linux/miscdevice.h>
+#include <linux/workqueue.h>
 
 #include <xen/xenbus.h>
 #include <xen/xen.h>
@@ -116,6 +117,8 @@ struct xenbus_file_priv {
 	wait_queue_head_t read_waitq;
 
 	struct kref kref;
+
+	struct work_struct wq;
 };
 
 /* Read out any raw xenbus messages queued up. */
@@ -300,14 +303,14 @@ static void watch_fired(struct xenbus_wa
 	mutex_unlock(&adap->dev_data->reply_mutex);
 }
 
-static void xenbus_file_free(struct kref *kref)
+static void xenbus_worker(struct work_struct *wq)
 {
 	struct xenbus_file_priv *u;
 	struct xenbus_transaction_holder *trans, *tmp;
 	struct watch_adapter *watch, *tmp_watch;
 	struct read_buffer *rb, *tmp_rb;
 
-	u = container_of(kref, struct xenbus_file_priv, kref);
+	u = container_of(wq, struct xenbus_file_priv, wq);
 
 	/*
 	 * No need for locking here because there are no other users,
@@ -333,6 +336,18 @@ static void xenbus_file_free(struct kref
 	kfree(u);
 }
 
+static void xenbus_file_free(struct kref *kref)
+{
+	struct xenbus_file_priv *u;
+
+	/*
+	 * We might be called in xenbus_thread().
+	 * Use workqueue to avoid deadlock.
+	 */
+	u = container_of(kref, struct xenbus_file_priv, kref);
+	schedule_work(&u->wq);
+}
+
 static struct xenbus_transaction_holder *xenbus_get_transaction(
 	struct xenbus_file_priv *u, uint32_t tx_id)
 {
@@ -650,6 +665,7 @@ static int xenbus_file_open(struct inode
 	INIT_LIST_HEAD(&u->watches);
 	INIT_LIST_HEAD(&u->read_buffers);
 	init_waitqueue_head(&u->read_waitq);
+	INIT_WORK(&u->wq, xenbus_worker);
 
 	mutex_init(&u->reply_mutex);
 	mutex_init(&u->msgbuffer_mutex);
From: Johan Hovold johan@kernel.org
commit 7fd25e6fc035f4b04b75bca6d7e8daa069603a76 upstream.
The disconnect callback was accessing the hardware-descriptor private data after having freed it.
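The ordering rule behind this fix is easy to show outside the kernel. In the sketch below, which uses made-up demo_* types rather than the USB or ieee802154 APIs, the reference held through the private data is dropped while the containing allocation is still alive, and only then is that allocation freed.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_usb_dev {
	int refcount;
};

/* Stands in for the hw object whose allocation also holds the private data. */
struct demo_hw {
	struct demo_usb_dev *usb_dev;
};

static struct demo_hw *demo_probe(struct demo_usb_dev *dev)
{
	struct demo_hw *hw = malloc(sizeof(*hw));

	if (!hw)
		return NULL;

	dev->refcount++; /* take a reference for the lifetime of hw */
	hw->usb_dev = dev;
	return hw;
}

static void demo_disconnect(struct demo_hw *hw)
{
	hw->usb_dev->refcount--; /* last access to memory owned by hw */
	free(hw);                /* nothing may dereference hw after this */
}
```

Swapping the two statements in demo_disconnect() would read hw->usb_dev from freed memory, which is the class of bug the patch removes.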
Fixes: 7490b008d123 ("ieee802154: add support for atusb transceiver")
Cc: stable stable@vger.kernel.org # 4.2
Cc: Alexander Aring alex.aring@gmail.com
Reported-by: syzbot+f4509a9138a1472e7e80@syzkaller.appspotmail.com
Signed-off-by: Johan Hovold johan@kernel.org
Signed-off-by: Stefan Schmidt stefan@datenfreihafen.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/net/ieee802154/atusb.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/net/ieee802154/atusb.c
+++ b/drivers/net/ieee802154/atusb.c
@@ -1137,10 +1137,11 @@ static void atusb_disconnect(struct usb_
 
 	ieee802154_unregister_hw(atusb->hw);
 
+	usb_put_dev(atusb->usb_dev);
+
 	ieee802154_free_hw(atusb->hw);
 
 	usb_set_intfdata(interface, NULL);
-	usb_put_dev(atusb->usb_dev);
 
 	pr_debug("%s done\n", __func__);
 }
From: Johannes Berg johannes.berg@intel.com
commit f88eb7c0d002a67ef31aeb7850b42ff69abc46dc upstream.
We currently don't validate the beacon head, i.e. the header, fixed part and elements that are to go in front of the TIM element. This means that the variable elements there can be malformed, e.g. have a length exceeding the buffer size, but most downstream code from this assumes that this has already been checked.
Add the necessary checks to the netlink policy.
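The kind of check this implies for the variable part can be sketched as a plain TLV walk. The helper below is a userspace illustration with an invented name; it assumes the usual element layout of one ID byte, one length byte, then that many data bytes, and it does not cover the fixed-header checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true only if the elements tile the buffer exactly: every
 * element's declared length fits in what remains, and no partial
 * header is left over at the end. An empty buffer is well formed.
 */
static bool demo_elements_well_formed(const uint8_t *buf, size_t len)
{
	size_t pos = 0;

	while (len - pos >= 2) {
		size_t datalen = buf[pos + 1];

		if (datalen > len - pos - 2)
			return false; /* element claims more data than remains */

		pos += 2 + datalen;
	}

	return pos == len;
}
```

Doing this once at the policy level means later parsing code can rely on every element staying inside the buffer.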
Cc: stable@vger.kernel.org
Fixes: ed1b6cc7f80f ("cfg80211/nl80211: add beacon settings")
Link: https://lore.kernel.org/r/1569009255-I7ac7fbe9436e9d8733439eab8acbbd35e55c74...
Signed-off-by: Johannes Berg johannes.berg@intel.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 net/wireless/nl80211.c | 37 +++++++++++++++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)

--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -201,6 +201,38 @@ cfg80211_get_dev_from_info(struct net *n
 	return __cfg80211_rdev_from_attrs(netns, info->attrs);
 }
 
+static int validate_beacon_head(const struct nlattr *attr,
+				struct netlink_ext_ack *extack)
+{
+	const u8 *data = nla_data(attr);
+	unsigned int len = nla_len(attr);
+	const struct element *elem;
+	const struct ieee80211_mgmt *mgmt = (void *)data;
+	unsigned int fixedlen = offsetof(struct ieee80211_mgmt,
+					 u.beacon.variable);
+
+	if (len < fixedlen)
+		goto err;
+
+	if (ieee80211_hdrlen(mgmt->frame_control) !=
+	    offsetof(struct ieee80211_mgmt, u.beacon))
+		goto err;
+
+	data += fixedlen;
+	len -= fixedlen;
+
+	for_each_element(elem, data, len) {
+		/* nothing */
+	}
+
+	if (for_each_element_completed(elem, data, len))
+		return 0;
+
+err:
+	NL_SET_ERR_MSG_ATTR(extack, attr, "malformed beacon head");
+	return -EINVAL;
+}
+
 static int validate_ie_attr(const struct nlattr *attr,
 			    struct netlink_ext_ack *extack)
 {
@@ -322,8 +354,9 @@ const struct nla_policy nl80211_policy[N
 
 	[NL80211_ATTR_BEACON_INTERVAL] = { .type = NLA_U32 },
 	[NL80211_ATTR_DTIM_PERIOD] = { .type = NLA_U32 },
-	[NL80211_ATTR_BEACON_HEAD] = { .type = NLA_BINARY,
-				       .len = IEEE80211_MAX_DATA_LEN },
+	[NL80211_ATTR_BEACON_HEAD] =
+		NLA_POLICY_VALIDATE_FN(NLA_BINARY, validate_beacon_head,
+				       IEEE80211_MAX_DATA_LEN),
 	[NL80211_ATTR_BEACON_TAIL] =
 		NLA_POLICY_VALIDATE_FN(NLA_BINARY, validate_ie_attr,
 				       IEEE80211_MAX_DATA_LEN),
From: Johannes Berg johannes.berg@intel.com
commit 242b0931c1918c56cd1dc5563fd250a3c39b996d upstream.
The code copying the data assumes that the SSID element is before the MBSSID element, but since the data is untrusted from the AP, this cannot be guaranteed.
Validate that this is indeed the case and ignore the MBSSID otherwise, to avoid having to deal with both cases for the copy of data that should be between them.
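A minimal userspace sketch of that ordering check follows. The demo_find_elem() helper and the two element ID constants are assumptions for the example (the IDs are chosen to resemble SSID and Multiple BSSID but are not taken from the patch); the point is only that both elements must exist and the second must sit at a higher address than the first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_EID_SSID   0  /* assumed element ID, for illustration */
#define DEMO_EID_MBSSID 71 /* assumed element ID, for illustration */

/* Return a pointer to the first element with the given ID, or NULL. */
static const uint8_t *demo_find_elem(uint8_t id, const uint8_t *ies, size_t len)
{
	size_t pos = 0;

	while (len - pos >= 2) {
		size_t datalen = ies[pos + 1];

		if (datalen > len - pos - 2)
			return NULL; /* malformed, stop searching */

		if (ies[pos] == id)
			return ies + pos;

		pos += 2 + datalen;
	}

	return NULL;
}

/* Only accept the MBSSID element when it comes after the SSID element. */
static bool demo_mbssid_usable(const uint8_t *ies, size_t len)
{
	const uint8_t *ssid = demo_find_elem(DEMO_EID_SSID, ies, len);
	const uint8_t *mbssid = demo_find_elem(DEMO_EID_MBSSID, ies, len);

	if (!ssid || !mbssid)
		return false;

	return mbssid > ssid;
}
```

Rejecting the unexpected order up front keeps the later copy of "everything between the two elements" from computing a negative or nonsensical length.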
Cc: stable@vger.kernel.org Fixes: 0b8fb8235be8 ("cfg80211: Parsing of Multiple BSSID information in scanning") Link: https://lore.kernel.org/r/1569009255-I1673911f5eae02964e21bdc11b2bf58e5e207e... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/wireless/scan.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/net/wireless/scan.c
+++ b/net/wireless/scan.c
@@ -1711,7 +1711,12 @@ cfg80211_update_notlisted_nontrans(struc
 		return;
 	new_ie_len -= trans_ssid[1];
 	mbssid = cfg80211_find_ie(WLAN_EID_MULTIPLE_BSSID, ie, ielen);
-	if (!mbssid)
+	/*
+	 * It's not valid to have the MBSSID element before SSID
+	 * ignore if that happens - the code below assumes it is
+	 * after (while copying things inbetween).
+	 */
+	if (!mbssid || mbssid < trans_ssid)
 		return;
 	new_ie_len -= mbssid[1];
 	rcu_read_lock();
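The ordering check in this patch can be illustrated outside the kernel. The following is a minimal user-space sketch, not the cfg80211 code: find_elem() is a hypothetical stand-in for cfg80211_find_ie(), and the EID_* names are local to the example. It walks a buffer of id/length/value elements and accepts the MBSSID element only when it comes after the SSID element.

```c
#include <stddef.h>
#include <stdint.h>

/* Element IDs as assigned by IEEE 802.11; the macro names are local here. */
#define EID_SSID           0
#define EID_MULTIPLE_BSSID 71

/*
 * Return a pointer to the first element with the given ID, or NULL if it
 * is absent or the buffer is truncated. Hypothetical stand-in for
 * cfg80211_find_ie().
 */
static const uint8_t *find_elem(uint8_t eid, const uint8_t *ies, size_t len)
{
	while (len >= 2 && len >= (size_t)ies[1] + 2) {
		if (ies[0] == eid)
			return ies;
		len -= (size_t)ies[1] + 2;
		ies += ies[1] + 2;
	}
	return NULL;
}

/* Return 1 only when both elements exist and MBSSID follows SSID. */
static int mbssid_usable(const uint8_t *ies, size_t len)
{
	const uint8_t *ssid = find_elem(EID_SSID, ies, len);
	const uint8_t *mbssid = find_elem(EID_MULTIPLE_BSSID, ies, len);

	if (!ssid || !mbssid || mbssid < ssid)
		return 0;
	return 1;
}
```

A buffer that carries the MBSSID element first is rejected outright, so the copy of "everything in between" never has to handle a negative span.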
From: Johannes Berg johannes.berg@intel.com
commit f43e5210c739fe76a4b0ed851559d6902f20ceb1 upstream.
In a few places we don't properly initialize on-stack chandefs, resulting in the EDMG data being non-zero, which broke things.
Additionally, in a few places we rely on the driver to initialize the data completely, but perhaps we shouldn't, as non-EDMG drivers may not initialize the EDMG data. Initialize it there as well.
Cc: stable@vger.kernel.org Fixes: 2a38075cd0be ("nl80211: Add support for EDMG channels") Reported-by: Dmitry Osipenko digetx@gmail.com Tested-by: Dmitry Osipenko digetx@gmail.com Link: https://lore.kernel.org/r/1569239475-I2dcce394ecf873376c386a78f31c2ec8b538fa... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/wireless/nl80211.c | 4 +++- net/wireless/reg.c | 2 +- net/wireless/wext-compat.c | 2 +- 3 files changed, 5 insertions(+), 3 deletions(-)
--- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -2597,6 +2597,8 @@ int nl80211_parse_chandef(struct cfg8021
control_freq = nla_get_u32(attrs[NL80211_ATTR_WIPHY_FREQ]);
+ memset(chandef, 0, sizeof(*chandef)); + chandef->chan = ieee80211_get_channel(&rdev->wiphy, control_freq); chandef->width = NL80211_CHAN_WIDTH_20_NOHT; chandef->center_freq1 = control_freq; @@ -3125,7 +3127,7 @@ static int nl80211_send_iface(struct sk_
if (rdev->ops->get_channel) { int ret; - struct cfg80211_chan_def chandef; + struct cfg80211_chan_def chandef = {};
ret = rdev_get_channel(rdev, wdev, &chandef); if (ret == 0) { --- a/net/wireless/reg.c +++ b/net/wireless/reg.c @@ -2108,7 +2108,7 @@ static void reg_call_notifier(struct wip
static bool reg_wdev_chan_valid(struct wiphy *wiphy, struct wireless_dev *wdev) { - struct cfg80211_chan_def chandef; + struct cfg80211_chan_def chandef = {}; struct cfg80211_registered_device *rdev = wiphy_to_rdev(wiphy); enum nl80211_iftype iftype;
--- a/net/wireless/wext-compat.c +++ b/net/wireless/wext-compat.c @@ -797,7 +797,7 @@ static int cfg80211_wext_giwfreq(struct { struct wireless_dev *wdev = dev->ieee80211_ptr; struct cfg80211_registered_device *rdev = wiphy_to_rdev(wdev->wiphy); - struct cfg80211_chan_def chandef; + struct cfg80211_chan_def chandef = {}; int ret;
switch (wdev->iftype) {
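Why the zero-initialization matters can be shown with a small user-space sketch. The struct and the "legacy driver" callback below are hypothetical, cut-down stand-ins (they are not the cfg80211 definitions); the point is only that a callee which fills some fields leaves the rest at whatever the caller put there.

```c
#include <string.h>

/* Hypothetical, cut-down stand-in for struct cfg80211_chan_def. */
struct chan_def {
	int width;
	unsigned int center_freq1;
	unsigned char edmg_channels; /* newer field a legacy driver ignores */
};

/* A "legacy driver" callback that fills only the fields it knows about. */
static int legacy_get_channel(struct chan_def *def)
{
	def->width = 20;
	def->center_freq1 = 2412;
	return 0;
}

/* The caller zeroes the struct first, so untouched fields read back as 0. */
static unsigned char query_edmg(void)
{
	struct chan_def def;

	memset(&def, 0, sizeof(def));
	legacy_get_channel(&def);
	return def.edmg_channels;
}
```

Without the memset() the value of edmg_channels would be indeterminate stack content, which is the failure mode described in the commit message.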
From: Srinivas Kandagatla srinivas.kandagatla@linaro.org
[ Upstream commit 6b8249abb093551ef173d13a25ed0044d5dd33e0 ]
Memory returned by nvmem_cell_read() via qfprom_read() should be freed by the consumer once it is done with it. The existing code does not do this, so fix it.

The memory leak below was detected by kmemleak:
[<ffffff80088b7658>] kmemleak_alloc+0x50/0x84
[<ffffff80081df120>] __kmalloc+0xe8/0x168
[<ffffff80086db350>] nvmem_cell_read+0x30/0x80
[<ffffff8008632790>] qfprom_read+0x4c/0x7c
[<ffffff80086335a4>] calibrate_v1+0x34/0x204
[<ffffff8008632518>] tsens_probe+0x164/0x258
[<ffffff80084e0a1c>] platform_drv_probe+0x80/0xa0
[<ffffff80084de4f4>] really_probe+0x208/0x248
[<ffffff80084de2c4>] driver_probe_device+0x98/0xc0
[<ffffff80084dec54>] __device_attach_driver+0x9c/0xac
[<ffffff80084dca74>] bus_for_each_drv+0x60/0x8c
[<ffffff80084de634>] __device_attach+0x8c/0x100
[<ffffff80084de6c8>] device_initial_probe+0x20/0x28
[<ffffff80084dcbb8>] bus_probe_device+0x34/0x7c
[<ffffff80084deb08>] deferred_probe_work_func+0x6c/0x98
[<ffffff80080c3da8>] process_one_work+0x160/0x2f8
Signed-off-by: Srinivas Kandagatla srinivas.kandagatla@linaro.org Acked-by: Amit Kucheria amit.kucheria@linaro.org Signed-off-by: Zhang Rui rui.zhang@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/thermal/qcom/tsens-8960.c | 2 ++ drivers/thermal/qcom/tsens-v0_1.c | 12 ++++++++++-- drivers/thermal/qcom/tsens-v1.c | 1 + drivers/thermal/qcom/tsens.h | 1 + 4 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/thermal/qcom/tsens-8960.c b/drivers/thermal/qcom/tsens-8960.c index 8d9b721dadb65..e46a4e3f25c42 100644 --- a/drivers/thermal/qcom/tsens-8960.c +++ b/drivers/thermal/qcom/tsens-8960.c @@ -229,6 +229,8 @@ static int calibrate_8960(struct tsens_priv *priv) for (i = 0; i < num_read; i++, s++) s->offset = data[i];
+ kfree(data); + return 0; }
diff --git a/drivers/thermal/qcom/tsens-v0_1.c b/drivers/thermal/qcom/tsens-v0_1.c index 6f26fadf4c279..055647bcee67d 100644 --- a/drivers/thermal/qcom/tsens-v0_1.c +++ b/drivers/thermal/qcom/tsens-v0_1.c @@ -145,8 +145,10 @@ static int calibrate_8916(struct tsens_priv *priv) return PTR_ERR(qfprom_cdata);
qfprom_csel = (u32 *)qfprom_read(priv->dev, "calib_sel"); - if (IS_ERR(qfprom_csel)) + if (IS_ERR(qfprom_csel)) { + kfree(qfprom_cdata); return PTR_ERR(qfprom_csel); + }
mode = (qfprom_csel[0] & MSM8916_CAL_SEL_MASK) >> MSM8916_CAL_SEL_SHIFT; dev_dbg(priv->dev, "calibration mode is %d\n", mode); @@ -181,6 +183,8 @@ static int calibrate_8916(struct tsens_priv *priv) }
compute_intercept_slope(priv, p1, p2, mode); + kfree(qfprom_cdata); + kfree(qfprom_csel);
return 0; } @@ -198,8 +202,10 @@ static int calibrate_8974(struct tsens_priv *priv) return PTR_ERR(calib);
bkp = (u32 *)qfprom_read(priv->dev, "calib_backup"); - if (IS_ERR(bkp)) + if (IS_ERR(bkp)) { + kfree(calib); return PTR_ERR(bkp); + }
calib_redun_sel = bkp[1] & BKP_REDUN_SEL; calib_redun_sel >>= BKP_REDUN_SHIFT; @@ -313,6 +319,8 @@ static int calibrate_8974(struct tsens_priv *priv) }
compute_intercept_slope(priv, p1, p2, mode); + kfree(calib); + kfree(bkp);
return 0; } diff --git a/drivers/thermal/qcom/tsens-v1.c b/drivers/thermal/qcom/tsens-v1.c index 10b595d4f6199..870f502f2cb6c 100644 --- a/drivers/thermal/qcom/tsens-v1.c +++ b/drivers/thermal/qcom/tsens-v1.c @@ -138,6 +138,7 @@ static int calibrate_v1(struct tsens_priv *priv) }
compute_intercept_slope(priv, p1, p2, mode); + kfree(qfprom_cdata);
return 0; } diff --git a/drivers/thermal/qcom/tsens.h b/drivers/thermal/qcom/tsens.h index 2fd94997245bf..b89083b61c383 100644 --- a/drivers/thermal/qcom/tsens.h +++ b/drivers/thermal/qcom/tsens.h @@ -17,6 +17,7 @@
#include <linux/thermal.h> #include <linux/regmap.h> +#include <linux/slab.h>
struct tsens_priv;
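The ownership rule behind this fix (the caller of the read helper owns the returned buffer and must release it on every exit path) can be sketched in user space. Everything below is hypothetical: read_cell() stands in for qfprom_read(), malloc/free stand in for the kernel allocator, and the live_buffers counter exists only so the example can show that nothing is leaked.

```c
#include <stdlib.h>

static int live_buffers; /* outstanding allocations, for illustration only */

/* Hypothetical stand-in for qfprom_read(): returns NULL when fail != 0. */
static unsigned int *read_cell(int fail)
{
	unsigned int *buf;

	if (fail)
		return NULL;
	buf = calloc(4, sizeof(*buf));
	if (buf)
		live_buffers++;
	return buf;
}

static void release_cell(unsigned int *buf)
{
	if (buf) {
		free(buf);
		live_buffers--;
	}
}

/* Reads two cells; frees the first if the second read fails. */
static int calibrate(int second_read_fails)
{
	unsigned int *cdata, *csel;

	cdata = read_cell(0);
	if (!cdata)
		return -1;

	csel = read_cell(second_read_fails);
	if (!csel) {
		release_cell(cdata); /* the error path must not leak cdata */
		return -1;
	}

	/* ... the calibration values would be consumed here ... */

	release_cell(cdata);
	release_cell(csel);
	return 0;
}
```

Both the success path and the early-error path leave live_buffers at zero, which mirrors the kfree() calls added to calibrate_8916() and calibrate_8974().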
From: Sascha Hauer s.hauer@pengutronix.de
[ Upstream commit f5e1040196dbfe14c77ce3dfe3b7b08d2d961e88 ]
integrity_kernel_read() returns the number of bytes read. If this is a short read then this positive value is returned from ima_calc_file_hash_atfm(). Currently this is only indirectly called from ima_calc_file_hash() and this function only tests for the return value being zero or nonzero and also doesn't forward the return value. Nevertheless there's no point in returning a positive value as an error, so translate a short read into -EINVAL.
Signed-off-by: Sascha Hauer s.hauer@pengutronix.de Signed-off-by: Mimi Zohar zohar@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- security/integrity/ima/ima_crypto.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/security/integrity/ima/ima_crypto.c b/security/integrity/ima/ima_crypto.c index d4c7b8e1b083d..7532b062be594 100644 --- a/security/integrity/ima/ima_crypto.c +++ b/security/integrity/ima/ima_crypto.c @@ -268,8 +268,11 @@ static int ima_calc_file_hash_atfm(struct file *file, rbuf_len = min_t(loff_t, i_size - offset, rbuf_size[active]); rc = integrity_kernel_read(file, offset, rbuf[active], rbuf_len); - if (rc != rbuf_len) + if (rc != rbuf_len) { + if (rc >= 0) + rc = -EINVAL; goto out3; + }
if (rbuf[1] && offset) { /* Using two buffers, and it is not the first
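The return-value translation in this patch can be summarized in a few lines. This is a sketch under stated assumptions, not the IMA code: normalize_read_result() is a hypothetical helper, and unlike the kernel function (which keeps going after a full read) it maps a complete read to 0 so the three cases are easy to tell apart.

```c
#include <errno.h>

/*
 * rc is what a read helper returned, expected is the number of bytes
 * requested. Full read: 0. Short read: -EINVAL. Negative errno: unchanged.
 */
static int normalize_read_result(int rc, int expected)
{
	if (rc == expected)
		return 0;
	if (rc >= 0)
		return -EINVAL; /* short read: never return a positive "error" */
	return rc;
}
```

The design choice is that callers only ever see zero or a negative errno, so a check such as "if (rc < 0)" cannot mistake a short read for success.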
From: Sascha Hauer s.hauer@pengutronix.de
[ Upstream commit 4ece3125f21b1d42b84896c5646dbf0e878464e1 ]
integrity_kernel_read() can fail, in which case we go on to call ahash_request_free() on a request that is still running. We have to wait for its completion before we can free the request.
This was observed by interrupting a "find / -type f -xdev -print0 | xargs -0 cat 1>/dev/null" with ctrl-c on an IMA enabled filesystem.
Signed-off-by: Sascha Hauer s.hauer@pengutronix.de Signed-off-by: Mimi Zohar zohar@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- security/integrity/ima/ima_crypto.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/security/integrity/ima/ima_crypto.c b/security/integrity/ima/ima_crypto.c index 7532b062be594..73044fc6a9521 100644 --- a/security/integrity/ima/ima_crypto.c +++ b/security/integrity/ima/ima_crypto.c @@ -271,6 +271,11 @@ static int ima_calc_file_hash_atfm(struct file *file, if (rc != rbuf_len) { if (rc >= 0) rc = -EINVAL; + /* + * Forward current rc, do not overwrite with return value + * from ahash_wait() + */ + ahash_wait(ahash_rc, &wait); goto out3; }
From: Jia-Ju Bai baijiaju1990@gmail.com
[ Upstream commit e2751463eaa6f9fec8fea80abbdc62dbc487b3c5 ]
In encode_attrs(), there is an if statement on line 1145 to check whether label is NULL:
    if (label && (attrmask[2] & FATTR4_WORD2_SECURITY_LABEL))

When label is NULL, it is used on lines 1178-1181:
    *p++ = cpu_to_be32(label->lfs);
    *p++ = cpu_to_be32(label->pi);
    *p++ = cpu_to_be32(label->len);
    p = xdr_encode_opaque_fixed(p, label->label, label->len);
To fix these bugs, label is checked before being used.
These bugs were found by STCheck, a static analysis tool written by us.
Signed-off-by: Jia-Ju Bai baijiaju1990@gmail.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs4xdr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 46a8d636d151e..ab07db0f07cde 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -1174,7 +1174,7 @@ static void encode_attrs(struct xdr_stream *xdr, const struct iattr *iap,
 		} else
 			*p++ = cpu_to_be32(NFS4_SET_TO_SERVER_TIME);
 	}
-	if (bmval[2] & FATTR4_WORD2_SECURITY_LABEL) {
+	if (label && (bmval[2] & FATTR4_WORD2_SECURITY_LABEL)) {
 		*p++ = cpu_to_be32(label->lfs);
 		*p++ = cpu_to_be32(label->pi);
 		*p++ = cpu_to_be32(label->len);
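The guard added here follows the usual pattern of testing the pointer and the bitmask bit together before dereferencing. A minimal user-space sketch, with a hypothetical struct and an illustrative bit value (neither is taken from the NFS headers), and without the byte-order conversion the real encoder performs:

```c
#include <stddef.h>
#include <stdint.h>

#define WORD2_SECURITY_LABEL (1u << 16) /* illustrative bit for this sketch */

/* Hypothetical, reduced stand-in for struct nfs4_label. */
struct sec_label {
	uint32_t lfs;
	uint32_t pi;
	uint32_t len;
};

/* Returns the number of 32-bit words written to p. */
static size_t encode_label(uint32_t *p, const struct sec_label *label,
			   uint32_t bmval2)
{
	size_t n = 0;

	/* Dereference label only when it is non-NULL and the bit is set. */
	if (label && (bmval2 & WORD2_SECURITY_LABEL)) {
		p[n++] = label->lfs;
		p[n++] = label->pi;
		p[n++] = label->len;
	}
	return n;
}
```

With a NULL label nothing is written, even if the bitmask asks for a security label.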
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 395790566eec37706dedeb94779045adc3a7581e ]
Commit 48be539dd44a ("xprtrdma: Introduce ->alloc_slot call-out for xprtrdma") added a separate alloc_slot and free_slot to the RPC/RDMA transport. Later, commit 75891f502f5f ("SUNRPC: Support for congestion control when queuing is enabled") modified the generic alloc/free_slot methods, but neglected the methods in xprtrdma.
Found via code review.
Fixes: 75891f502f5f ("SUNRPC: Support for congestion control ... ") Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/xprtrdma/transport.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index 2ec349ed47702..f4763e8a67617 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -571,6 +571,7 @@ xprt_rdma_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task) return;
out_sleep: + set_bit(XPRT_CONGESTED, &xprt->state); rpc_sleep_on(&xprt->backlog, task, NULL); task->tk_status = -EAGAIN; } @@ -589,7 +590,8 @@ xprt_rdma_free_slot(struct rpc_xprt *xprt, struct rpc_rqst *rqst)
memset(rqst, 0, sizeof(*rqst)); rpcrdma_buffer_put(&r_xprt->rx_buf, rpcr_to_rdmar(rqst)); - rpc_wake_up_next(&xprt->backlog); + if (unlikely(!rpc_wake_up_next(&xprt->backlog))) + clear_bit(XPRT_CONGESTED, &xprt->state); }
static bool rpcrdma_check_regbuf(struct rpcrdma_xprt *r_xprt,
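The bookkeeping this patch adds (mark the transport congested when a task has to sleep for a slot, clear the mark only when a freed slot finds nobody waiting) can be modelled in a few lines of user-space C. The struct, the bit number and the helpers are all hypothetical; wake_up_next() merely imitates the "did we wake anyone" result of rpc_wake_up_next().

```c
#define CONGESTED_BIT 0 /* local bit number for this sketch */

struct xprt_sketch {
	unsigned long state; /* bit flags */
	int backlog;         /* tasks sleeping while they wait for a slot */
};

/* No slot available: flag congestion and queue the task. */
static void sleep_on_backlog(struct xprt_sketch *x)
{
	x->state |= 1UL << CONGESTED_BIT;
	x->backlog++;
}

/* Returns 1 if a waiter was woken, 0 if the backlog was empty. */
static int wake_up_next(struct xprt_sketch *x)
{
	if (x->backlog == 0)
		return 0;
	x->backlog--;
	return 1;
}

/* A slot was released: hand it to a waiter, or clear the flag. */
static void free_slot(struct xprt_sketch *x)
{
	if (!wake_up_next(x))
		x->state &= ~(1UL << CONGESTED_BIT);
}

static int is_congested(const struct xprt_sketch *x)
{
	return (x->state >> CONGESTED_BIT) & 1;
}
```

The flag stays set for as long as a released slot is immediately consumed by a waiter, and drops only once the backlog has drained.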
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 98ef77d1aaa7a2f4e1b2a721faa084222021fda7 ]
Eli Dorfman reports that after a series of idle disconnects, an RPC/RDMA transport becomes unusable (rdma_create_qp returns -ENOMEM). The problem was tracked down to the Send Queue size increasing after each reconnect.
The rdma_create_qp() API does not promise to leave its @qp_init_attr parameter unaltered. In fact, some drivers do modify one or more of its fields. Thus our calls to rdma_create_qp must use a fresh copy of ib_qp_init_attr each time.
This fix is appropriate for kernels dating back to late 2007, though it will have to be adapted, as the connect code has changed over the years.
Reported-by: Eli Dorfman eli@vastdata.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/xprtrdma/verbs.c | 26 ++++++++++++++------------ 1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 805b1f35e1caa..2bd9b4de0e325 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -605,10 +605,10 @@ void rpcrdma_ep_destroy(struct rpcrdma_xprt *r_xprt) * Unlike a normal reconnection, a fresh PD and a new set * of MRs and buffers is needed. */ -static int -rpcrdma_ep_recreate_xprt(struct rpcrdma_xprt *r_xprt, - struct rpcrdma_ep *ep, struct rpcrdma_ia *ia) +static int rpcrdma_ep_recreate_xprt(struct rpcrdma_xprt *r_xprt, + struct ib_qp_init_attr *qp_init_attr) { + struct rpcrdma_ia *ia = &r_xprt->rx_ia; int rc, err;
trace_xprtrdma_reinsert(r_xprt); @@ -625,7 +625,7 @@ rpcrdma_ep_recreate_xprt(struct rpcrdma_xprt *r_xprt, }
rc = -ENETUNREACH; - err = rdma_create_qp(ia->ri_id, ia->ri_pd, &ep->rep_attr); + err = rdma_create_qp(ia->ri_id, ia->ri_pd, qp_init_attr); if (err) { pr_err("rpcrdma: rdma_create_qp returned %d\n", err); goto out3; @@ -642,16 +642,16 @@ rpcrdma_ep_recreate_xprt(struct rpcrdma_xprt *r_xprt, return rc; }
-static int -rpcrdma_ep_reconnect(struct rpcrdma_xprt *r_xprt, struct rpcrdma_ep *ep, - struct rpcrdma_ia *ia) +static int rpcrdma_ep_reconnect(struct rpcrdma_xprt *r_xprt, + struct ib_qp_init_attr *qp_init_attr) { + struct rpcrdma_ia *ia = &r_xprt->rx_ia; struct rdma_cm_id *id, *old; int err, rc;
trace_xprtrdma_reconnect(r_xprt);
- rpcrdma_ep_disconnect(ep, ia); + rpcrdma_ep_disconnect(&r_xprt->rx_ep, ia);
rc = -EHOSTUNREACH; id = rpcrdma_create_id(r_xprt, ia); @@ -673,7 +673,7 @@ rpcrdma_ep_reconnect(struct rpcrdma_xprt *r_xprt, struct rpcrdma_ep *ep, goto out_destroy; }
- err = rdma_create_qp(id, ia->ri_pd, &ep->rep_attr); + err = rdma_create_qp(id, ia->ri_pd, qp_init_attr); if (err) goto out_destroy;
@@ -698,25 +698,27 @@ rpcrdma_ep_connect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia) struct rpcrdma_xprt *r_xprt = container_of(ia, struct rpcrdma_xprt, rx_ia); struct rpc_xprt *xprt = &r_xprt->rx_xprt; + struct ib_qp_init_attr qp_init_attr; int rc;
retry: + memcpy(&qp_init_attr, &ep->rep_attr, sizeof(qp_init_attr)); switch (ep->rep_connected) { case 0: dprintk("RPC: %s: connecting...\n", __func__); - rc = rdma_create_qp(ia->ri_id, ia->ri_pd, &ep->rep_attr); + rc = rdma_create_qp(ia->ri_id, ia->ri_pd, &qp_init_attr); if (rc) { rc = -ENETUNREACH; goto out_noupdate; } break; case -ENODEV: - rc = rpcrdma_ep_recreate_xprt(r_xprt, ep, ia); + rc = rpcrdma_ep_recreate_xprt(r_xprt, &qp_init_attr); if (rc) goto out_noupdate; break; default: - rc = rpcrdma_ep_reconnect(r_xprt, ep, ia); + rc = rpcrdma_ep_reconnect(r_xprt, &qp_init_attr); if (rc) goto out; }
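The effect described in the commit message (a callee that modifies its attribute argument makes the value drift upward across reconnects) is easy to reproduce in a toy model. The names below are hypothetical and the "+ 64" adjustment is invented for illustration; real drivers adjust the fields in their own ways.

```c
/* Hypothetical, one-field stand-in for struct ib_qp_init_attr. */
struct qp_attr {
	unsigned int max_send_wr;
};

/* A "provider" that adjusts the caller's attribute in place. */
static int create_qp(struct qp_attr *attr)
{
	attr->max_send_wr += 64; /* invented adjustment, for illustration */
	return 0;
}

/*
 * Simulate n connects. With use_copy set, each attempt starts from a
 * fresh copy of the template; otherwise one struct is reused throughout.
 * Returns the max_send_wr seen after the last attempt.
 */
static unsigned int connect_n_times(const struct qp_attr *tmpl, int n,
				    int use_copy)
{
	struct qp_attr shared = *tmpl;
	unsigned int last = 0;
	int i;

	for (i = 0; i < n; i++) {
		struct qp_attr fresh = *tmpl;
		struct qp_attr *arg = use_copy ? &fresh : &shared;

		create_qp(arg);
		last = arg->max_send_wr;
	}
	return last;
}
```

Starting from 128, three attempts with a fresh copy end at 192 each time, while three attempts on the shared struct end at 320, which is the kind of growth that eventually produces -ENOMEM.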
From: Lu Shuaibing shuaibinglu@126.com
[ Upstream commit 0ce772fe79b68f83df40f07f28207b292785c677 ]
The p9_tag_alloc() does not initialize the transport error t_err field. The struct p9_req_t *req is allocated and stored in a struct p9_client variable. The field t_err is never initialized before p9_conn_cancel() checks its value.
KUMSAN (KernelUninitializedMemorySanitizer, a new error detection tool) reports this bug.

==================================================================
BUG: KUMSAN: use of uninitialized memory in p9_conn_cancel+0x2d9/0x3b0
Read of size 4 at addr ffff88805f9b600c by task kworker/1:2/1216

CPU: 1 PID: 1216 Comm: kworker/1:2 Not tainted 5.2.0-rc4+ #28
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
Workqueue: events p9_write_work
Call Trace:
 dump_stack+0x75/0xae
 __kumsan_report+0x17c/0x3e6
 kumsan_report+0xe/0x20
 p9_conn_cancel+0x2d9/0x3b0
 p9_write_work+0x183/0x4a0
 process_one_work+0x4d1/0x8c0
 worker_thread+0x6e/0x780
 kthread+0x1ca/0x1f0
 ret_from_fork+0x35/0x40

Allocated by task 1979:
 save_stack+0x19/0x80
 __kumsan_kmalloc.constprop.3+0xbc/0x120
 kmem_cache_alloc+0xa7/0x170
 p9_client_prepare_req.part.9+0x3b/0x380
 p9_client_rpc+0x15e/0x880
 p9_client_create+0x3d0/0xac0
 v9fs_session_init+0x192/0xc80
 v9fs_mount+0x67/0x470
 legacy_get_tree+0x70/0xd0
 vfs_get_tree+0x4a/0x1c0
 do_mount+0xba9/0xf90
 ksys_mount+0xa8/0x120
 __x64_sys_mount+0x62/0x70
 do_syscall_64+0x6d/0x1e0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 0:
(stack is not available)

The buggy address belongs to the object at ffff88805f9b6008
 which belongs to the cache p9_req_t of size 144
The buggy address is located 4 bytes inside of
 144-byte region [ffff88805f9b6008, ffff88805f9b6098)
The buggy address belongs to the page:
page:ffffea00017e6d80 refcount:1 mapcount:0 mapping:ffff888068b63740 index:0xffff88805f9b7d90 compound_mapcount: 0
flags: 0x100000000010200(slab|head)
raw: 0100000000010200 ffff888068b66450 ffff888068b66450 ffff888068b63740
raw: ffff88805f9b7d90 0000000000100001 00000001ffffffff 0000000000000000
page dumped because: kumsan: bad access detected
==================================================================
Link: http://lkml.kernel.org/r/20190613070854.10434-1-shuaibinglu@126.com Signed-off-by: Lu Shuaibing shuaibinglu@126.com [dominique.martinet@cea.fr: grouped the added init with the others] Signed-off-by: Dominique Martinet dominique.martinet@cea.fr Signed-off-by: Sasha Levin sashal@kernel.org --- net/9p/client.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/9p/client.c b/net/9p/client.c index 9622f3e469f67..1d48afc7033ca 100644 --- a/net/9p/client.c +++ b/net/9p/client.c @@ -281,6 +281,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
p9pdu_reset(&req->tc); p9pdu_reset(&req->rc); + req->t_err = 0; req->status = REQ_STATUS_ALLOC; init_waitqueue_head(&req->wq); INIT_LIST_HEAD(&req->req_list);
From: Chengguang Xu cgxu519@zoho.com.cn
[ Upstream commit c87a37ebd40b889178664c2c09cc187334146292 ]
Currently, with the mmap cache policy, we always attach writeback_fid whether the mmap type is SHARED or PRIVATE. However, in the kata-container use case, which combines 9p (guest OS) with overlayfs (host OS), this behavior triggers overlayfs' copy-up when a command is executed inside the container.
Link: http://lkml.kernel.org/r/20190820100325.10313-1-cgxu519@zoho.com.cn Signed-off-by: Chengguang Xu cgxu519@zoho.com.cn Signed-off-by: Dominique Martinet dominique.martinet@cea.fr Signed-off-by: Sasha Levin sashal@kernel.org --- fs/9p/vfs_file.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c index 4cc966a31cb37..fe7f0bd2048e4 100644 --- a/fs/9p/vfs_file.c +++ b/fs/9p/vfs_file.c @@ -513,6 +513,7 @@ v9fs_mmap_file_mmap(struct file *filp, struct vm_area_struct *vma) v9inode = V9FS_I(inode); mutex_lock(&v9inode->v_mutex); if (!v9inode->writeback_fid && + (vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE)) { /* * clone a fid and add it to writeback_fid @@ -614,6 +615,8 @@ static void v9fs_mmap_vm_close(struct vm_area_struct *vma) (vma->vm_end - vma->vm_start - 1), };
+ if (!(vma->vm_flags & VM_SHARED)) + return;
p9_debug(P9_DEBUG_VFS, "9p VMA close, %p, flushing", vma);
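The condition this patch tightens can be read as a small predicate. The sketch below is a user-space illustration only: the SK_VM_* constants are local to the example (chosen to resemble the kernel's VM_WRITE and VM_SHARED bits, which is an assumption), and needs_writeback_fid() is a hypothetical helper, not a 9p function.

```c
#define SK_VM_WRITE  0x2UL /* local to this sketch */
#define SK_VM_SHARED 0x8UL /* local to this sketch */

/*
 * A writeback fid is only worth cloning when none exists yet and the
 * mapping is both shared and writable; a private mapping never writes
 * back to the file.
 */
static int needs_writeback_fid(unsigned long vm_flags, int has_fid)
{
	return !has_fid &&
	       (vm_flags & SK_VM_SHARED) != 0 &&
	       (vm_flags & SK_VM_WRITE) != 0;
}
```

A MAP_PRIVATE writable mapping therefore no longer opens the file for writing, which is what avoided the overlayfs copy-up in the reported setup.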
From: Igor Druzhinin igor.druzhinin@citrix.com
[ Upstream commit a4098bc6eed5e31e0391bcc068e61804c98138df ]
If MCFG area is not reserved in E820, Xen by default will defer its usage until Dom0 registers it explicitly after ACPI parser recognizes it as a reserved resource in DSDT. Having it reserved in E820 is not mandatory according to "PCI Firmware Specification, rev 3.2" (par. 4.1.2) and firmware is free to keep a hole in E820 in that place. Xen doesn't know what exactly is inside this hole since it lacks full ACPI view of the platform therefore it's potentially harmful to access MCFG region without additional checks as some machines are known to provide inconsistent information on the size of the region.
Now xen_mcfg_late() runs after acpi_init() which is too late as some basic PCI enumeration starts exactly there as well. Trying to register a device prior to MCFG reservation causes multiple problems with PCIe extended capability initializations in Xen (e.g. SR-IOV VF BAR sizing). There are no convenient hooks for us to subscribe to so register MCFG areas earlier upon the first invocation of xen_add_device(). It should be safe to do once since all the boot time buses must have their MCFG areas in MCFG table already and we don't support PCI bus hot-plug.
Signed-off-by: Igor Druzhinin igor.druzhinin@citrix.com Reviewed-by: Boris Ostrovsky boris.ostrovsky@oracle.com Signed-off-by: Boris Ostrovsky boris.ostrovsky@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/xen/pci.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c index 3eeb9bea76300..224df03ce42e3 100644 --- a/drivers/xen/pci.c +++ b/drivers/xen/pci.c @@ -17,6 +17,8 @@ #include "../pci/pci.h" #ifdef CONFIG_PCI_MMCONFIG #include <asm/pci_x86.h> + +static int xen_mcfg_late(void); #endif
static bool __read_mostly pci_seg_supported = true; @@ -28,7 +30,18 @@ static int xen_add_device(struct device *dev) #ifdef CONFIG_PCI_IOV struct pci_dev *physfn = pci_dev->physfn; #endif - +#ifdef CONFIG_PCI_MMCONFIG + static bool pci_mcfg_reserved = false; + /* + * Reserve MCFG areas in Xen on first invocation due to this being + * potentially called from inside of acpi_init immediately after + * MCFG table has been finally parsed. + */ + if (!pci_mcfg_reserved) { + xen_mcfg_late(); + pci_mcfg_reserved = true; + } +#endif if (pci_seg_supported) { struct { struct physdev_pci_device_add add; @@ -201,7 +214,7 @@ static int __init register_xen_pci_notifier(void) arch_initcall(register_xen_pci_notifier);
#ifdef CONFIG_PCI_MMCONFIG -static int __init xen_mcfg_late(void) +static int xen_mcfg_late(void) { struct pci_mmcfg_region *cfg; int rc; @@ -240,8 +253,4 @@ static int __init xen_mcfg_late(void) } return 0; } -/* - * Needs to be done after acpi_init which are subsys_initcall. - */ -subsys_initcall_sync(xen_mcfg_late); #endif
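The "do it on first invocation" approach used here boils down to a function-local static flag. A minimal sketch with hypothetical names (reserve_mcfg() stands in for xen_mcfg_late(), and the call counter exists only for the example); like the patch, it does no locking and so assumes the first calls are not concurrent:

```c
#include <stdbool.h>

static int reserve_calls; /* how often the one-time step actually ran */

/* Hypothetical stand-in for xen_mcfg_late(). */
static int reserve_mcfg(void)
{
	reserve_calls++;
	return 0;
}

/* Runs the reservation on the first call only, then handles the device. */
static int add_device(int devfn)
{
	static bool mcfg_reserved;

	if (!mcfg_reserved) {
		reserve_mcfg();
		mcfg_reserved = true;
	}
	return devfn >= 0 ? 0 : -1;
}
```

However many devices are added, the reservation step runs once, before the first device is processed.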
From: Miklos Szeredi mszeredi@redhat.com
[ Upstream commit f22f812d5ce75a18b56073a7a63862e6ea764070 ]
The size of struct fuse_req was reduced from 392B to 144B on a non-debug config, thus the sanitize_global_limit() helper was setting a larger default limit. This doesn't really reflect reduction in the memory used by requests, since the fields removed from fuse_req were added to fuse_args derived structs; e.g. sizeof(struct fuse_writepages_args) is 248B, thus resulting in slightly more memory being used for writepage requests overall (due to using 256B slabs).
Make the calculation ignore the size of fuse_req and use the old 392B value.
Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/fuse/inode.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 987877860c019..f3104db3de83a 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -823,9 +823,12 @@ static const struct super_operations fuse_super_operations = {
static void sanitize_global_limit(unsigned *limit) { + /* + * The default maximum number of async requests is calculated to consume + * 1/2^13 of the total memory, assuming 392 bytes per request. + */ if (*limit == 0) - *limit = ((totalram_pages() << PAGE_SHIFT) >> 13) / - sizeof(struct fuse_req); + *limit = ((totalram_pages() << PAGE_SHIFT) >> 13) / 392;
if (*limit >= 1 << 16) *limit = (1 << 16) - 1;
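As a worked example of the arithmetic, the default limit can be computed in user space. This sketch takes the total RAM in bytes directly (the kernel derives it from totalram_pages() << PAGE_SHIFT) and covers only the "limit was 0" case plus the clamp; default_limit() is a hypothetical name.

```c
#include <stdint.h>

/* Default: 1/2^13 of RAM at 392 bytes per request, capped at 65535. */
static unsigned int default_limit(uint64_t total_ram_bytes)
{
	uint64_t limit = (total_ram_bytes >> 13) / 392;

	if (limit >= (1u << 16))
		limit = (1u << 16) - 1;
	return (unsigned int)limit;
}
```

For 1 GiB of RAM this gives 2^30 / 2^13 = 131072, and 131072 / 392 = 334 requests. For 64 GiB it gives 21399, and from roughly 200 GiB upward the cap of 65535 applies.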
From: Luis Henriques lhenriques@suse.com
[ Upstream commit 750670341a24cb714e624e0fd7da30900ad93752 ]
When filling an inode with info from the MDS, i_blkbits is being initialized using fl_stripe_unit, which contains the stripe unit in bytes. Unfortunately, this doesn't make sense for directories as they have fl_stripe_unit set to '0'. This means that i_blkbits will be set to 0xff, causing an UBSAN undefined behaviour in i_blocksize():
UBSAN: Undefined behaviour in ./include/linux/fs.h:731:12 shift exponent 255 is too large for 32-bit type 'int'
Fix this by initializing i_blkbits to CEPH_BLOCK_SHIFT if fl_stripe_unit is zero.
Signed-off-by: Luis Henriques lhenriques@suse.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ceph/inode.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index 18500edefc56f..3b537e7038c7a 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -801,7 +801,12 @@ static int fill_inode(struct inode *inode, struct page *locked_page,
/* update inode */ inode->i_rdev = le32_to_cpu(info->rdev); - inode->i_blkbits = fls(le32_to_cpu(info->layout.fl_stripe_unit)) - 1; + /* directories have fl_stripe_unit set to zero */ + if (le32_to_cpu(info->layout.fl_stripe_unit)) + inode->i_blkbits = + fls(le32_to_cpu(info->layout.fl_stripe_unit)) - 1; + else + inode->i_blkbits = CEPH_BLOCK_SHIFT;
__ceph_update_quota(ci, iinfo->max_bytes, iinfo->max_files);
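How a zero stripe unit turns into a shift of 255 can be reproduced with a portable fls(). In this sketch fls32() is a local reimplementation, the 8-bit return type models an 8-bit i_blkbits field, and BLOCK_SHIFT is assumed to mirror CEPH_BLOCK_SHIFT (22, i.e. 4 MiB); treat all three as assumptions of the example.

```c
#include <stdint.h>

#define BLOCK_SHIFT 22 /* assumed to mirror CEPH_BLOCK_SHIFT */

/* Position of the highest set bit, counting from 1; 0 when x is 0. */
static int fls32(uint32_t x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/* Old behavior: fls(0) - 1 is -1, which wraps to 0xff in 8 bits. */
static uint8_t blkbits_old(uint32_t stripe_unit)
{
	return (uint8_t)(fls32(stripe_unit) - 1);
}

/* New behavior: fall back to the default shift for a zero stripe unit. */
static uint8_t blkbits_new(uint32_t stripe_unit)
{
	if (stripe_unit)
		return (uint8_t)(fls32(stripe_unit) - 1);
	return BLOCK_SHIFT;
}
```

A later "1 << i_blkbits" with the old value is the shift by 255 that UBSAN flagged; with the fallback the shift is always a sane block size.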
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 606d102327a45a49d293557527802ee7fbfd7af1 ]
It's protected by the s_gen_ttl_lock, so we should fetch under it and ensure that we're using the same generation in both places.
Signed-off-by: Jeff Layton jlayton@kernel.org Reviewed-by: "Yan, Zheng" zyan@redhat.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ceph/caps.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index ce0f5658720ab..8fd5301128106 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -645,6 +645,7 @@ void ceph_add_cap(struct inode *inode, struct ceph_cap *cap; int mds = session->s_mds; int actual_wanted; + u32 gen;
dout("add_cap %p mds%d cap %llx %s seq %d\n", inode, session->s_mds, cap_id, ceph_cap_string(issued), seq); @@ -656,6 +657,10 @@ void ceph_add_cap(struct inode *inode, if (fmode >= 0) wanted |= ceph_caps_for_mode(fmode);
+ spin_lock(&session->s_gen_ttl_lock); + gen = session->s_cap_gen; + spin_unlock(&session->s_gen_ttl_lock); + cap = __get_cap_for_mds(ci, mds); if (!cap) { cap = *new_cap; @@ -681,7 +686,7 @@ void ceph_add_cap(struct inode *inode, list_move_tail(&cap->session_caps, &session->s_caps); spin_unlock(&session->s_cap_lock);
- if (cap->cap_gen < session->s_cap_gen) + if (cap->cap_gen < gen) cap->issued = cap->implemented = CEPH_CAP_PIN;
/* @@ -775,7 +780,7 @@ void ceph_add_cap(struct inode *inode, cap->seq = seq; cap->issue_seq = seq; cap->mseq = mseq; - cap->cap_gen = session->s_cap_gen; + cap->cap_gen = gen;
if (fmode >= 0) __ceph_get_fmode(ci, fmode);
From: Erqi Chen chenerqi@gmail.com
[ Upstream commit 71a228bc8d65900179e37ac309e678f8c523f133 ]
If a client MDS session is evicted while in the CEPH_MDS_SESSION_OPENING state, the MDS won't send a session message to the client, and because delayed_work skips sessions in the CEPH_MDS_SESSION_OPENING state, the session hangs forever.
Allow ceph_con_keepalive to reconnect a session in OPENING to avoid session hang. Also, ensure that we skip sessions in RESTARTING and REJECTED states since those states can't be resurrected by issuing a keepalive.
Link: https://tracker.ceph.com/issues/41551 Signed-off-by: Erqi Chen chenerqi@gmail.com Reviewed-by: "Yan, Zheng" zyan@redhat.com Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ceph/mds_client.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index 920e9f048bd8f..b11af7d8e8e93 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -4044,7 +4044,9 @@ static void delayed_work(struct work_struct *work) pr_info("mds%d hung\n", s->s_mds); } } - if (s->s_state < CEPH_MDS_SESSION_OPEN) { + if (s->s_state == CEPH_MDS_SESSION_NEW || + s->s_state == CEPH_MDS_SESSION_RESTARTING || + s->s_state == CEPH_MDS_SESSION_REJECTED) { /* this mds is failed or recovering, just wait */ ceph_put_mds_session(s); continue;
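The difference between the old range check and the new explicit list can be shown with a toy enum. The state names and their ordering below are hypothetical and deliberately reduced; the only property taken from the patch is that NEW and OPENING sort below OPEN, which is what made the old "< OPEN" test skip sessions in OPENING.

```c
/* Hypothetical, reduced set of session states; ordering is illustrative. */
enum session_state {
	S_NEW = 1,
	S_OPENING,
	S_OPEN,
	S_HUNG,
	S_RESTARTING,
	S_REJECTED
};

/* Old check: everything below OPEN is skipped, including OPENING. */
static int skipped_before(enum session_state s)
{
	return s < S_OPEN;
}

/* New check: only states a keepalive cannot help are skipped. */
static int skipped_after(enum session_state s)
{
	return s == S_NEW || s == S_RESTARTING || s == S_REJECTED;
}
```

A session in OPENING is skipped by the old predicate but not by the new one, so it now gets the keepalive that lets it reconnect.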
From: Trond Myklebust trondmy@gmail.com
[ Upstream commit 714fbc73888f59321854e7f6c2f224213923bcad ]
Ensure that we set task->tk_rpc_status for all RPC level errors so that the caller can distinguish between those and server reply status errors.
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/clnt.c | 6 +++--- net/sunrpc/sched.c | 5 ++++- 2 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index 7a75f34ad393b..e7fdc400506e8 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -1837,7 +1837,7 @@ call_allocate(struct rpc_task *task) return; }
- rpc_exit(task, -ERESTARTSYS); + rpc_call_rpcerror(task, -ERESTARTSYS); }
static int @@ -2561,7 +2561,7 @@ rpc_encode_header(struct rpc_task *task, struct xdr_stream *xdr) return 0; out_fail: trace_rpc_bad_callhdr(task); - rpc_exit(task, error); + rpc_call_rpcerror(task, error); return error; }
@@ -2628,7 +2628,7 @@ rpc_decode_header(struct rpc_task *task, struct xdr_stream *xdr) return -EAGAIN; } out_err: - rpc_exit(task, error); + rpc_call_rpcerror(task, error); return error;
out_unparsable: diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index 1f275aba786fc..53934fe73a9db 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -930,8 +930,10 @@ static void __rpc_execute(struct rpc_task *task) /* * Signalled tasks should exit rather than sleep. */ - if (RPC_SIGNALLED(task)) + if (RPC_SIGNALLED(task)) { + task->tk_rpc_status = -ERESTARTSYS; rpc_exit(task, -ERESTARTSYS); + }
/* * The queue->lock protects against races with @@ -967,6 +969,7 @@ static void __rpc_execute(struct rpc_task *task) */ dprintk("RPC: %5u got signal\n", task->tk_pid); set_bit(RPC_TASK_SIGNALLED, &task->tk_runstate); + task->tk_rpc_status = -ERESTARTSYS; rpc_exit(task, -ERESTARTSYS); } dprintk("RPC: %5u sync task resuming\n", task->tk_pid);
From: Ryan Chen ryan_chen@aspeedtech.com
[ Upstream commit b3528b4874480818e38e4da019d655413c233e6a ]
The ast2600 can be supported by the same code as the ast2500.
Signed-off-by: Ryan Chen ryan_chen@aspeedtech.com
Signed-off-by: Joel Stanley joel@jms.id.au
Reviewed-by: Guenter Roeck linux@roeck-us.net
Link: https://lore.kernel.org/r/20190819051738.17370-3-joel@jms.id.au
Signed-off-by: Guenter Roeck linux@roeck-us.net
Signed-off-by: Wim Van Sebroeck wim@linux-watchdog.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/watchdog/aspeed_wdt.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/watchdog/aspeed_wdt.c b/drivers/watchdog/aspeed_wdt.c
index cc71861e033a5..5b64bc2e87888 100644
--- a/drivers/watchdog/aspeed_wdt.c
+++ b/drivers/watchdog/aspeed_wdt.c
@@ -34,6 +34,7 @@ static const struct aspeed_wdt_config ast2500_config = {
 static const struct of_device_id aspeed_wdt_of_table[] = {
 	{ .compatible = "aspeed,ast2400-wdt", .data = &ast2400_config },
 	{ .compatible = "aspeed,ast2500-wdt", .data = &ast2500_config },
+	{ .compatible = "aspeed,ast2600-wdt", .data = &ast2500_config },
 	{ },
 };
 MODULE_DEVICE_TABLE(of, aspeed_wdt_of_table);
@@ -259,7 +260,8 @@ static int aspeed_wdt_probe(struct platform_device *pdev)
 		set_bit(WDOG_HW_RUNNING, &wdt->wdd.status);
 	}
 
-	if (of_device_is_compatible(np, "aspeed,ast2500-wdt")) {
+	if ((of_device_is_compatible(np, "aspeed,ast2500-wdt")) ||
+		(of_device_is_compatible(np, "aspeed,ast2600-wdt"))) {
 		u32 reg = readl(wdt->base + WDT_RESET_WIDTH);
 
 		reg &= config->ext_pulse_width_mask;
From: Florian Westphal fw@strlen.de
[ Upstream commit acab713177377d9e0889c46bac7ff0cfb9a90c4d ]
This un-breaks lookups in sets that have the 'dynamic' flag set.
Given this active example configuration:

table filter {
  set set1 {
    type ipv4_addr
    size 64
    flags dynamic,timeout
    timeout 1m
  }

  chain input {
    type filter hook input priority 0; policy accept;
  }
}

... this works:
nft add rule ip filter input add @set1 { ip saddr }

-> whenever rule is triggered, the source ip address is inserted into the set (if it did not exist).

This won't work:
nft add rule ip filter input ip saddr @set1 counter
Error: Could not process rule: Operation not supported
In other words, we can add entries to the set, but then can't make matching decision based on that set.
That is just wrong -- all set backends support lookups (else they would not be very useful). The failure comes from an explicit rejection in nft_lookup.c.
Looking at the history, it seems like NFT_SET_EVAL used to mean 'set contains expressions' (aka. "is a meter"), for instance something like
nft add rule ip filter input meter example { ip saddr limit rate 10/second }
or
nft add rule ip filter input meter example { ip saddr counter }
The actual meaning of NFT_SET_EVAL however, is 'set can be updated from the packet path'.
'meters' and packet-path insertions into sets, such as 'add @set { ip saddr }' use exactly the same kernel code (nft_dynset.c) and thus require a set backend that provides the ->update() function.
The only set that provides this also is the only one that has the NFT_SET_EVAL feature flag.
Removing the wrong check makes the above example work. While at it, also fix the flag check during set instantiation to allow supported combinations only.
Fixes: 8aeff920dcc9b3f ("netfilter: nf_tables: add stateful object reference to set elements")
Signed-off-by: Florian Westphal fw@strlen.de
Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/netfilter/nf_tables_api.c | 7 +++++--
 net/netfilter/nft_lookup.c    | 3 ---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index d47469f824a10..3b81323fa0171 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3562,8 +3562,11 @@ static int nf_tables_newset(struct net *net, struct sock *nlsk,
 			      NFT_SET_OBJECT))
 			return -EINVAL;
 		/* Only one of these operations is supported */
-		if ((flags & (NFT_SET_MAP | NFT_SET_EVAL | NFT_SET_OBJECT)) ==
-			     (NFT_SET_MAP | NFT_SET_EVAL | NFT_SET_OBJECT))
+		if ((flags & (NFT_SET_MAP | NFT_SET_OBJECT)) ==
+			     (NFT_SET_MAP | NFT_SET_OBJECT))
+			return -EOPNOTSUPP;
+		if ((flags & (NFT_SET_EVAL | NFT_SET_OBJECT)) ==
+			     (NFT_SET_EVAL | NFT_SET_OBJECT))
 			return -EOPNOTSUPP;
 	}
 
diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c
index c0560bf3c31bd..660bad688e2bc 100644
--- a/net/netfilter/nft_lookup.c
+++ b/net/netfilter/nft_lookup.c
@@ -73,9 +73,6 @@ static int nft_lookup_init(const struct nft_ctx *ctx,
 	if (IS_ERR(set))
 		return PTR_ERR(set);
 
-	if (set->flags & NFT_SET_EVAL)
-		return -EOPNOTSUPP;
-
 	priv->sreg = nft_parse_register(tb[NFTA_LOOKUP_SREG]);
 	err = nft_validate_register_load(priv->sreg, set->klen);
 	if (err < 0)
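As a rough illustration of the set-instantiation flag check in the netfilter patch above, here is a minimal userspace sketch. The SET_* constants and both helper functions are hypothetical stand-ins, not the kernel's NFT_SET_* definitions or nf_tables_newset() itself; they only mirror the old and new conditions shown in the diff.

```c
#include <errno.h>

/* Hypothetical stand-ins for NFT_SET_MAP, NFT_SET_EVAL and NFT_SET_OBJECT. */
#define SET_MAP    0x08u
#define SET_EVAL   0x20u
#define SET_OBJECT 0x40u

/* Old check: rejects only when all three flags are set together. */
static int check_flags_old(unsigned int flags)
{
	const unsigned int all = SET_MAP | SET_EVAL | SET_OBJECT;

	if ((flags & all) == all)
		return -EOPNOTSUPP;
	return 0;
}

/* New check: rejects MAP+OBJECT and EVAL+OBJECT, but allows MAP+EVAL. */
static int check_flags_new(unsigned int flags)
{
	if ((flags & (SET_MAP | SET_OBJECT)) == (SET_MAP | SET_OBJECT))
		return -EOPNOTSUPP;
	if ((flags & (SET_EVAL | SET_OBJECT)) == (SET_EVAL | SET_OBJECT))
		return -EOPNOTSUPP;
	return 0;
}
```

Under these assumptions, a set created with only MAP and OBJECT passes the old check but is rejected by the new one, which is the "allow supported combinations only" behaviour the commit message describes.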
From: Felix Kuehling Felix.Kuehling@amd.com
[ Upstream commit dcafbd50f2e4d5cc964aae409fb5691b743fba23 ]
Hawaii needs to flush caches explicitly, submitting an IB in a user VMID from kernel mode. There is no s_fence in this case.
Fixes: eb3961a57424 ("drm/amdgpu: remove fence context from the job")
Signed-off-by: Felix Kuehling Felix.Kuehling@amd.com
Reviewed-by: Christian König christian.koenig@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 7850084a05e3a..60655834d6498 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -143,7 +143,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 	/* ring tests don't use a job */
 	if (job) {
 		vm = job->vm;
-		fence_ctx = job->base.s_fence->scheduled.context;
+		fence_ctx = job->base.s_fence ?
+			job->base.s_fence->scheduled.context : 0;
 	} else {
 		vm = NULL;
 		fence_ctx = 0;
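The guard added in the amdgpu_ib.c patch above can be sketched in isolation as follows. The struct layouts and the job_fence_context() helper are hypothetical simplifications, not the amdgpu or DRM scheduler types; the point is only that a job without a scheduler fence falls back to context 0 instead of dereferencing a NULL pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the scheduler fence and the job. */
struct fence {
	uint64_t context;
};

struct job {
	struct fence *s_fence; /* may be NULL for kernel-submitted IBs */
};

/* Return the fence context, or 0 when there is no job or no fence. */
static uint64_t job_fence_context(const struct job *job)
{
	if (!job || !job->s_fence)
		return 0;
	return job->s_fence->context;
}
```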
From: Trek trek00@inbox.ru
[ Upstream commit 73d8e6c7b841d9bf298c8928f228fb433676635c ]
Do not try to allocate any amount of memory requested by the user. Instead limit it to 128 registers. In practice, the longest series of consecutive allowed registers is 48: mmGB_TILE_MODE0-31 and mmGB_MACROTILE_MODE0-15 (0x2644-0x2673).
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=111273
Signed-off-by: Trek trek00@inbox.ru
Signed-off-by: Alex Deucher alexander.deucher@amd.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 0cf7e8606fd3d..00beba533582c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -662,6 +662,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file
 		if (sh_num == AMDGPU_INFO_MMR_SH_INDEX_MASK)
 			sh_num = 0xffffffff;
 
+		if (info->read_mmr_reg.count > 128)
+			return -EINVAL;
+
 		regs = kmalloc_array(info->read_mmr_reg.count, sizeof(*regs), GFP_KERNEL);
 		if (!regs)
 			return -ENOMEM;
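The idea behind the amdgpu_kms.c patch above (validate a user-supplied count before sizing an allocation from it) can be sketched in userspace like this. MAX_MMR_REGS and alloc_reg_buffer() are hypothetical names introduced for illustration; only the limit of 128 comes from the patch, and calloc() stands in for kmalloc_array().

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_MMR_REGS 128u /* limit taken from the patch */

/*
 * Hypothetical helper: allocate a buffer for `count` register values.
 * Returns 0 and stores the buffer in *out, or a negative errno value
 * without touching *out.
 */
static int alloc_reg_buffer(uint32_t count, uint32_t **out)
{
	uint32_t *regs;

	/* Reject oversized requests before any memory is allocated. */
	if (count > MAX_MMR_REGS)
		return -EINVAL;

	regs = calloc(count ? count : 1, sizeof(*regs));
	if (!regs)
		return -ENOMEM;

	*out = regs;
	return 0;
}
```

A request for 48 registers (the longest legitimate run mentioned in the commit message) succeeds, while a request for 129 fails with -EINVAL and allocates nothing.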
From: Masami Hiramatsu mhiramat@kernel.org
[ Upstream commit 9e6124d9d635957b56717f85219a88701617253f ]
Since add_probe_trace_event() can reuse tf->tevs[i] after calling clear_probe_trace_event(), this can make perf-probe crash if the 1st attempt of probe event finding fails to find an event argument, and the 2nd attempt fails to find probe point.
E.g.
  $ perf probe -D "task_pid_nr tsk"
  Failed to find 'tsk' in this function.
  Failed to get entry address of warn_bad_vsyscall
  Segmentation fault (core dumped)

Committer testing:

After the patch:

  $ perf probe -D "task_pid_nr tsk"
  Failed to find 'tsk' in this function.
  Failed to get entry address of warn_bad_vsyscall
  Failed to get entry address of signal_fault
  Failed to get entry address of show_signal
  Failed to get entry address of umip_printk
  Failed to get entry address of __bad_area_nosemaphore
  <SNIP>
  Failed to get entry address of sock_set_timeout
  Failed to get entry address of tcp_recvmsg
  Probe point 'task_pid_nr' not found.
    Error: Failed to add events.
  $

Fixes: 092b1f0b5f9f ("perf probe: Clear probe_trace_event when add_probe_trace_event() fails")
Signed-off-by: Masami Hiramatsu mhiramat@kernel.org
Tested-by: Arnaldo Carvalho de Melo acme@redhat.com
Cc: Jiri Olsa jolsa@redhat.com
Cc: Namhyung Kim namhyung@kernel.org
Cc: Wang Nan wangnan0@huawei.com
Link: http://lore.kernel.org/lkml/156856587999.25775.5145779959474477595.stgit@dev...
Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 tools/perf/util/probe-event.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 8394d48f8b32e..3355c445abedf 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -2329,6 +2329,7 @@ void clear_probe_trace_event(struct probe_trace_event *tev)
 		}
 	}
 	zfree(&tev->args);
+	tev->nargs = 0;
 }
 
 struct kprobe_blacklist_node {
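To show why the perf probe patch above resets the count along with the array, here is a minimal sketch. struct event and clear_event() are hypothetical simplifications, not perf's struct probe_trace_event or clear_probe_trace_event(); the sketch assumes the structure may be cleared or reused more than once.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct probe_trace_event. */
struct event {
	int nargs;
	char **args;
};

/* Free every argument and the array, then reset the count for reuse. */
static void clear_event(struct event *ev)
{
	int i;

	for (i = 0; i < ev->nargs; i++)
		free(ev->args[i]);
	free(ev->args);
	ev->args = NULL;
	/* Without this reset, a later clear would index a NULL array. */
	ev->nargs = 0;
}
```

With the count reset, calling clear_event() twice on the same structure is harmless: the second call loops zero times and frees a NULL pointer, which is a no-op.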
From: Trond Myklebust trondmy@gmail.com
[ Upstream commit 9c47b18cf722184f32148784189fca945a7d0561 ]
If the server rejected our layout return with a state error such as NFS4ERR_BAD_STATEID, or even a stale inode error, then we do want to clear out all the remaining layout segments and mark that stateid as invalid.
Fixes: 1c5bd76d17cca ("pNFS: Enable layoutreturn operation for...")
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com
Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/nfs/pnfs.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 4525d5acae386..0418b198edd3e 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1449,10 +1449,15 @@ void pnfs_roc_release(struct nfs4_layoutreturn_args *args,
 	const nfs4_stateid *res_stateid = NULL;
 	struct nfs4_xdr_opaque_data *ld_private = args->ld_private;
 
-	if (ret == 0) {
-		arg_stateid = &args->stateid;
+	switch (ret) {
+	case -NFS4ERR_NOMATCHING_LAYOUT:
+		break;
+	case 0:
 		if (res->lrs_present)
 			res_stateid = &res->stateid;
+		/* Fallthrough */
+	default:
+		arg_stateid = &args->stateid;
 	}
 	pnfs_layoutreturn_free_lsegs(lo, arg_stateid, &args->range,
 			res_stateid);
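The control flow introduced by the pnfs.c patch above can be summarised with a small sketch. ERR_NOMATCHING_LAYOUT, struct release_plan and plan_release() are hypothetical names used only for illustration (the real error constant lives in the kernel's NFSv4 headers); the switch mirrors the one in the diff.

```c
#include <stdbool.h>

/* Hypothetical stand-in for NFS4ERR_NOMATCHING_LAYOUT. */
#define ERR_NOMATCHING_LAYOUT 10060

/* Which stateids would be handed to the layout-segment cleanup. */
struct release_plan {
	bool use_arg_stateid;
	bool use_res_stateid;
};

static struct release_plan plan_release(int ret, bool lrs_present)
{
	struct release_plan plan = { false, false };

	switch (ret) {
	case -ERR_NOMATCHING_LAYOUT:
		/* Nothing matched on the server: leave both unset. */
		break;
	case 0:
		if (lrs_present)
			plan.use_res_stateid = true;
		/* fall through */
	default:
		/* Success and every other error clear by the argument stateid. */
		plan.use_arg_stateid = true;
	}
	return plan;
}
```

Before the patch only ret == 0 set the argument stateid; in this sketch any other error, such as a bad-stateid error, now does so as well, which matches the commit message's intent of clearing the remaining layout segments.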
From: Trond Myklebust trondmy@gmail.com
[ Upstream commit 9ba828861c56a21d211d5d10f5643774b1ea330d ]
If the copy of the RPC reply into our buffers did not complete, we could end up with a truncated message. In that case, just resend the call.
Fixes: a0584ee9aed80 ("SUNRPC: Use struct xdr_stream when decoding...")
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com
Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/sunrpc/clnt.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index e7fdc400506e8..f7f78566be463 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -2482,6 +2482,7 @@ call_decode(struct rpc_task *task)
 	struct rpc_clnt	*clnt = task->tk_client;
 	struct rpc_rqst	*req = task->tk_rqstp;
 	struct xdr_stream xdr;
+	int err;
 
 	dprint_status(task);
 
@@ -2504,6 +2505,15 @@ call_decode(struct rpc_task *task)
 	 * before it changed req->rq_reply_bytes_recvd.
 	 */
 	smp_rmb();
+
+	/*
+	 * Did we ever call xprt_complete_rqst()? If not, we should assume
+	 * the message is incomplete.
+	 */
+	err = -EAGAIN;
+	if (!req->rq_reply_bytes_recvd)
+		goto out;
+
 	req->rq_rcv_buf.len = req->rq_private_buf.len;
 
 	/* Check that the softirq receive buffer is valid */
@@ -2512,7 +2522,9 @@ call_decode(struct rpc_task *task)
 
 	xdr_init_decode(&xdr, &req->rq_rcv_buf,
 			req->rq_rcv_buf.head[0].iov_base, req);
-	switch (rpc_decode_header(task, &xdr)) {
+	err = rpc_decode_header(task, &xdr);
+out:
+	switch (err) {
 	case 0:
 		task->tk_action = rpc_exit_task;
 		task->tk_status = rpcauth_unwrap_resp(task, &xdr);
From: Fabrice Gasnier fabrice.gasnier@st.com
[ Upstream commit c91e3234c6035baf5a79763cb4fcd5d23ce75c2b ]
LPTimer can use a 32KHz clock for counting. It depends on clock tree configuration. In such a case, PWM output frequency range is limited. Although unlikely, nothing prevents user from requesting a PWM frequency above counting clock (32KHz for instance):
- This causes (prd - 1) = 0xffff to be written in ARR register later in the apply() routine.
This results in badly configured PWM period (and also duty_cycle). Add a check to report an error in such a case.
Signed-off-by: Fabrice Gasnier fabrice.gasnier@st.com
Reviewed-by: Uwe Kleine-König u.kleine-koenig@pengutronix.de
Signed-off-by: Thierry Reding thierry.reding@gmail.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/pwm/pwm-stm32-lp.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/pwm/pwm-stm32-lp.c b/drivers/pwm/pwm-stm32-lp.c
index 2211a642066db..97a9afa191ee0 100644
--- a/drivers/pwm/pwm-stm32-lp.c
+++ b/drivers/pwm/pwm-stm32-lp.c
@@ -59,6 +59,12 @@ static int stm32_pwm_lp_apply(struct pwm_chip *chip, struct pwm_device *pwm,
 	/* Calculate the period and prescaler value */
 	div = (unsigned long long)clk_get_rate(priv->clk) * state->period;
 	do_div(div, NSEC_PER_SEC);
+	if (!div) {
+		/* Clock is too slow to achieve requested period. */
+		dev_dbg(priv->chip.dev, "Can't reach %u ns\n", state->period);
+		return -EINVAL;
+	}
+
 	prd = div;
 	while (div > STM32_LPTIM_MAX_ARR) {
 		presc++;
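The arithmetic behind the pwm-stm32-lp patch above can be worked through with a short sketch. period_ticks() and arr_value_unchecked() are hypothetical helpers, and the 16-bit register width is an assumption made for illustration; the formula itself (clock rate times period, divided by NSEC_PER_SEC) is the one in the diff.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Number of counter ticks that fit in `period_ns` at `clk_hz`.
 * A result of 0 means the clock is too slow for the requested period.
 */
static uint64_t period_ticks(uint64_t clk_hz, uint64_t period_ns)
{
	return clk_hz * period_ns / NSEC_PER_SEC;
}

/* What a 16-bit reload register would receive if zero went unchecked. */
static uint16_t arr_value_unchecked(uint64_t ticks)
{
	return (uint16_t)(ticks - 1);
}
```

For a 32768 Hz counting clock and a requested period of 20000 ns (50 kHz), 32768 * 20000 = 655360000, which is below 10^9, so the integer division yields 0 ticks and (0 - 1) truncated to 16 bits is 0xffff. A 1000000 ns period (1 kHz) gives 32 ticks and is fine.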
From: Tycho Andersen tycho@tycho.ws
[ Upstream commit 88282297fff00796e81f5e67734a6afdfb31fbc4 ]
The seccomp selftest goes to some length to build against older kernel headers, viz. all the #ifdefs at the beginning of the file.
Commit 201766a20e30 ("ptrace: add PTRACE_GET_SYSCALL_INFO request") introduces some additional macros, but doesn't do the #ifdef dance. Let's add that dance here to avoid:
gcc -Wl,-no-as-needed -Wall seccomp_bpf.c -lpthread -o seccomp_bpf
In file included from seccomp_bpf.c:51:
seccomp_bpf.c: In function ‘tracer_ptrace’:
seccomp_bpf.c:1787:20: error: ‘PTRACE_EVENTMSG_SYSCALL_ENTRY’ undeclared (first use in this function); did you mean ‘PTRACE_EVENT_CLONE’?
  EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY
                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../kselftest_harness.h:608:13: note: in definition of macro ‘__EXPECT’
  __typeof__(_expected) __exp = (_expected); \
             ^~~~~~~~~
seccomp_bpf.c:1787:2: note: in expansion of macro ‘EXPECT_EQ’
  EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY
  ^~~~~~~~~
seccomp_bpf.c:1787:20: note: each undeclared identifier is reported only once for each function it appears in
  EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY
                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../kselftest_harness.h:608:13: note: in definition of macro ‘__EXPECT’
  __typeof__(_expected) __exp = (_expected); \
             ^~~~~~~~~
seccomp_bpf.c:1787:2: note: in expansion of macro ‘EXPECT_EQ’
  EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY
  ^~~~~~~~~
seccomp_bpf.c:1788:6: error: ‘PTRACE_EVENTMSG_SYSCALL_EXIT’ undeclared (first use in this function); did you mean ‘PTRACE_EVENT_EXIT’?
      : PTRACE_EVENTMSG_SYSCALL_EXIT, msg);
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
../kselftest_harness.h:608:13: note: in definition of macro ‘__EXPECT’
  __typeof__(_expected) __exp = (_expected); \
             ^~~~~~~~~
seccomp_bpf.c:1787:2: note: in expansion of macro ‘EXPECT_EQ’
  EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY
  ^~~~~~~~~
make: *** [Makefile:12: seccomp_bpf] Error 1

[skhan@linuxfoundation.org: Fix checkpatch error in commit log]
Signed-off-by: Tycho Andersen tycho@tycho.ws
Fixes: 201766a20e30 ("ptrace: add PTRACE_GET_SYSCALL_INFO request")
Acked-by: Kees Cook keescook@chromium.org
Signed-off-by: Shuah Khan skhan@linuxfoundation.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 6ef7f16c4cf52..7f8b5c8982e3b 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -199,6 +199,11 @@ struct seccomp_notif_sizes {
 };
 #endif
 
+#ifndef PTRACE_EVENTMSG_SYSCALL_ENTRY
+#define PTRACE_EVENTMSG_SYSCALL_ENTRY 1
+#define PTRACE_EVENTMSG_SYSCALL_EXIT 2
+#endif
+
 #ifndef seccomp
 int seccomp(unsigned int op, unsigned int flags, void *args)
 {
From: Arvind Sankar nivedita@alum.mit.edu
[ Upstream commit ca14c996afe7228ff9b480cf225211cc17212688 ]
Since commit:
b059f801a937 ("x86/purgatory: Use CFLAGS_REMOVE rather than reset KBUILD_CFLAGS")
kexec breaks if GCC_PLUGIN_STACKLEAK=y is enabled, as the purgatory contains undefined references to stackleak_track_stack.
Attempting to load a kexec kernel results in this failure:
kexec: Undefined symbol: stackleak_track_stack
kexec-bzImage64: Loading purgatory failed
Fix this by disabling the stackleak plugin for the purgatory.
Signed-off-by: Arvind Sankar nivedita@alum.mit.edu
Reviewed-by: Nick Desaulniers ndesaulniers@google.com
Cc: Borislav Petkov bp@alien8.de
Cc: H. Peter Anvin hpa@zytor.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Fixes: b059f801a937 ("x86/purgatory: Use CFLAGS_REMOVE rather than reset KBUILD_CFLAGS")
Link: https://lkml.kernel.org/r/20190923171753.GA2252517@rani.riverdale.lan
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/x86/purgatory/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 10fb42da0007e..b81b5172cf994 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -23,6 +23,7 @@ KCOV_INSTRUMENT := n
 
 PURGATORY_CFLAGS_REMOVE := -mcmodel=kernel
 PURGATORY_CFLAGS := -mcmodel=large -ffreestanding -fno-zero-initialized-in-bss
+PURGATORY_CFLAGS += $(DISABLE_STACKLEAK_PLUGIN)
 
 # Default KBUILD_CFLAGS can have -pg option set when FTRACE is enabled. That
 # in turn leaves some undefined symbols like __fentry__ in purgatory and not
From: Sanjay R Mehta sanju.mehta@amd.com
[ Upstream commit ae89339b08f3fe02457ec9edd512ddc3d246d0f8 ]
The second parameter of ntb_peer_mw_get_addr() points to the wrong memory window index, because "peer gidx" is passed instead of "local gidx".

For example, suppose the "local gidx" value is '0' and the "peer gidx" value is '1'.

On the peer side, ntb_mw_set_trans() is used as below, with gidx pointing to the local side gidx, which is '0', so memory window '0' is chosen and XLAT '0' is programmed by the peer side.

ntb_mw_set_trans(perf->ntb, peer->pidx, peer->gidx, peer->inbuf_xlat, peer->inbuf_size);

Now, on the local side, ntb_peer_mw_get_addr() is used as below, with gidx pointing to "peer gidx", which is '1', so it points to memory window '1' instead of memory window '0'.

ntb_peer_mw_get_addr(perf->ntb, peer->gidx, &phys_addr, &peer->outbuf_size);

So this patch passes "local gidx" as the parameter to ntb_peer_mw_get_addr().
Signed-off-by: Sanjay R Mehta sanju.mehta@amd.com
Signed-off-by: Jon Mason jdmason@kudzu.us
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/ntb/test/ntb_perf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
index d028331558ea7..e9b7c2dfc7301 100644
--- a/drivers/ntb/test/ntb_perf.c
+++ b/drivers/ntb/test/ntb_perf.c
@@ -1378,7 +1378,7 @@ static int perf_setup_peer_mw(struct perf_peer *peer)
 	int ret;
 
 	/* Get outbound MW parameters and map it */
-	ret = ntb_peer_mw_get_addr(perf->ntb, peer->gidx, &phys_addr,
+	ret = ntb_peer_mw_get_addr(perf->ntb, perf->gidx, &phys_addr,
 				   &peer->outbuf_size);
 	if (ret)
 		return ret;
From: Ido Schimmel idosch@mellanox.com
[ Upstream commit 1851799e1d2978f68eea5d9dff322e121dcf59c1 ]
thermal_zone_device_unregister() cancels the delayed work that polls the thermal zone, but it does not wait for it to finish. This is racy with respect to the freeing of the thermal zone device, which can result in a use-after-free [1].
Fix this by waiting for the delayed work to finish before freeing the thermal zone device. Note that thermal_zone_device_set_polling() is never invoked from an atomic context, so it is safe to call cancel_delayed_work_sync() that can block.
[1]
[ +0.002221] ==================================================================
[ +0.000064] BUG: KASAN: use-after-free in __mutex_lock+0x1076/0x11c0
[ +0.000016] Read of size 8 at addr ffff8881e48e0450 by task kworker/1:0/17

[ +0.000023] CPU: 1 PID: 17 Comm: kworker/1:0 Not tainted 5.2.0-rc6-custom-02495-g8e73ca3be4af #1701
[ +0.000010] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
[ +0.000016] Workqueue: events_freezable_power_ thermal_zone_device_check
[ +0.000012] Call Trace:
[ +0.000021] dump_stack+0xa9/0x10e
[ +0.000020] print_address_description.cold.2+0x9/0x25e
[ +0.000018] __kasan_report.cold.3+0x78/0x9d
[ +0.000016] kasan_report+0xe/0x20
[ +0.000016] __mutex_lock+0x1076/0x11c0
[ +0.000014] step_wise_throttle+0x72/0x150
[ +0.000018] handle_thermal_trip+0x167/0x760
[ +0.000019] thermal_zone_device_update+0x19e/0x5f0
[ +0.000019] process_one_work+0x969/0x16f0
[ +0.000017] worker_thread+0x91/0xc40
[ +0.000014] kthread+0x33d/0x400
[ +0.000015] ret_from_fork+0x3a/0x50

[ +0.000020] Allocated by task 1:
[ +0.000015] save_stack+0x19/0x80
[ +0.000015] __kasan_kmalloc.constprop.4+0xc1/0xd0
[ +0.000014] kmem_cache_alloc_trace+0x152/0x320
[ +0.000015] thermal_zone_device_register+0x1b4/0x13a0
[ +0.000015] mlxsw_thermal_init+0xc92/0x23d0
[ +0.000014] __mlxsw_core_bus_device_register+0x659/0x11b0
[ +0.000013] mlxsw_core_bus_device_register+0x3d/0x90
[ +0.000013] mlxsw_pci_probe+0x355/0x4b0
[ +0.000014] local_pci_probe+0xc3/0x150
[ +0.000013] pci_device_probe+0x280/0x410
[ +0.000013] really_probe+0x26a/0xbb0
[ +0.000013] driver_probe_device+0x208/0x2e0
[ +0.000013] device_driver_attach+0xfe/0x140
[ +0.000013] __driver_attach+0x110/0x310
[ +0.000013] bus_for_each_dev+0x14b/0x1d0
[ +0.000013] driver_register+0x1c0/0x400
[ +0.000015] mlxsw_sp_module_init+0x5d/0xd3
[ +0.000014] do_one_initcall+0x239/0x4dd
[ +0.000013] kernel_init_freeable+0x42b/0x4e8
[ +0.000012] kernel_init+0x11/0x18b
[ +0.000013] ret_from_fork+0x3a/0x50

[ +0.000015] Freed by task 581:
[ +0.000013] save_stack+0x19/0x80
[ +0.000014] __kasan_slab_free+0x125/0x170
[ +0.000013] kfree+0xf3/0x310
[ +0.000013] thermal_release+0xc7/0xf0
[ +0.000014] device_release+0x77/0x200
[ +0.000014] kobject_put+0x1a8/0x4c0
[ +0.000014] device_unregister+0x38/0xc0
[ +0.000014] thermal_zone_device_unregister+0x54e/0x6a0
[ +0.000014] mlxsw_thermal_fini+0x184/0x35a
[ +0.000014] mlxsw_core_bus_device_unregister+0x10a/0x640
[ +0.000013] mlxsw_devlink_core_bus_device_reload+0x92/0x210
[ +0.000015] devlink_nl_cmd_reload+0x113/0x1f0
[ +0.000014] genl_family_rcv_msg+0x700/0xee0
[ +0.000013] genl_rcv_msg+0xca/0x170
[ +0.000013] netlink_rcv_skb+0x137/0x3a0
[ +0.000012] genl_rcv+0x29/0x40
[ +0.000013] netlink_unicast+0x49b/0x660
[ +0.000013] netlink_sendmsg+0x755/0xc90
[ +0.000013] __sys_sendto+0x3de/0x430
[ +0.000013] __x64_sys_sendto+0xe2/0x1b0
[ +0.000013] do_syscall_64+0xa4/0x4d0
[ +0.000013] entry_SYSCALL_64_after_hwframe+0x49/0xbe

[ +0.000017] The buggy address belongs to the object at ffff8881e48e0008 which belongs to the cache kmalloc-2k of size 2048
[ +0.000012] The buggy address is located 1096 bytes inside of 2048-byte region [ffff8881e48e0008, ffff8881e48e0808)
[ +0.000007] The buggy address belongs to the page:
[ +0.000012] page:ffffea0007923800 refcount:1 mapcount:0 mapping:ffff88823680d0c0 index:0x0 compound_mapcount: 0
[ +0.000020] flags: 0x200000000010200(slab|head)
[ +0.000019] raw: 0200000000010200 ffffea0007682008 ffffea00076ab808 ffff88823680d0c0
[ +0.000016] raw: 0000000000000000 00000000000d000d 00000001ffffffff 0000000000000000
[ +0.000007] page dumped because: kasan: bad access detected

[ +0.000012] Memory state around the buggy address:
[ +0.000012] ffff8881e48e0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000012] ffff8881e48e0380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000012] >ffff8881e48e0400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000008] ^
[ +0.000012] ffff8881e48e0480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000012] ffff8881e48e0500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================
Fixes: b1569e99c795 ("ACPI: move thermal trip handling to generic thermal layer")
Reported-by: Jiri Pirko jiri@mellanox.com
Signed-off-by: Ido Schimmel idosch@mellanox.com
Acked-by: Jiri Pirko jiri@mellanox.com
Signed-off-by: Zhang Rui rui.zhang@intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/thermal/thermal_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 6bab66e84eb58..ebe15f2cf7fc3 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -304,7 +304,7 @@ static void thermal_zone_device_set_polling(struct thermal_zone_device *tz,
 				 &tz->poll_queue,
 				 msecs_to_jiffies(delay));
 	else
-		cancel_delayed_work(&tz->poll_queue);
+		cancel_delayed_work_sync(&tz->poll_queue);
 }
 
 static void monitor_thermal_zone(struct thermal_zone_device *tz)
Hi Greg,
This patch causes a deadlock, the thermal framework stops. Please check this fix (I found it before posting exactly the same solution): https://lkml.org/lkml/2019/11/12/1075
I have verified it on OdroidXU4 and it works. It needs some cleaning in the commit message, though. I have added to CC the author: Wei Wang
I don't know how many stable trees it was applied to, but it should be fixed there as well.
Regards, Lukasz Luba
On Thu, Nov 14, 2019 at 01:17:55PM +0000, Lukasz Luba wrote:
> Hi Greg,
>
> This patch causes a deadlock, the thermal framework stops.
> Please check this fix (I found it before posting exactly the same solution):
> https://lkml.org/lkml/2019/11/12/1075
Sorry about that.
> I have verified it on OdroidXU4 and it works. It needs some cleaning in the commit message, though. I have added to CC the author: Wei Wang
>
> I don't know how many stable trees it was applied to, but it should be fixed there as well.
I checked my mailbox and it seems it was backported to: 4.4, 4.9, 4.14, 4.19, 5.2 (EOL) and 5.3
Thanks
On Thu, 2019-11-14 at 13:17 +0000, Lukasz Luba wrote:
> Hi Greg,
>
> This patch causes a deadlock, the thermal framework stops.
> Please check this fix (I found it before posting exactly the same solution):
> https://lkml.org/lkml/2019/11/12/1075
>
> I have verified it on OdroidXU4 and it works. It needs some cleaning in the commit message, though. I have added to CC the author: Wei Wang
>
> I don't know how many stable trees it was applied to, but it should be fixed there as well.
Right. I've just applied this in thermal tree, will check if it can get 5.4 or not later.
thanks, rui
Regards, Lukasz Luba
On 10/10/19 9:36 AM, Greg Kroah-Hartman wrote:
From: Ido Schimmel idosch@mellanox.com
[ Upstream commit 1851799e1d2978f68eea5d9dff322e121dcf59c1 ]
thermal_zone_device_unregister() cancels the delayed work that polls the thermal zone, but it does not wait for it to finish. This is racy with respect to the freeing of the thermal zone device, which can result in a use-after-free [1].
Fix this by waiting for the delayed work to finish before freeing the thermal zone device. Note that thermal_zone_device_set_polling() is never invoked from an atomic context, so it is safe to call cancel_delayed_work_sync() that can block.
[1]
[ +0.002221] ==================================================================
[ +0.000064] BUG: KASAN: use-after-free in __mutex_lock+0x1076/0x11c0
[ +0.000016] Read of size 8 at addr ffff8881e48e0450 by task kworker/1:0/17

[ +0.000023] CPU: 1 PID: 17 Comm: kworker/1:0 Not tainted 5.2.0-rc6-custom-02495-g8e73ca3be4af #1701
[ +0.000010] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
[ +0.000016] Workqueue: events_freezable_power_ thermal_zone_device_check
[ +0.000012] Call Trace:
[ +0.000021]  dump_stack+0xa9/0x10e
[ +0.000020]  print_address_description.cold.2+0x9/0x25e
[ +0.000018]  __kasan_report.cold.3+0x78/0x9d
[ +0.000016]  kasan_report+0xe/0x20
[ +0.000016]  __mutex_lock+0x1076/0x11c0
[ +0.000014]  step_wise_throttle+0x72/0x150
[ +0.000018]  handle_thermal_trip+0x167/0x760
[ +0.000019]  thermal_zone_device_update+0x19e/0x5f0
[ +0.000019]  process_one_work+0x969/0x16f0
[ +0.000017]  worker_thread+0x91/0xc40
[ +0.000014]  kthread+0x33d/0x400
[ +0.000015]  ret_from_fork+0x3a/0x50

[ +0.000020] Allocated by task 1:
[ +0.000015]  save_stack+0x19/0x80
[ +0.000015]  __kasan_kmalloc.constprop.4+0xc1/0xd0
[ +0.000014]  kmem_cache_alloc_trace+0x152/0x320
[ +0.000015]  thermal_zone_device_register+0x1b4/0x13a0
[ +0.000015]  mlxsw_thermal_init+0xc92/0x23d0
[ +0.000014]  __mlxsw_core_bus_device_register+0x659/0x11b0
[ +0.000013]  mlxsw_core_bus_device_register+0x3d/0x90
[ +0.000013]  mlxsw_pci_probe+0x355/0x4b0
[ +0.000014]  local_pci_probe+0xc3/0x150
[ +0.000013]  pci_device_probe+0x280/0x410
[ +0.000013]  really_probe+0x26a/0xbb0
[ +0.000013]  driver_probe_device+0x208/0x2e0
[ +0.000013]  device_driver_attach+0xfe/0x140
[ +0.000013]  __driver_attach+0x110/0x310
[ +0.000013]  bus_for_each_dev+0x14b/0x1d0
[ +0.000013]  driver_register+0x1c0/0x400
[ +0.000015]  mlxsw_sp_module_init+0x5d/0xd3
[ +0.000014]  do_one_initcall+0x239/0x4dd
[ +0.000013]  kernel_init_freeable+0x42b/0x4e8
[ +0.000012]  kernel_init+0x11/0x18b
[ +0.000013]  ret_from_fork+0x3a/0x50

[ +0.000015] Freed by task 581:
[ +0.000013]  save_stack+0x19/0x80
[ +0.000014]  __kasan_slab_free+0x125/0x170
[ +0.000013]  kfree+0xf3/0x310
[ +0.000013]  thermal_release+0xc7/0xf0
[ +0.000014]  device_release+0x77/0x200
[ +0.000014]  kobject_put+0x1a8/0x4c0
[ +0.000014]  device_unregister+0x38/0xc0
[ +0.000014]  thermal_zone_device_unregister+0x54e/0x6a0
[ +0.000014]  mlxsw_thermal_fini+0x184/0x35a
[ +0.000014]  mlxsw_core_bus_device_unregister+0x10a/0x640
[ +0.000013]  mlxsw_devlink_core_bus_device_reload+0x92/0x210
[ +0.000015]  devlink_nl_cmd_reload+0x113/0x1f0
[ +0.000014]  genl_family_rcv_msg+0x700/0xee0
[ +0.000013]  genl_rcv_msg+0xca/0x170
[ +0.000013]  netlink_rcv_skb+0x137/0x3a0
[ +0.000012]  genl_rcv+0x29/0x40
[ +0.000013]  netlink_unicast+0x49b/0x660
[ +0.000013]  netlink_sendmsg+0x755/0xc90
[ +0.000013]  __sys_sendto+0x3de/0x430
[ +0.000013]  __x64_sys_sendto+0xe2/0x1b0
[ +0.000013]  do_syscall_64+0xa4/0x4d0
[ +0.000013]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

[ +0.000017] The buggy address belongs to the object at ffff8881e48e0008 which belongs to the cache kmalloc-2k of size 2048
[ +0.000012] The buggy address is located 1096 bytes inside of 2048-byte region [ffff8881e48e0008, ffff8881e48e0808)
[ +0.000007] The buggy address belongs to the page:
[ +0.000012] page:ffffea0007923800 refcount:1 mapcount:0 mapping:ffff88823680d0c0 index:0x0 compound_mapcount: 0
[ +0.000020] flags: 0x200000000010200(slab|head)
[ +0.000019] raw: 0200000000010200 ffffea0007682008 ffffea00076ab808 ffff88823680d0c0
[ +0.000016] raw: 0000000000000000 00000000000d000d 00000001ffffffff 0000000000000000
[ +0.000007] page dumped because: kasan: bad access detected

[ +0.000012] Memory state around the buggy address:
[ +0.000012]  ffff8881e48e0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000012]  ffff8881e48e0380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000012] >ffff8881e48e0400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000008]                                                  ^
[ +0.000012]  ffff8881e48e0480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000012]  ffff8881e48e0500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000007] ==================================================================
Fixes: b1569e99c795 ("ACPI: move thermal trip handling to generic thermal layer") Reported-by: Jiri Pirko jiri@mellanox.com Signed-off-by: Ido Schimmel idosch@mellanox.com Acked-by: Jiri Pirko jiri@mellanox.com Signed-off-by: Zhang Rui rui.zhang@intel.com Signed-off-by: Sasha Levin sashal@kernel.org
drivers/thermal/thermal_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 6bab66e84eb58..ebe15f2cf7fc3 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -304,7 +304,7 @@ static void thermal_zone_device_set_polling(struct thermal_zone_device *tz,
 				 &tz->poll_queue,
 				 msecs_to_jiffies(delay));
 	else
-		cancel_delayed_work(&tz->poll_queue);
+		cancel_delayed_work_sync(&tz->poll_queue);
 }
 
 static void monitor_thermal_zone(struct thermal_zone_device *tz)
From: Stefan Mavrodiev stefan@olimex.com
[ Upstream commit 8c7aa184281c01fc26f319059efb94725012921d ]
When calling thermal_add_hwmon_sysfs(), the device type is sanitized by replacing '-' with '_'. However tz->type remains unsanitized. Thus calling thermal_hwmon_lookup_by_type() returns no device. And if there is no device, thermal_remove_hwmon_sysfs() fails with "hwmon device lookup failed!".
The result is hwmon devices that are never unregistered and linger in sysfs.
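As a rough userspace illustration of the mismatch described above (not the kernel code itself), the sketch below stores sanitized names and then looks one up with and without sanitizing the key first. The names `replace_char`, `lookup_by_type`, `registered`, and the `NAME_LEN` value are assumptions made for the example; only the idea of replacing '-' with '_' before comparing comes from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_LEN 20 /* stand-in for THERMAL_NAME_LENGTH; the value is assumed */

/* Userspace stand-in for the kernel's strreplace(): swap every 'from' for 'to'. */
static void replace_char(char *s, char from, char to)
{
	for (; *s; s++)
		if (*s == from)
			*s = to;
}

/* Hypothetical registry holding names as they were sanitized at registration. */
static const char *registered[] = { "cpu_thermal", "gpu_thermal" };

/* Return the index of the entry matching zone_type, or -1 if none matches. */
static int lookup_by_type(const char *zone_type, int sanitize)
{
	char type[NAME_LEN];
	size_t i;

	assert(strlen(zone_type) < NAME_LEN);
	strcpy(type, zone_type);
	if (sanitize)
		replace_char(type, '-', '_'); /* same transformation as at registration */

	for (i = 0; i < sizeof(registered) / sizeof(registered[0]); i++)
		if (!strcmp(registered[i], type))
			return (int)i;
	return -1;
}
```

Looking up "cpu-thermal" without sanitizing finds nothing, which corresponds to the "hwmon device lookup failed!" case; sanitizing the key first finds the entry.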
Fixes: 409ef0bacacf ("thermal_hwmon: Sanitize attribute name passed to hwmon")
Signed-off-by: Stefan Mavrodiev stefan@olimex.com Signed-off-by: Zhang Rui rui.zhang@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/thermal/thermal_hwmon.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/thermal/thermal_hwmon.c b/drivers/thermal/thermal_hwmon.c
index 40c69a533b240..dd5d8ee379287 100644
--- a/drivers/thermal/thermal_hwmon.c
+++ b/drivers/thermal/thermal_hwmon.c
@@ -87,13 +87,17 @@ static struct thermal_hwmon_device *
 thermal_hwmon_lookup_by_type(const struct thermal_zone_device *tz)
 {
 	struct thermal_hwmon_device *hwmon;
+	char type[THERMAL_NAME_LENGTH];
 
 	mutex_lock(&thermal_hwmon_list_lock);
-	list_for_each_entry(hwmon, &thermal_hwmon_list, node)
-		if (!strcmp(hwmon->type, tz->type)) {
+	list_for_each_entry(hwmon, &thermal_hwmon_list, node) {
+		strcpy(type, tz->type);
+		strreplace(type, '-', '_');
+		if (!strcmp(hwmon->type, type)) {
 			mutex_unlock(&thermal_hwmon_list_lock);
 			return hwmon;
 		}
+	}
 	mutex_unlock(&thermal_hwmon_list_lock);
 
 	return NULL;
From: Andrei Dulea adulea@amazon.de
[ Upstream commit 6ccb72f8374e17d60b58a7bfd5570496332c54e2 ]
Downgrading an existing large mapping to a mapping using smaller page-sizes works only for the mappings created with page-mode 7 (i.e. non-default page size).
Treat large mappings created with page-mode 0 (i.e. default page size) like a non-present mapping and allow them to be overwritten in alloc_pte().
While at it, make sure that we flush the TLB only if we change an existing mapping; otherwise we might end up acting on garbage PTEs.
Fixes: 6d568ef9a622 ("iommu/amd: Allow downgrading page-sizes in alloc_pte()") Signed-off-by: Andrei Dulea adulea@amazon.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iommu/amd_iommu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index e1259429ded2f..3b1d7ae6f75e0 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1490,6 +1490,7 @@ static u64 *alloc_pte(struct protection_domain *domain,
 		pte_level = PM_PTE_LEVEL(__pte);
 
 		if (!IOMMU_PTE_PRESENT(__pte) ||
+		    pte_level == PAGE_MODE_NONE ||
 		    pte_level == PAGE_MODE_7_LEVEL) {
 			page = (u64 *)get_zeroed_page(gfp);
 			if (!page)
@@ -1500,7 +1501,7 @@ static u64 *alloc_pte(struct protection_domain *domain,
 			/* pte could have been changed somewhere. */
 			if (cmpxchg64(pte, __pte, __npte) != __pte)
 				free_page((unsigned long)page);
-			else if (pte_level == PAGE_MODE_7_LEVEL)
+			else if (IOMMU_PTE_PRESENT(__pte))
 				domain->updated = true;
 
 			continue;
From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
[ Upstream commit c42adf87e4e7ed77f6ffe288dc90f980d07d68df ]
We check for bad blocks during namespace init, and that check uses the region's bad block list. We need to initialize the bad block list for volatile regions for this to work. We also observe the lockdep warning below because the lock is not initialized correctly, since we skip bad block init for volatile regions.
 INFO: trying to register non-static key.
 the code is fine but needs lockdep annotation.
 turning off the locking correctness validator.
 CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc1-15699-g3dee241c937e #149
 Call Trace:
 [c0000000f95cb250] [c00000000147dd84] dump_stack+0xe8/0x164 (unreliable)
 [c0000000f95cb2a0] [c00000000022ccd8] register_lock_class+0x308/0xa60
 [c0000000f95cb3a0] [c000000000229cc0] __lock_acquire+0x170/0x1ff0
 [c0000000f95cb4c0] [c00000000022c740] lock_acquire+0x220/0x270
 [c0000000f95cb580] [c000000000a93230] badblocks_check+0xc0/0x290
 [c0000000f95cb5f0] [c000000000d97540] nd_pfn_validate+0x5c0/0x7f0
 [c0000000f95cb6d0] [c000000000d98300] nd_dax_probe+0xd0/0x1f0
 [c0000000f95cb760] [c000000000d9b66c] nd_pmem_probe+0x10c/0x160
 [c0000000f95cb790] [c000000000d7f5ec] nvdimm_bus_probe+0x10c/0x240
 [c0000000f95cb820] [c000000000d0f844] really_probe+0x254/0x4e0
 [c0000000f95cb8b0] [c000000000d0fdfc] driver_probe_device+0x16c/0x1e0
 [c0000000f95cb930] [c000000000d10238] device_driver_attach+0x68/0xa0
 [c0000000f95cb970] [c000000000d1040c] __driver_attach+0x19c/0x1c0
 [c0000000f95cb9f0] [c000000000d0c4c4] bus_for_each_dev+0x94/0x130
 [c0000000f95cba50] [c000000000d0f014] driver_attach+0x34/0x50
 [c0000000f95cba70] [c000000000d0e208] bus_add_driver+0x178/0x2f0
 [c0000000f95cbb00] [c000000000d117c8] driver_register+0x108/0x170
 [c0000000f95cbb70] [c000000000d7edb0] __nd_driver_register+0xe0/0x100
 [c0000000f95cbbd0] [c000000001a6baa4] nd_pmem_driver_init+0x34/0x48
 [c0000000f95cbbf0] [c0000000000106f4] do_one_initcall+0x1d4/0x4b0
 [c0000000f95cbcd0] [c0000000019f499c] kernel_init_freeable+0x544/0x65c
 [c0000000f95cbdb0] [c000000000010d6c] kernel_init+0x2c/0x180
 [c0000000f95cbe20] [c00000000000b954] ret_from_kernel_thread+0x5c/0x68
Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Link: https://lore.kernel.org/r/20190919083355.26340-1-aneesh.kumar@linux.ibm.com Signed-off-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvdimm/bus.c | 2 +- drivers/nvdimm/region.c | 4 ++-- drivers/nvdimm/region_devs.c | 4 ++-- 3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 798c5c4aea9ca..bb3f20ebc276d 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -182,7 +182,7 @@ static int nvdimm_clear_badblocks_region(struct device *dev, void *data)
 	sector_t sector;
 
 	/* make sure device is a region */
-	if (!is_nd_pmem(dev))
+	if (!is_memory(dev))
 		return 0;
 
 	nd_region = to_nd_region(dev);
diff --git a/drivers/nvdimm/region.c b/drivers/nvdimm/region.c
index 37bf8719a2a44..0f6978e72e7cd 100644
--- a/drivers/nvdimm/region.c
+++ b/drivers/nvdimm/region.c
@@ -34,7 +34,7 @@ static int nd_region_probe(struct device *dev)
 	if (rc)
 		return rc;
 
-	if (is_nd_pmem(&nd_region->dev)) {
+	if (is_memory(&nd_region->dev)) {
 		struct resource ndr_res;
 
 		if (devm_init_badblocks(dev, &nd_region->bb))
@@ -123,7 +123,7 @@ static void nd_region_notify(struct device *dev, enum nvdimm_event event)
 		struct nd_region *nd_region = to_nd_region(dev);
 		struct resource res;
 
-		if (is_nd_pmem(&nd_region->dev)) {
+		if (is_memory(&nd_region->dev)) {
 			res.start = nd_region->ndr_start;
 			res.end = nd_region->ndr_start +
 				nd_region->ndr_size - 1;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index af30cbe7a8ea2..47b48800fb758 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -632,11 +632,11 @@ static umode_t region_visible(struct kobject *kobj, struct attribute *a, int n)
 	if (!is_memory(dev) && a == &dev_attr_dax_seed.attr)
 		return 0;
 
-	if (!is_nd_pmem(dev) && a == &dev_attr_badblocks.attr)
+	if (!is_memory(dev) && a == &dev_attr_badblocks.attr)
 		return 0;
 
 	if (a == &dev_attr_resource.attr) {
-		if (is_nd_pmem(dev))
+		if (is_memory(dev))
 			return 0400;
 		else
 			return 0;
From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
[ Upstream commit 86aa66687442ef45909ff9814b82b4d2bb892294 ]
An nd_label->dpa issue was observed when trying to enable a namespace created with a little-endian kernel on a big-endian kernel. That made me run `sparse` on the rest of the code, and the other changes are the result of that.
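To make the class of bug concrete, here is a minimal userspace sketch (not the BTT code) of why an on-media little-endian value has to be converted before any flag or field is extracted from it. The mask values `ERR_MASK` and `LBA_MASK` and all function names are illustrative assumptions; `be32_decode` simply models what a big-endian CPU would see if it loaded the raw bytes without conversion.

```c
#include <assert.h>
#include <stdint.h>

#define ERR_MASK (UINT32_C(1) << 30)    /* illustrative flag bit, assumed */
#define LBA_MASK (~(UINT32_C(3) << 30)) /* low 30 bits, assumed layout */

/* Portable little-endian decode: gives the same result on any host. */
static uint32_t le32_decode(const uint8_t b[4])
{
	return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
	       (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* What a big-endian CPU sees when it loads the raw bytes unconverted. */
static uint32_t be32_decode(const uint8_t b[4])
{
	return (uint32_t)b[3] | (uint32_t)b[2] << 8 |
	       (uint32_t)b[1] << 16 | (uint32_t)b[0] << 24;
}

/* Field extraction only makes sense on a CPU-order value. */
static int has_err(uint32_t v)
{
	return !!(v & ERR_MASK);
}

static uint32_t lba(uint32_t v)
{
	return v & LBA_MASK;
}
```

With the on-media bytes 05 00 00 40, converting first yields the error flag set and block 5, while applying the masks to the unconverted big-endian load loses the flag and produces a bogus block number. This is the same ordering issue as applying le32_to_cpu() inside rather than outside ent_lba().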
Fixes: d9b83c756953 ("libnvdimm, btt: rework error clearing") Fixes: 9dedc73a4658 ("libnvdimm/btt: Fix LBA masking during 'free list' population") Reviewed-by: Vishal Verma vishal.l.verma@intel.com Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Link: https://lore.kernel.org/r/20190809074726.27815-1-aneesh.kumar@linux.ibm.com Signed-off-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvdimm/btt.c | 8 ++++---- drivers/nvdimm/namespace_devs.c | 7 ++++--- 2 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index a8d56887ec881..3e9f45aec8d18 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -392,9 +392,9 @@ static int btt_flog_write(struct arena_info *arena, u32 lane, u32 sub,
 	arena->freelist[lane].sub = 1 - arena->freelist[lane].sub;
 	if (++(arena->freelist[lane].seq) == 4)
 		arena->freelist[lane].seq = 1;
-	if (ent_e_flag(ent->old_map))
+	if (ent_e_flag(le32_to_cpu(ent->old_map)))
 		arena->freelist[lane].has_err = 1;
-	arena->freelist[lane].block = le32_to_cpu(ent_lba(ent->old_map));
+	arena->freelist[lane].block = ent_lba(le32_to_cpu(ent->old_map));
 
 	return ret;
 }
@@ -560,8 +560,8 @@ static int btt_freelist_init(struct arena_info *arena)
 		 * FIXME: if error clearing fails during init, we want to make
 		 * the BTT read-only
 		 */
-		if (ent_e_flag(log_new.old_map) &&
-				!ent_normal(log_new.old_map)) {
+		if (ent_e_flag(le32_to_cpu(log_new.old_map)) &&
+		    !ent_normal(le32_to_cpu(log_new.old_map))) {
 			arena->freelist[i].has_err = 1;
 			ret = arena_clear_freelist_error(arena, i);
 			if (ret)
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index a16e52251a305..102c9d5141ee8 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -1987,7 +1987,7 @@ static struct device *create_namespace_pmem(struct nd_region *nd_region,
 		nd_mapping = &nd_region->mapping[i];
 		label_ent = list_first_entry_or_null(&nd_mapping->labels,
 				typeof(*label_ent), list);
-		label0 = label_ent ? label_ent->label : 0;
+		label0 = label_ent ? label_ent->label : NULL;
 
 		if (!label0) {
 			WARN_ON(1);
@@ -2322,8 +2322,9 @@ static struct device **scan_labels(struct nd_region *nd_region)
 			continue;
 
 		/* skip labels that describe extents outside of the region */
-		if (nd_label->dpa < nd_mapping->start || nd_label->dpa > map_end)
-			continue;
+		if (__le64_to_cpu(nd_label->dpa) < nd_mapping->start ||
+		    __le64_to_cpu(nd_label->dpa) > map_end)
+				continue;
 
 		i = add_namespace_resource(nd_region, nd_label, devs, count);
 		if (i < 0)
From: zhengbin zhengbin13@huawei.com
[ Upstream commit 9ad09b1976c562061636ff1e01bfc3a57aebe56b ]
If cuse_send_init() fails, we need to call fuse_conn_put() on cc->fc.

cuse_channel_open->fuse_conn_init->refcount_set(&fc->count, 1)
                 ->fuse_dev_alloc->fuse_conn_get
                 ->fuse_dev_free->fuse_conn_put
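The reference accounting in the call chain above can be traced with a small userspace model. This is only a sketch under assumptions: `struct conn` and the helper names are hypothetical stand-ins for struct fuse_conn and the fuse_conn_*/fuse_dev_* functions, and a plain int replaces the kernel's refcount_t.

```c
#include <assert.h>

struct conn {
	int count; /* stand-in for fc->count */
};

static void conn_init(struct conn *c)
{
	c->count = 1; /* creator's reference, as set by fuse_conn_init() */
}

static void conn_get(struct conn *c)
{
	c->count++;
}

/* Returns 1 when the last reference is dropped (the object would be freed). */
static int conn_put(struct conn *c)
{
	assert(c->count > 0);
	return --c->count == 0;
}

/* The device takes its own reference on allocation and drops it on free. */
static void dev_alloc(struct conn *c)
{
	conn_get(c);
}

static int dev_free(struct conn *c)
{
	return conn_put(c);
}

/* Model the failing open path; returns how many references are left behind. */
static int open_fails(int with_fix)
{
	struct conn c;

	conn_init(&c); /* count = 1 */
	dev_alloc(&c); /* count = 2 */
	/* ... the init request fails here ... */
	dev_free(&c);  /* count = 1: only the device's reference is gone */
	if (with_fix)
		conn_put(&c); /* count = 0: creator's reference dropped too */
	return c.count;
}
```

Without the extra put, one reference remains after the error path, so the connection would never be released; with it, the count reaches zero.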
Fixes: cc080e9e9be1 ("fuse: introduce per-instance fuse_dev structure") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: zhengbin zhengbin13@huawei.com Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/fuse/cuse.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index bab7a0db81dd4..f3b7208846506 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -519,6 +519,7 @@ static int cuse_channel_open(struct inode *inode, struct file *file)
 	rc = cuse_send_init(cc);
 	if (rc) {
 		fuse_dev_free(fud);
+		fuse_conn_put(&cc->fc);
 		return rc;
 	}
 	file->private_data = fud;
From: Nathan Chancellor natechancellor@gmail.com
[ Upstream commit 59f08896f058a92f03a0041b397a1a227c5e8529 ]
After commit 62974fc389b3 ("libnvdimm: Enable unit test infrastructure compile checks"), clang warns:
In file included from
../drivers/nvdimm/../../tools/testing/nvdimm/test/iomap.c:15:
../drivers/nvdimm/../../tools/testing/nvdimm/test/nfit_test.h:206:15:
warning: redefinition of typedef 'acpi_handle' is a C11 feature
[-Wtypedef-redefinition]
typedef void *acpi_handle;
              ^
../include/acpi/actypes.h:424:15: note: previous definition is here
typedef void *acpi_handle;      /* Actually a ptr to a NS Node */
              ^
1 warning generated.

The include chain:

iomap.c ->
    linux/acpi.h ->
        acpi/acpi.h ->
            acpi/actypes.h
    nfit_test.h
Avoid this by including linux/acpi.h in nfit_test.h, which allows us to remove both the typedef and the forward declaration of acpi_object.
Link: https://github.com/ClangBuiltLinux/linux/issues/660 Signed-off-by: Nathan Chancellor natechancellor@gmail.com Reviewed-by: Ira Weiny ira.weiny@intel.com Link: https://lore.kernel.org/r/20190918042148.77553-1-natechancellor@gmail.com Signed-off-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/nvdimm/test/nfit_test.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/tools/testing/nvdimm/test/nfit_test.h b/tools/testing/nvdimm/test/nfit_test.h
index 448d686da8b13..0bf5640f1f071 100644
--- a/tools/testing/nvdimm/test/nfit_test.h
+++ b/tools/testing/nvdimm/test/nfit_test.h
@@ -4,6 +4,7 @@
  */
 #ifndef __NFIT_TEST_H__
 #define __NFIT_TEST_H__
+#include <linux/acpi.h>
 #include <linux/list.h>
 #include <linux/uuid.h>
 #include <linux/ioport.h>
@@ -202,9 +203,6 @@ struct nd_intel_lss {
 	__u32 status;
 } __packed;
 
-union acpi_object;
-typedef void *acpi_handle;
-
 typedef struct nfit_test_resource *(*nfit_test_lookup_fn)(resource_size_t);
 typedef union acpi_object *(*nfit_test_evaluate_dsm_fn)(acpi_handle handle,
 		const guid_t *guid, u64 rev, u64 func,
From: Mathieu Desnoyers mathieu.desnoyers@efficios.com
[ Upstream commit 2840cf02fae627860156737e83326df354ee4ec6 ]
When the prev and next task's mm change, switch_mm() provides the core serializing guarantees before returning to usermode. The only case where an explicit core serialization is needed is when the scheduler keeps the same mm for prev and next.
Suggested-by: Oleg Nesterov oleg@redhat.com Signed-off-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: Chris Metcalf cmetcalf@ezchip.com Cc: Christoph Lameter cl@linux.com Cc: Eric W. Biederman ebiederm@xmission.com Cc: Kirill Tkhai tkhai@yandex.ru Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Mike Galbraith efault@gmx.de Cc: Paul E. McKenney paulmck@linux.ibm.com Cc: Peter Zijlstra peterz@infradead.org Cc: Russell King - ARM Linux admin linux@armlinux.org.uk Cc: Thomas Gleixner tglx@linutronix.de Link: https://lkml.kernel.org/r/20190919173705.2181-4-mathieu.desnoyers@efficios.c... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/sched/mm.h | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 4a7944078cc35..8557ec6642130 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -362,6 +362,8 @@ enum {
 
 static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
 {
+	if (current->mm != mm)
+		return;
 	if (likely(!(atomic_read(&mm->membarrier_state) &
 		     MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE)))
 		return;
From: Mathieu Desnoyers mathieu.desnoyers@efficios.com
[ Upstream commit fc0d77387cb5ae883fd774fc559e056a8dde024c ]
Fix a logic flaw in the way membarrier_register_private_expedited() handles ready state checks for private expedited sync core and private expedited registrations.
If a private expedited membarrier registration is first performed, and then a private expedited sync_core registration is performed, the ready state check will skip the second registration when it really should not.
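The difference between the old and new checks is the usual "any bit set" versus "all bits set" distinction. The sketch below shows it in isolation; the bit values and the assumption that the sync_core case waits on a two-bit mask are hypothetical and chosen only for illustration, not taken from the kernel's enum.

```c
#include <assert.h>

/* Hypothetical state bits; the kernel's actual values may differ. */
#define READY           (1u << 0)
#define SYNC_CORE_READY (1u << 1)

/* Old-style check: true as soon as ANY wanted bit is set. */
static int ready_any(unsigned int cur, unsigned int want)
{
	return (cur & want) != 0;
}

/* New-style check: true only when ALL wanted bits are set. */
static int ready_all(unsigned int cur, unsigned int want)
{
	return (cur & want) == want;
}
```

If a first registration has set only `READY` and a second one wants `READY | SYNC_CORE_READY`, the "any" form already reports ready and the second registration would be skipped, whereas the "all" form lets it proceed.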
Signed-off-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: Chris Metcalf cmetcalf@ezchip.com Cc: Christoph Lameter cl@linux.com Cc: Eric W. Biederman ebiederm@xmission.com Cc: Kirill Tkhai tkhai@yandex.ru Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Mike Galbraith efault@gmx.de Cc: Oleg Nesterov oleg@redhat.com Cc: Paul E. McKenney paulmck@linux.ibm.com Cc: Peter Zijlstra peterz@infradead.org Cc: Russell King - ARM Linux admin linux@armlinux.org.uk Cc: Thomas Gleixner tglx@linutronix.de Link: https://lkml.kernel.org/r/20190919173705.2181-2-mathieu.desnoyers@efficios.c... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/sched/membarrier.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index aa8d758041088..5110d91b1b0ea 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -226,7 +226,7 @@ static int membarrier_register_private_expedited(int flags)
 	 * groups, which use the same mm. (CLONE_VM but not
 	 * CLONE_THREAD).
 	 */
-	if (atomic_read(&mm->membarrier_state) & state)
+	if ((atomic_read(&mm->membarrier_state) & state) == state)
 		return 0;
 	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED, &mm->membarrier_state);
 	if (flags & MEMBARRIER_FLAG_SYNC_CORE)
From: KeMeng Shi shikemeng@huawei.com
[ Upstream commit 714e501e16cd473538b609b3e351b2cc9f7f09ed ]
An oops can be triggered in the scheduler when running qemu on arm64:
  Unable to handle kernel paging request at virtual address ffff000008effe40
  Internal error: Oops: 96000007 [#1] SMP
  Process migration/0 (pid: 12, stack limit = 0x00000000084e3736)
  pstate: 20000085 (nzCv daIf -PAN -UAO)
  pc : __ll_sc___cmpxchg_case_acq_4+0x4/0x20
  lr : move_queued_task.isra.21+0x124/0x298
  ...
  Call trace:
   __ll_sc___cmpxchg_case_acq_4+0x4/0x20
   __migrate_task+0xc8/0xe0
   migration_cpu_stop+0x170/0x180
   cpu_stopper_thread+0xec/0x178
   smpboot_thread_fn+0x1ac/0x1e8
   kthread+0x134/0x138
   ret_from_fork+0x10/0x18
__set_cpus_allowed_ptr() will choose an active dest_cpu in the affinity mask to migrate the process to if the process is not currently running on any of the CPUs specified in the affinity mask. __set_cpus_allowed_ptr() will choose an invalid dest_cpu (dest_cpu >= nr_cpu_ids, 1024 in my virtual machine) if the CPUs in the affinity mask are deactivated by cpu_down after the cpumask_intersects check. The subsequent cpumask_test_cpu() of dest_cpu then reads past the end of the mask and may pass if the corresponding bit happens to be set. As a consequence, the kernel will access an invalid rq address associated with the invalid CPU in migration_cpu_stop->__migrate_task->move_queued_task and the Oops occurs.
To reproduce the crash:
1) A process repeatedly binds itself to cpu0 and cpu1 in turn by calling sched_setaffinity.
2) A shell script repeatedly does "echo 0 > /sys/devices/system/cpu/cpu1/online" and "echo 1 > /sys/devices/system/cpu/cpu1/online" in turn.
3) The Oops appears if the bit for the invalid CPU happens to be set in the memory after the tested cpumask.
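The fix amounts to picking the destination CPU once and validating that single result, instead of testing for an intersection and picking later when the masks may have changed. A minimal userspace sketch of that pattern follows; `NR_CPUS`, `any_and`, and `pick_dest` are hypothetical names, and a 32-bit word stands in for a cpumask. The only kernel behaviour relied on is that cpumask_any_and() reports "no CPU" with a value >= nr_cpu_ids.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8u /* hypothetical CPU count for this sketch */

/*
 * Return the lowest CPU set in both masks, or NR_CPUS if there is none,
 * modelling how cpumask_any_and() signals an empty intersection.
 */
static unsigned int any_and(uint32_t a, uint32_t b)
{
	uint32_t both = a & b;
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (both & (UINT32_C(1) << cpu))
			return cpu;
	return NR_CPUS;
}

/* Pick a destination once and validate it before anything uses it. */
static int pick_dest(uint32_t valid, uint32_t wanted, unsigned int *dest)
{
	unsigned int cpu = any_and(valid, wanted);

	if (cpu >= NR_CPUS)
		return -1; /* the kernel returns -EINVAL at this point */
	*dest = cpu;
	return 0;
}
```

Because the value that was checked is the value that gets used, a CPU going offline after the check can no longer leave an out-of-range destination behind.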
Signed-off-by: KeMeng Shi shikemeng@huawei.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Valentin Schneider valentin.schneider@arm.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: https://lkml.kernel.org/r/1568616808-16808-1-git-send-email-shikemeng@huawei... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/sched/core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d38f007afea74..fffe790d98bb2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1537,7 +1537,8 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 	if (cpumask_equal(p->cpus_ptr, new_mask))
 		goto out;
 
-	if (!cpumask_intersects(new_mask, cpu_valid_mask)) {
+	dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask);
+	if (dest_cpu >= nr_cpu_ids) {
 		ret = -EINVAL;
 		goto out;
 	}
@@ -1558,7 +1559,6 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 	if (cpumask_test_cpu(task_cpu(p), new_mask))
 		goto out;
 
-	dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask);
 	if (task_running(rq, p) || p->state == TASK_WAKING) {
 		struct migration_arg arg = { p, dest_cpu };
 		/* Need help from migration thread: drop lock and wait. */
From: Thomas Richter tmricht@linux.ibm.com
[ Upstream commit 815c1560bf8fd522b8d93a1d727868b910c1cc24 ]
With Java 11 there is no separate JRE anymore.
Details:
https://coderanch.com/t/701603/java/JRE-JDK
Therefore the detection of the JRE needs to be adapted.
This change works for s390 and x86. I have not tested other platforms.
Committer testing:
Continues to work with the OpenJDK 8:
  $ rm -f ~acme/lib64/libperf-jvmti.so
  $ rpm -qa | grep jdk-devel
  java-1.8.0-openjdk-devel-1.8.0.222.b10-0.fc30.x86_64
  $ git log --oneline -1
  a51937170f33 (HEAD -> perf/core) perf build: Add detection of java-11-openjdk-devel package
  $ rm -rf /tmp/build/perf ; mkdir -p /tmp/build/perf ; make -C tools/perf O=/tmp/build/perf install > /dev/null 2>1
  $ ls -la ~acme/lib64/libperf-jvmti.so
  -rwxr-xr-x. 1 acme acme 230744 Sep 24 16:46 /home/acme/lib64/libperf-jvmti.so
  $
Suggested-by: Andreas Krebbel krebbel@linux.ibm.com Signed-off-by: Thomas Richter tmricht@linux.ibm.com Tested-by: Arnaldo Carvalho de Melo acme@redhat.com Cc: Heiko Carstens heiko.carstens@de.ibm.com Cc: Hendrik Brueckner brueckner@linux.ibm.com Cc: Vasily Gorbik gor@linux.ibm.com Link: http://lore.kernel.org/lkml/20190909114116.50469-4-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/Makefile.config | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index 89ac5a1f1550e..3da3749118527 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -908,7 +908,7 @@ ifndef NO_JVMTI
     JDIR=$(shell /usr/sbin/update-java-alternatives -l | head -1 | awk '{print $$3}')
   else
     ifneq (,$(wildcard /usr/sbin/alternatives))
-      JDIR=$(shell /usr/sbin/alternatives --display java | tail -1 | cut -d' ' -f 5 | sed 's%/jre/bin/java.%%g')
+      JDIR=$(shell /usr/sbin/alternatives --display java | tail -1 | cut -d' ' -f 5 | sed -e 's%/jre/bin/java.%%g' -e 's%/bin/java.%%g')
     endif
   endif
   ifndef JDIR
From: Qian Cai cai@lca.pw
[ Upstream commit d1a445d3b86c9341ce7a0954c23be0edb5c9bec5 ]
There are many of those warnings.
In file included from ./arch/powerpc/include/asm/paca.h:15,
                 from ./arch/powerpc/include/asm/current.h:13,
                 from ./include/linux/thread_info.h:21,
                 from ./include/asm-generic/preempt.h:5,
                 from ./arch/powerpc/include/generated/asm/preempt.h:1,
                 from ./include/linux/preempt.h:78,
                 from ./include/linux/spinlock.h:51,
                 from fs/fs-writeback.c:19:
In function 'strncpy',
    inlined from 'perf_trace_writeback_page_template' at
./include/trace/events/writeback.h:56:1:
./include/linux/string.h:260:9: warning: '__builtin_strncpy' specified
bound 32 equals destination size [-Wstringop-truncation]
  return __builtin_strncpy(p, q, size);
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fix it by using the new strscpy_pad() which was introduced in "lib/string: Add strscpy_pad() function" and will always be NUL-terminated instead of strncpy(). Also, change strlcpy() to use strscpy_pad() in this file for consistency.
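For readers unfamiliar with the semantics being relied on, the following is a userspace approximation of a strscpy_pad()-style copy: it always NUL-terminates, zero-fills the rest of the destination, and reports truncation. `copy_pad` is a hypothetical name and this is not the kernel implementation; in particular it returns -1 on truncation where the kernel function returns -E2BIG.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy src into dst (capacity 'size'), always NUL-terminating and
 * zero-padding the remainder. Returns the number of characters copied,
 * or -1 if src did not fit.
 */
static long copy_pad(char *dst, const char *src, size_t size)
{
	size_t n = 0;
	int truncated;

	assert(size > 0);
	/* Leave room for the terminator: copy at most size - 1 characters. */
	while (n < size - 1 && src[n] != '\0') {
		dst[n] = src[n];
		n++;
	}
	truncated = (src[n] != '\0');
	/* Terminate and pad in one step. */
	memset(dst + n, 0, size - n);
	return truncated ? -1 : (long)n;
}
```

By contrast, strncpy() with a bound equal to the destination size leaves the buffer unterminated when the source is that long or longer, which is exactly what the -Wstringop-truncation warning above points at.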
Link: http://lkml.kernel.org/r/1564075099-27750-1-git-send-email-cai@lca.pw Fixes: 455b2864686d ("writeback: Initial tracing support") Fixes: 028c2dd184c0 ("writeback: Add tracing to balance_dirty_pages") Fixes: e84d0a4f8e39 ("writeback: trace event writeback_queue_io") Fixes: b48c104d2211 ("writeback: trace event bdi_dirty_ratelimit") Fixes: cc1676d917f3 ("writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()") Fixes: 9fb0a7da0c52 ("writeback: add more tracepoints") Signed-off-by: Qian Cai cai@lca.pw Reviewed-by: Jan Kara jack@suse.cz Cc: Tobin C. Harding tobin@kernel.org Cc: Steven Rostedt (VMware) rostedt@goodmis.org Cc: Ingo Molnar mingo@redhat.com Cc: Tejun Heo tj@kernel.org Cc: Dave Chinner dchinner@redhat.com Cc: Fengguang Wu fengguang.wu@intel.com Cc: Jens Axboe axboe@kernel.dk Cc: Joe Perches joe@perches.com Cc: Kees Cook keescook@chromium.org Cc: Jann Horn jannh@google.com Cc: Jonathan Corbet corbet@lwn.net Cc: Nitin Gote nitin.r.gote@intel.com Cc: Rasmus Villemoes rasmus.villemoes@prevas.dk Cc: Stephen Kitt steve@sk2.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/trace/events/writeback.h | 38 +++++++++++++++++--------------- 1 file changed, 20 insertions(+), 18 deletions(-)
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h index aa7f3aeac7408..79095434c1be3 100644 --- a/include/trace/events/writeback.h +++ b/include/trace/events/writeback.h @@ -66,8 +66,9 @@ DECLARE_EVENT_CLASS(writeback_page_template, ),
TP_fast_assign( - strncpy(__entry->name, - mapping ? dev_name(inode_to_bdi(mapping->host)->dev) : "(unknown)", 32); + strscpy_pad(__entry->name, + mapping ? dev_name(inode_to_bdi(mapping->host)->dev) : "(unknown)", + 32); __entry->ino = mapping ? mapping->host->i_ino : 0; __entry->index = page->index; ), @@ -110,8 +111,8 @@ DECLARE_EVENT_CLASS(writeback_dirty_inode_template, struct backing_dev_info *bdi = inode_to_bdi(inode);
/* may be called for files on pseudo FSes w/ unregistered bdi */ - strncpy(__entry->name, - bdi->dev ? dev_name(bdi->dev) : "(unknown)", 32); + strscpy_pad(__entry->name, + bdi->dev ? dev_name(bdi->dev) : "(unknown)", 32); __entry->ino = inode->i_ino; __entry->state = inode->i_state; __entry->flags = flags; @@ -190,8 +191,8 @@ DECLARE_EVENT_CLASS(writeback_write_inode_template, ),
TP_fast_assign( - strncpy(__entry->name, - dev_name(inode_to_bdi(inode)->dev), 32); + strscpy_pad(__entry->name, + dev_name(inode_to_bdi(inode)->dev), 32); __entry->ino = inode->i_ino; __entry->sync_mode = wbc->sync_mode; __entry->cgroup_ino = __trace_wbc_assign_cgroup(wbc); @@ -234,8 +235,9 @@ DECLARE_EVENT_CLASS(writeback_work_class, __field(unsigned int, cgroup_ino) ), TP_fast_assign( - strncpy(__entry->name, - wb->bdi->dev ? dev_name(wb->bdi->dev) : "(unknown)", 32); + strscpy_pad(__entry->name, + wb->bdi->dev ? dev_name(wb->bdi->dev) : + "(unknown)", 32); __entry->nr_pages = work->nr_pages; __entry->sb_dev = work->sb ? work->sb->s_dev : 0; __entry->sync_mode = work->sync_mode; @@ -288,7 +290,7 @@ DECLARE_EVENT_CLASS(writeback_class, __field(unsigned int, cgroup_ino) ), TP_fast_assign( - strncpy(__entry->name, dev_name(wb->bdi->dev), 32); + strscpy_pad(__entry->name, dev_name(wb->bdi->dev), 32); __entry->cgroup_ino = __trace_wb_assign_cgroup(wb); ), TP_printk("bdi %s: cgroup_ino=%u", @@ -310,7 +312,7 @@ TRACE_EVENT(writeback_bdi_register, __array(char, name, 32) ), TP_fast_assign( - strncpy(__entry->name, dev_name(bdi->dev), 32); + strscpy_pad(__entry->name, dev_name(bdi->dev), 32); ), TP_printk("bdi %s", __entry->name @@ -335,7 +337,7 @@ DECLARE_EVENT_CLASS(wbc_class, ),
TP_fast_assign( - strncpy(__entry->name, dev_name(bdi->dev), 32); + strscpy_pad(__entry->name, dev_name(bdi->dev), 32); __entry->nr_to_write = wbc->nr_to_write; __entry->pages_skipped = wbc->pages_skipped; __entry->sync_mode = wbc->sync_mode; @@ -386,7 +388,7 @@ TRACE_EVENT(writeback_queue_io, ), TP_fast_assign( unsigned long *older_than_this = work->older_than_this; - strncpy(__entry->name, dev_name(wb->bdi->dev), 32); + strscpy_pad(__entry->name, dev_name(wb->bdi->dev), 32); __entry->older = older_than_this ? *older_than_this : 0; __entry->age = older_than_this ? (jiffies - *older_than_this) * 1000 / HZ : -1; @@ -472,7 +474,7 @@ TRACE_EVENT(bdi_dirty_ratelimit, ),
TP_fast_assign( - strlcpy(__entry->bdi, dev_name(wb->bdi->dev), 32); + strscpy_pad(__entry->bdi, dev_name(wb->bdi->dev), 32); __entry->write_bw = KBps(wb->write_bandwidth); __entry->avg_write_bw = KBps(wb->avg_write_bandwidth); __entry->dirty_rate = KBps(dirty_rate); @@ -537,7 +539,7 @@ TRACE_EVENT(balance_dirty_pages,
TP_fast_assign( unsigned long freerun = (thresh + bg_thresh) / 2; - strlcpy(__entry->bdi, dev_name(wb->bdi->dev), 32); + strscpy_pad(__entry->bdi, dev_name(wb->bdi->dev), 32);
__entry->limit = global_wb_domain.dirty_limit; __entry->setpoint = (global_wb_domain.dirty_limit + @@ -597,8 +599,8 @@ TRACE_EVENT(writeback_sb_inodes_requeue, ),
TP_fast_assign( - strncpy(__entry->name, - dev_name(inode_to_bdi(inode)->dev), 32); + strscpy_pad(__entry->name, + dev_name(inode_to_bdi(inode)->dev), 32); __entry->ino = inode->i_ino; __entry->state = inode->i_state; __entry->dirtied_when = inode->dirtied_when; @@ -671,8 +673,8 @@ DECLARE_EVENT_CLASS(writeback_single_inode_template, ),
TP_fast_assign( - strncpy(__entry->name, - dev_name(inode_to_bdi(inode)->dev), 32); + strscpy_pad(__entry->name, + dev_name(inode_to_bdi(inode)->dev), 32); __entry->ino = inode->i_ino; __entry->state = inode->i_state; __entry->dirtied_when = inode->dirtied_when;
From: Andrii Nakryiko andriin@fb.com
[ Upstream commit 4670d68b9254710fdeaf794cad54d8b2c9929e0a ]
Some recent changes in latest Clang started causing the following warning when unrolling strobemeta test case main loop:
progs/strobemeta.h:416:2: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
This patch simplifies the loop's exit condition to depend only on the constant max iteration number (STROBE_MAX_MAP_ENTRIES), while moving the early termination logic inside the loop body. The changes are equivalent from a program logic standpoint, but fix the warning. They also appear to improve the generated BPF code, as they fix previously failing non-unrolled strobemeta test cases.
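As a rough userspace illustration of why the two loop forms behave the same, the sketch below counts how many iterations each form executes for a given map.cnt. The value of STROBE_MAX_MAP_ENTRIES and the helper names are assumptions made for this example only; they are not taken from the selftest.

```c
#include <assert.h>

#define STROBE_MAX_MAP_ENTRIES 20	/* assumed value, for illustration only */

/* Old form: the exit condition mixes a constant bound with a runtime value. */
static int visited_old(int cnt)
{
	int visited = 0;

	for (int i = 0; i < STROBE_MAX_MAP_ENTRIES && i < cnt; ++i)
		visited++;
	return visited;
}

/* New form: constant trip count, early exit moved into the loop body. */
static int visited_new(int cnt)
{
	int visited = 0;

	for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) {
		if (i >= cnt)
			break;
		visited++;
	}
	return visited;
}

/* Compare both forms for counts below, inside and above the bound. */
static void check_equivalence(void)
{
	for (int cnt = -1; cnt <= STROBE_MAX_MAP_ENTRIES + 1; ++cnt)
		assert(visited_old(cnt) == visited_new(cnt));
}
```

Calling check_equivalence() exercises both forms side by side. The only difference is where the runtime comparison sits, which is what lets the compiler see a constant trip count.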
Cc: Alexei Starovoitov ast@fb.com Signed-off-by: Andrii Nakryiko andriin@fb.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/bpf/progs/strobemeta.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/progs/strobemeta.h b/tools/testing/selftests/bpf/progs/strobemeta.h index 8a399bdfd9203..067eb625d01c5 100644 --- a/tools/testing/selftests/bpf/progs/strobemeta.h +++ b/tools/testing/selftests/bpf/progs/strobemeta.h @@ -413,7 +413,10 @@ static __always_inline void *read_map_var(struct strobemeta_cfg *cfg, #else #pragma unroll #endif - for (int i = 0; i < STROBE_MAX_MAP_ENTRIES && i < map.cnt; ++i) { + for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) { + if (i >= map.cnt) + break; + descr->key_lens[i] = 0; len = bpf_probe_read_str(payload, STROBE_MAX_STR_LEN, map.entries[i].key);
From: Valdis Kletnieks valdis.kletnieks@vt.edu
[ Upstream commit 0f74914071ab7e7b78731ed62bf350e3a344e0a5 ]
When building with W=1, gcc properly complains that there are no prototypes:
CC kernel/elfcore.o kernel/elfcore.c:7:17: warning: no previous prototype for 'elf_core_extra_phdrs' [-Wmissing-prototypes] 7 | Elf_Half __weak elf_core_extra_phdrs(void) | ^~~~~~~~~~~~~~~~~~~~ kernel/elfcore.c:12:12: warning: no previous prototype for 'elf_core_write_extra_phdrs' [-Wmissing-prototypes] 12 | int __weak elf_core_write_extra_phdrs(struct coredump_params *cprm, loff_t offset) | ^~~~~~~~~~~~~~~~~~~~~~~~~~ kernel/elfcore.c:17:12: warning: no previous prototype for 'elf_core_write_extra_data' [-Wmissing-prototypes] 17 | int __weak elf_core_write_extra_data(struct coredump_params *cprm) | ^~~~~~~~~~~~~~~~~~~~~~~~~ kernel/elfcore.c:22:15: warning: no previous prototype for 'elf_core_extra_data_size' [-Wmissing-prototypes] 22 | size_t __weak elf_core_extra_data_size(void) | ^~~~~~~~~~~~~~~~~~~~~~~~
Provide the include file so that gcc is happy and the prototypes cannot drift away from the definitions.
Link: http://lkml.kernel.org/r/29875.1565224705@turing-police Signed-off-by: Valdis Kletnieks valdis.kletnieks@vt.edu Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/elfcore.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/kernel/elfcore.c b/kernel/elfcore.c index fc482c8e0bd88..57fb4dcff4349 100644 --- a/kernel/elfcore.c +++ b/kernel/elfcore.c @@ -3,6 +3,7 @@ #include <linux/fs.h> #include <linux/mm.h> #include <linux/binfmts.h> +#include <linux/elfcore.h>
Elf_Half __weak elf_core_extra_phdrs(void) {
From: Andrii Nakryiko andriin@fb.com
[ Upstream commit aef70a1f44c0b570e6345c02c2d240471859f0a4 ]
Some compilers emit a warning for potentially uninitialized next_id usage. The code is correct, but the control flow is too complicated for some compilers to figure this out. Re-initialize next_id to satisfy the compiler.
Signed-off-by: Andrii Nakryiko andriin@fb.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Sasha Levin sashal@kernel.org --- tools/lib/bpf/btf_dump.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/tools/lib/bpf/btf_dump.c b/tools/lib/bpf/btf_dump.c index 7065bb5b27525..e1357dbb16c24 100644 --- a/tools/lib/bpf/btf_dump.c +++ b/tools/lib/bpf/btf_dump.c @@ -1213,6 +1213,7 @@ static void btf_dump_emit_type_chain(struct btf_dump *d, return; }
+ next_id = decls->ids[decls->cnt - 1]; next_t = btf__type_by_id(d->btf, next_id); multidim = btf_kind_of(next_t) == BTF_KIND_ARRAY; /* we need space if we have named non-pointer */
From: Ming Lei ming.lei@redhat.com
[ Upstream commit 284b94be1925dbe035ce5218d8b5c197321262c7 ]
Commit c48dac137a62 ("block: don't hold q->sysfs_lock in elevator_init_mq") removes q->sysfs_lock from elevator_init_mq(), but forgot to deal with lockdep_assert_held() called in blk_mq_sched_free_requests() which is run in failure path of elevator_init_mq().
blk_mq_sched_free_requests() is called in the following 3 functions:
elevator_init_mq() elevator_exit() blk_cleanup_queue()
In blk_cleanup_queue(), blk_mq_sched_free_requests() is followed exactly by 'mutex_lock(&q->sysfs_lock)'.
So move the lockdep_assert_held() from blk_mq_sched_free_requests() into elevator_exit() to fix the report by syzbot.
Reported-by: syzbot+da3b7677bb913dc1b737@syzkaller.appspotmail.com Fixed: c48dac137a62 ("block: don't hold q->sysfs_lock in elevator_init_mq") Reviewed-by: Bart Van Assche bvanassche@acm.org Reviewed-by: Damien Le Moal damien.lemoal@wdc.com Signed-off-by: Ming Lei ming.lei@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-mq-sched.c | 2 -- block/blk.h | 2 ++ 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c index c9d183d6c4999..ca22afd47b3dc 100644 --- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -555,8 +555,6 @@ void blk_mq_sched_free_requests(struct request_queue *q) struct blk_mq_hw_ctx *hctx; int i;
- lockdep_assert_held(&q->sysfs_lock); - queue_for_each_hw_ctx(q, hctx, i) { if (hctx->sched_tags) blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i); diff --git a/block/blk.h b/block/blk.h index d5edfd73d45ea..0685c45e3d96e 100644 --- a/block/blk.h +++ b/block/blk.h @@ -201,6 +201,8 @@ void elv_unregister_queue(struct request_queue *q); static inline void elevator_exit(struct request_queue *q, struct elevator_queue *e) { + lockdep_assert_held(&q->sysfs_lock); + blk_mq_sched_free_requests(q); __elevator_exit(q, e); }
From: Allan Zhang allanzhang@google.com
[ Upstream commit 768fb61fcc13b2acaca758275d54c09a65e2968b ]
A BPF_PROG_TYPE_SOCK_OPS program can reenter bpf_event_output() because it can be called from both atomic and non-atomic contexts, and there is no bpf_prog_active check to prevent that from happening.
This patch enables 3 levels of nesting to support normal, irq and nmi context.
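The nesting scheme can be sketched in plain userspace C as follows. This is a simplified model, not the kernel code: a single global counter stands in for the per-CPU nest level, struct scratch is a hypothetical stand-in for the per-level sample data and register buffers, and re-entry is simulated by recursion.

```c
#include <assert.h>
#include <errno.h>

#define MAX_NEST 3	/* normal, irq and nmi context */

/* Hypothetical stand-in for the per-level scratch buffers. */
struct scratch {
	int uses;
};

static int nest_level;			/* per-CPU in the kernel; one global here */
static struct scratch slots[MAX_NEST];

/*
 * Model of an output helper that may be re-entered. `reenter` says how many
 * more times the call is interrupted and entered again while still running.
 * The innermost result is passed back up only so a caller can observe it.
 */
static int emit_event(int reenter)
{
	int level = ++nest_level;
	int ret;

	if (level > MAX_NEST) {
		ret = -EBUSY;		/* deeper than the available slots */
		goto out;
	}

	slots[level - 1].uses++;	/* each level owns a separate slot */
	ret = reenter > 0 ? emit_event(reenter - 1) : 0;
out:
	nest_level--;			/* always undo the increment */
	return ret;
}
```

Three levels fit into the three slots, and a fourth nested entry is refused instead of overwriting a buffer that is still in use.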
We can easily reproduce the issue by running netperf crr mode with 100 flows and 10 threads from netperf client side.
Here is the whole stack dump:
[ 515.228898] WARNING: CPU: 20 PID: 14686 at kernel/trace/bpf_trace.c:549 bpf_event_output+0x1f9/0x220 [ 515.228903] CPU: 20 PID: 14686 Comm: tcp_crr Tainted: G W 4.15.0-smp-fixpanic #44 [ 515.228904] Hardware name: Intel TBG,ICH10/Ikaria_QC_1b, BIOS 1.22.0 06/04/2018 [ 515.228905] RIP: 0010:bpf_event_output+0x1f9/0x220 [ 515.228906] RSP: 0018:ffff9a57ffc03938 EFLAGS: 00010246 [ 515.228907] RAX: 0000000000000012 RBX: 0000000000000001 RCX: 0000000000000000 [ 515.228907] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffffffff836b0f80 [ 515.228908] RBP: ffff9a57ffc039c8 R08: 0000000000000004 R09: 0000000000000012 [ 515.228908] R10: ffff9a57ffc1de40 R11: 0000000000000000 R12: 0000000000000002 [ 515.228909] R13: ffff9a57e13bae00 R14: 00000000ffffffff R15: ffff9a57ffc1e2c0 [ 515.228910] FS: 00007f5a3e6ec700(0000) GS:ffff9a57ffc00000(0000) knlGS:0000000000000000 [ 515.228910] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 515.228911] CR2: 0000537082664fff CR3: 000000061fed6002 CR4: 00000000000226f0 [ 515.228911] Call Trace: [ 515.228913] <IRQ> [ 515.228919] [<ffffffff82c6c6cb>] bpf_sockopt_event_output+0x3b/0x50 [ 515.228923] [<ffffffff8265daee>] ? bpf_ktime_get_ns+0xe/0x10 [ 515.228927] [<ffffffff8266fda5>] ? __cgroup_bpf_run_filter_sock_ops+0x85/0x100 [ 515.228930] [<ffffffff82cf90a5>] ? tcp_init_transfer+0x125/0x150 [ 515.228933] [<ffffffff82cf9159>] ? tcp_finish_connect+0x89/0x110 [ 515.228936] [<ffffffff82cf98e4>] ? tcp_rcv_state_process+0x704/0x1010 [ 515.228939] [<ffffffff82c6e263>] ? sk_filter_trim_cap+0x53/0x2a0 [ 515.228942] [<ffffffff82d90d1f>] ? tcp_v6_inbound_md5_hash+0x6f/0x1d0 [ 515.228945] [<ffffffff82d92160>] ? tcp_v6_do_rcv+0x1c0/0x460 [ 515.228947] [<ffffffff82d93558>] ? tcp_v6_rcv+0x9f8/0xb30 [ 515.228951] [<ffffffff82d737c0>] ? ip6_route_input+0x190/0x220 [ 515.228955] [<ffffffff82d5f7ad>] ? ip6_protocol_deliver_rcu+0x6d/0x450 [ 515.228958] [<ffffffff82d60246>] ? ip6_rcv_finish+0xb6/0x170 [ 515.228961] [<ffffffff82d5fb90>] ? 
ip6_protocol_deliver_rcu+0x450/0x450 [ 515.228963] [<ffffffff82d60361>] ? ipv6_rcv+0x61/0xe0 [ 515.228966] [<ffffffff82d60190>] ? ipv6_list_rcv+0x330/0x330 [ 515.228969] [<ffffffff82c4976b>] ? __netif_receive_skb_one_core+0x5b/0xa0 [ 515.228972] [<ffffffff82c497d1>] ? __netif_receive_skb+0x21/0x70 [ 515.228975] [<ffffffff82c4a8d2>] ? process_backlog+0xb2/0x150 [ 515.228978] [<ffffffff82c4aadf>] ? net_rx_action+0x16f/0x410 [ 515.228982] [<ffffffff830000dd>] ? __do_softirq+0xdd/0x305 [ 515.228986] [<ffffffff8252cfdc>] ? irq_exit+0x9c/0xb0 [ 515.228989] [<ffffffff82e02de5>] ? smp_call_function_single_interrupt+0x65/0x120 [ 515.228991] [<ffffffff82e020e1>] ? call_function_single_interrupt+0x81/0x90 [ 515.228992] </IRQ> [ 515.228996] [<ffffffff82a11ff0>] ? io_serial_in+0x20/0x20 [ 515.229000] [<ffffffff8259c040>] ? console_unlock+0x230/0x490 [ 515.229003] [<ffffffff8259cbaa>] ? vprintk_emit+0x26a/0x2a0 [ 515.229006] [<ffffffff8259cbff>] ? vprintk_default+0x1f/0x30 [ 515.229008] [<ffffffff8259d9f5>] ? vprintk_func+0x35/0x70 [ 515.229011] [<ffffffff8259d4bb>] ? printk+0x50/0x66 [ 515.229013] [<ffffffff82637637>] ? bpf_event_output+0xb7/0x220 [ 515.229016] [<ffffffff82c6c6cb>] ? bpf_sockopt_event_output+0x3b/0x50 [ 515.229019] [<ffffffff8265daee>] ? bpf_ktime_get_ns+0xe/0x10 [ 515.229023] [<ffffffff82c29e87>] ? release_sock+0x97/0xb0 [ 515.229026] [<ffffffff82ce9d6a>] ? tcp_recvmsg+0x31a/0xda0 [ 515.229029] [<ffffffff8266fda5>] ? __cgroup_bpf_run_filter_sock_ops+0x85/0x100 [ 515.229032] [<ffffffff82ce77c1>] ? tcp_set_state+0x191/0x1b0 [ 515.229035] [<ffffffff82ced10e>] ? tcp_disconnect+0x2e/0x600 [ 515.229038] [<ffffffff82cecbbb>] ? tcp_close+0x3eb/0x460 [ 515.229040] [<ffffffff82d21082>] ? inet_release+0x42/0x70 [ 515.229043] [<ffffffff82d58809>] ? inet6_release+0x39/0x50 [ 515.229046] [<ffffffff82c1f32d>] ? __sock_release+0x4d/0xd0 [ 515.229049] [<ffffffff82c1f3e5>] ? sock_close+0x15/0x20 [ 515.229052] [<ffffffff8273b517>] ? 
__fput+0xe7/0x1f0 [ 515.229055] [<ffffffff8273b66e>] ? ____fput+0xe/0x10 [ 515.229058] [<ffffffff82547bf2>] ? task_work_run+0x82/0xb0 [ 515.229061] [<ffffffff824086df>] ? exit_to_usermode_loop+0x7e/0x11f [ 515.229064] [<ffffffff82408171>] ? do_syscall_64+0x111/0x130 [ 515.229067] [<ffffffff82e0007c>] ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Fixes: a5a3a828cd00 ("bpf: add perf event notificaton support for sock_ops") Signed-off-by: Allan Zhang allanzhang@google.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: Stanislav Fomichev sdf@google.com Reviewed-by: Eric Dumazet edumazet@google.com Acked-by: John Fastabend john.fastabend@gmail.com Link: https://lore.kernel.org/bpf/20190925234312.94063-2-allanzhang@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/trace/bpf_trace.c | 26 +++++++++++++++++++++----- 1 file changed, 21 insertions(+), 5 deletions(-)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index ca1255d145766..3e38a010003c9 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -500,14 +500,17 @@ static const struct bpf_func_proto bpf_perf_event_output_proto = { .arg5_type = ARG_CONST_SIZE_OR_ZERO, };
-static DEFINE_PER_CPU(struct pt_regs, bpf_pt_regs); -static DEFINE_PER_CPU(struct perf_sample_data, bpf_misc_sd); +static DEFINE_PER_CPU(int, bpf_event_output_nest_level); +struct bpf_nested_pt_regs { + struct pt_regs regs[3]; +}; +static DEFINE_PER_CPU(struct bpf_nested_pt_regs, bpf_pt_regs); +static DEFINE_PER_CPU(struct bpf_trace_sample_data, bpf_misc_sds);
u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size, void *ctx, u64 ctx_size, bpf_ctx_copy_t ctx_copy) { - struct perf_sample_data *sd = this_cpu_ptr(&bpf_misc_sd); - struct pt_regs *regs = this_cpu_ptr(&bpf_pt_regs); + int nest_level = this_cpu_inc_return(bpf_event_output_nest_level); struct perf_raw_frag frag = { .copy = ctx_copy, .size = ctx_size, @@ -522,12 +525,25 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size, .data = meta, }, }; + struct perf_sample_data *sd; + struct pt_regs *regs; + u64 ret; + + if (WARN_ON_ONCE(nest_level > ARRAY_SIZE(bpf_misc_sds.sds))) { + ret = -EBUSY; + goto out; + } + sd = this_cpu_ptr(&bpf_misc_sds.sds[nest_level - 1]); + regs = this_cpu_ptr(&bpf_pt_regs.regs[nest_level - 1]);
perf_fetch_caller_regs(regs); perf_sample_data_init(sd, 0, 0); sd->raw = &raw;
- return __bpf_perf_event_output(regs, map, flags, sd); + ret = __bpf_perf_event_output(regs, map, flags, sd); +out: + this_cpu_dec(bpf_event_output_nest_level); + return ret; }
BPF_CALL_0(bpf_get_current_task)
From: Marek Vasut marex@denx.de
[ Upstream commit a3aa6e65beebf3780026753ebf39db19f4c92990 ]
The regmap stride is already set to 1 for the regmap describing 8bit registers. However, for 16/32/64bit registers, the stride is 2/4/8 respectively. This is not correct, as the switch protocol supports unaligned register reads and writes, and the KSZ87xx even uses such unaligned register accesses, e.g. to read MIB counters.
This patch fixes MIB counter access on KSZ87xx.
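To see why the stride matters, here is a minimal model of a stride check, under the assumption that a register address is only accepted when it is a multiple of the configured stride. The function name and the register address used below are made up for the example.

```c
#include <assert.h>
#include <errno.h>

/*
 * Simplified model: an access is accepted only if the register address is
 * a multiple of the stride. This mirrors the idea, not the regmap code.
 */
static int reg_access_check(unsigned int reg, unsigned int stride)
{
	return (reg % stride) ? -EINVAL : 0;
}
```

With a stride of 4, a 32bit access at a hypothetical address 0x03 would be refused, while a stride of 1 lets the same access through.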
Signed-off-by: Marek Vasut marex@denx.de Cc: Andrew Lunn andrew@lunn.ch Cc: David S. Miller davem@davemloft.net Cc: Florian Fainelli f.fainelli@gmail.com Cc: George McCollister george.mccollister@gmail.com Cc: Tristram Ha Tristram.Ha@microchip.com Cc: Vivien Didelot vivien.didelot@savoirfairelinux.com Cc: Woojung Huh woojung.huh@microchip.com Fixes: 46558d601cb6 ("net: dsa: microchip: Initial SPI regmap support") Fixes: 255b59ad0db2 ("net: dsa: microchip: Factor out regmap config generation into common header") Reviewed-by: George McCollister george.mccollister@gmail.com Tested-by: George McCollister george.mccollister@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/microchip/ksz_common.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/dsa/microchip/ksz_common.h b/drivers/net/dsa/microchip/ksz_common.h index 72ec250b95408..823f544add0a3 100644 --- a/drivers/net/dsa/microchip/ksz_common.h +++ b/drivers/net/dsa/microchip/ksz_common.h @@ -130,7 +130,7 @@ static inline void ksz_pwrite32(struct ksz_device *dev, int port, int offset, { \ .name = #width, \ .val_bits = (width), \ - .reg_stride = (width) / 8, \ + .reg_stride = 1, \ .reg_bits = (regbits) + (regalign), \ .pad_bits = (regpad), \ .max_register = BIT(regbits) - 1, \
From: Lee Jones lee.jones@linaro.org
[ Upstream commit 127068abe85bf3dee50df51cb039a5a987a4a666 ]
We have a production-level laptop (Lenovo Yoga C630) which is exhibiting a rather horrific bug. When I2C HID devices are being scanned for at boot-time the QCom Geni based I2C (Serial Engine) attempts to use DMA. When it does, the laptop reboots and the user never sees the OS.
Attempts are being made to debug the reason for the spontaneous reboot. No luck so far, hence the requirement for this hot-fix. This workaround will be removed once we have a viable fix.
Signed-off-by: Lee Jones lee.jones@linaro.org Tested-by: Bjorn Andersson bjorn.andersson@linaro.org Signed-off-by: Wolfram Sang wsa@the-dreams.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-qcom-geni.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/i2c/busses/i2c-qcom-geni.c b/drivers/i2c/busses/i2c-qcom-geni.c index a89bfce5388ee..17abf60c94aeb 100644 --- a/drivers/i2c/busses/i2c-qcom-geni.c +++ b/drivers/i2c/busses/i2c-qcom-geni.c @@ -355,11 +355,13 @@ static int geni_i2c_rx_one_msg(struct geni_i2c_dev *gi2c, struct i2c_msg *msg, { dma_addr_t rx_dma; unsigned long time_left; - void *dma_buf; + void *dma_buf = NULL; struct geni_se *se = &gi2c->se; size_t len = msg->len;
- dma_buf = i2c_get_dma_safe_msg_buf(msg, 32); + if (!of_machine_is_compatible("lenovo,yoga-c630")) + dma_buf = i2c_get_dma_safe_msg_buf(msg, 32); + if (dma_buf) geni_se_select_mode(se, GENI_SE_DMA); else @@ -394,11 +396,13 @@ static int geni_i2c_tx_one_msg(struct geni_i2c_dev *gi2c, struct i2c_msg *msg, { dma_addr_t tx_dma; unsigned long time_left; - void *dma_buf; + void *dma_buf = NULL; struct geni_se *se = &gi2c->se; size_t len = msg->len;
- dma_buf = i2c_get_dma_safe_msg_buf(msg, 32); + if (!of_machine_is_compatible("lenovo,yoga-c630")) + dma_buf = i2c_get_dma_safe_msg_buf(msg, 32); + if (dma_buf) geni_se_select_mode(se, GENI_SE_DMA); else
From: Arnaldo Carvalho de Melo acme@redhat.com
[ Upstream commit 26acf400d2dcc72c7e713e1f55db47ad92010cc2 ]
Naresh Kamboju reported that on the i386 build pr_err() doesn't get defined properly due to header ordering:
perf-in.o: In function `libunwind__x86_reg_id': tools/perf/util/libunwind/../../arch/x86/util/unwind-libunwind.c:109: undefined reference to `pr_err'
Reported-by: Naresh Kamboju naresh.kamboju@linaro.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Cc: David Ahern dsahern@gmail.com Cc: Jiri Olsa jolsa@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/arch/x86/util/unwind-libunwind.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/arch/x86/util/unwind-libunwind.c b/tools/perf/arch/x86/util/unwind-libunwind.c index 05920e3edf7a7..47357973b55b2 100644 --- a/tools/perf/arch/x86/util/unwind-libunwind.c +++ b/tools/perf/arch/x86/util/unwind-libunwind.c @@ -1,11 +1,11 @@ // SPDX-License-Identifier: GPL-2.0
#include <errno.h> +#include "../../util/debug.h" #ifndef REMOTE_UNWIND_LIBUNWIND #include <libunwind.h> #include "perf_regs.h" #include "../../util/unwind.h" -#include "../../util/debug.h" #endif
#ifdef HAVE_ARCH_X86_64_SUPPORT
From: Danielle Ratson danieller@mellanox.com
[ Upstream commit 52feb8b588f6d23673dd7cc2b44b203493b627f6 ]
The ASIC can only mirror a packet to one port, but when the user tries to set more than one mirror action, the request doesn't fail.
Add a check for more than one mirror action being specified per rule and, if so, fail the request as not supported.
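The counting idiom used for this check can be shown in isolation. The action enum and the sample arrays below are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum action_id { ACT_ACCEPT, ACT_DROP, ACT_MIRROR };	/* hypothetical */

static const enum action_id one_mirror[] = { ACT_MIRROR, ACT_DROP };
static const enum action_id two_mirrors[] = { ACT_MIRROR, ACT_MIRROR };

/* Walk the actions of one rule and refuse a second mirror action. */
static int parse_actions(const enum action_id *acts, size_t n)
{
	int mirror_act_count = 0;

	for (size_t i = 0; i < n; i++) {
		if (acts[i] != ACT_MIRROR)
			continue;
		/* Post-increment: evaluates to zero for the first mirror only. */
		if (mirror_act_count++)
			return -EOPNOTSUPP;
	}
	return 0;
}
```

The post-increment keeps the check to a single line: the first mirror action passes, and any later one in the same rule is rejected.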
Fixes: d0d13c1858a11 ("mlxsw: spectrum_acl: Add support for mirror action") Signed-off-by: Danielle Ratson danieller@mellanox.com Acked-by: Jiri Pirko jiri@mellanox.com Signed-off-by: Ido Schimmel idosch@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c index 202e9a2460194..7c13656a83384 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c @@ -21,6 +21,7 @@ static int mlxsw_sp_flower_parse_actions(struct mlxsw_sp *mlxsw_sp, struct netlink_ext_ack *extack) { const struct flow_action_entry *act; + int mirror_act_count = 0; int err, i;
if (!flow_action_has_entries(flow_action)) @@ -95,6 +96,11 @@ static int mlxsw_sp_flower_parse_actions(struct mlxsw_sp *mlxsw_sp, case FLOW_ACTION_MIRRED: { struct net_device *out_dev = act->dev;
+ if (mirror_act_count++) { + NL_SET_ERR_MSG_MOD(extack, "Multiple mirror actions per rule are not supported"); + return -EOPNOTSUPP; + } + err = mlxsw_sp_acl_rulei_act_mirror(mlxsw_sp, rulei, block, out_dev, extack);
From: Navid Emamdoost navid.emamdoost@gmail.com
[ Upstream commit 78beef629fd95be4ed853b2d37b832f766bd96ca ]
In nfp_abm_u32_knode_replace(), if the allocation for match fails, it should go to the error handling instead of returning directly. Update the other gotos so that the correct errno is returned, too.
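The resulting shape, a single cleanup label that returns whatever error code was recorded, looks roughly like this in a userspace sketch. The fail_check and fail_alloc switches and the cleanup counter are hypothetical and only make the paths observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int cleanups;	/* counts how often the cleanup label ran */

/* fail_check and fail_alloc are hypothetical switches for this sketch. */
static int knode_replace(int fail_check, int fail_alloc)
{
	int *match = NULL;
	int err;

	if (fail_check) {
		err = -EOPNOTSUPP;
		goto err_delete;
	}

	match = fail_alloc ? NULL : malloc(sizeof(*match));
	if (!match) {
		err = -ENOMEM;	/* a bare return here would skip the cleanup */
		goto err_delete;
	}

	free(match);
	return 0;

err_delete:
	cleanups++;		/* stands in for the real cleanup call */
	return err;
}
```

Every failure path now runs the cleanup and still reports its own errno, rather than one hard-coded value at the label.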
Signed-off-by: Navid Emamdoost navid.emamdoost@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/netronome/nfp/abm/cls.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/netronome/nfp/abm/cls.c b/drivers/net/ethernet/netronome/nfp/abm/cls.c index 23ebddfb95325..9f8a1f69c0c4c 100644 --- a/drivers/net/ethernet/netronome/nfp/abm/cls.c +++ b/drivers/net/ethernet/netronome/nfp/abm/cls.c @@ -176,8 +176,10 @@ nfp_abm_u32_knode_replace(struct nfp_abm_link *alink, u8 mask, val; int err;
- if (!nfp_abm_u32_check_knode(alink->abm, knode, proto, extack)) + if (!nfp_abm_u32_check_knode(alink->abm, knode, proto, extack)) { + err = -EOPNOTSUPP; goto err_delete; + }
tos_off = proto == htons(ETH_P_IP) ? 16 : 20;
@@ -198,14 +200,18 @@ nfp_abm_u32_knode_replace(struct nfp_abm_link *alink, if ((iter->val & cmask) == (val & cmask) && iter->band != knode->res->classid) { NL_SET_ERR_MSG_MOD(extack, "conflict with already offloaded filter"); + err = -EOPNOTSUPP; goto err_delete; } }
if (!match) { match = kzalloc(sizeof(*match), GFP_KERNEL); - if (!match) - return -ENOMEM; + if (!match) { + err = -ENOMEM; + goto err_delete; + } + list_add(&match->list, &alink->dscp_map); } match->handle = knode->handle; @@ -221,7 +227,7 @@ nfp_abm_u32_knode_replace(struct nfp_abm_link *alink,
err_delete: nfp_abm_u32_knode_delete(alink, knode); - return -EOPNOTSUPP; + return err; }
static int nfp_abm_setup_tc_block_cb(enum tc_setup_type type,
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit 9dbc88d013b79c62bd845cb9e7c0256e660967c5 ]
Bail from the pci_driver probe function instead of from the drm_driver load function.
This avoids /dev/dri/card0 temporarily getting registered and then unregistered again, sending unwanted add / remove udev events to userspace.
Specifically this avoids triggering the (userspace) bug fixed by this plymouth merge-request: https://gitlab.freedesktop.org/plymouth/plymouth/merge_requests/59
Note that despite that being a userspace bug, not sending unnecessary udev events is a good idea in general.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1490490 Reviewed-by: Michel Dänzer mdaenzer@redhat.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/radeon/radeon_drv.c | 31 +++++++++++++++++++++++++++++ drivers/gpu/drm/radeon/radeon_kms.c | 25 ----------------------- 2 files changed, 31 insertions(+), 25 deletions(-)
diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c index 15d7bebe17294..5cc0fbb04ab14 100644 --- a/drivers/gpu/drm/radeon/radeon_drv.c +++ b/drivers/gpu/drm/radeon/radeon_drv.c @@ -325,8 +325,39 @@ bool radeon_device_is_virtual(void); static int radeon_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent) { + unsigned long flags = 0; int ret;
+ if (!ent) + return -ENODEV; /* Avoid NULL-ptr deref in drm_get_pci_dev */ + + flags = ent->driver_data; + + if (!radeon_si_support) { + switch (flags & RADEON_FAMILY_MASK) { + case CHIP_TAHITI: + case CHIP_PITCAIRN: + case CHIP_VERDE: + case CHIP_OLAND: + case CHIP_HAINAN: + dev_info(&pdev->dev, + "SI support disabled by module param\n"); + return -ENODEV; + } + } + if (!radeon_cik_support) { + switch (flags & RADEON_FAMILY_MASK) { + case CHIP_KAVERI: + case CHIP_BONAIRE: + case CHIP_HAWAII: + case CHIP_KABINI: + case CHIP_MULLINS: + dev_info(&pdev->dev, + "CIK support disabled by module param\n"); + return -ENODEV; + } + } + if (vga_switcheroo_client_probe_defer(pdev)) return -EPROBE_DEFER;
diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c index 07f7ace42c4ba..e85c554eeaa94 100644 --- a/drivers/gpu/drm/radeon/radeon_kms.c +++ b/drivers/gpu/drm/radeon/radeon_kms.c @@ -100,31 +100,6 @@ int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags) struct radeon_device *rdev; int r, acpi_status;
- if (!radeon_si_support) { - switch (flags & RADEON_FAMILY_MASK) { - case CHIP_TAHITI: - case CHIP_PITCAIRN: - case CHIP_VERDE: - case CHIP_OLAND: - case CHIP_HAINAN: - dev_info(dev->dev, - "SI support disabled by module param\n"); - return -ENODEV; - } - } - if (!radeon_cik_support) { - switch (flags & RADEON_FAMILY_MASK) { - case CHIP_KAVERI: - case CHIP_BONAIRE: - case CHIP_HAWAII: - case CHIP_KABINI: - case CHIP_MULLINS: - dev_info(dev->dev, - "CIK support disabled by module param\n"); - return -ENODEV; - } - } - rdev = kzalloc(sizeof(struct radeon_device), GFP_KERNEL); if (rdev == NULL) { return -ENOMEM;
From: Filipe Manana fdmanana@suse.com
[ Upstream commit 9f7fec0ba89108b9385f1b9fb167861224912a4a ]
Some of the self tests create a test inode, set up some extents and then call btrfs_get_extent() to test that the corresponding extent maps exist and are correct. However btrfs_get_extent(), since the 5.2 merge window, now errors out when it finds a regular or prealloc extent for an inode that does not correspond to a regular file (its ->i_mode is not S_IFREG). This causes the self tests to fail sometimes, especially when KASAN, slub_debug and page poisoning are enabled:
$ modprobe btrfs modprobe: ERROR: could not insert 'btrfs': Invalid argument
$ dmesg [ 9414.691648] Btrfs loaded, crc32c=crc32c-intel, debug=on, assert=on, integrity-checker=on, ref-verify=on [ 9414.692655] BTRFS: selftest: sectorsize: 4096 nodesize: 4096 [ 9414.692658] BTRFS: selftest: running btrfs free space cache tests [ 9414.692918] BTRFS: selftest: running extent only tests [ 9414.693061] BTRFS: selftest: running bitmap only tests [ 9414.693366] BTRFS: selftest: running bitmap and extent tests [ 9414.696455] BTRFS: selftest: running space stealing from bitmap to extent tests [ 9414.697131] BTRFS: selftest: running extent buffer operation tests [ 9414.697133] BTRFS: selftest: running btrfs_split_item tests [ 9414.697564] BTRFS: selftest: running extent I/O tests [ 9414.697583] BTRFS: selftest: running find delalloc tests [ 9415.081125] BTRFS: selftest: running find_first_clear_extent_bit test [ 9415.081278] BTRFS: selftest: running extent buffer bitmap tests [ 9415.124192] BTRFS: selftest: running inode tests [ 9415.124195] BTRFS: selftest: running btrfs_get_extent tests [ 9415.127909] BTRFS: selftest: running hole first btrfs_get_extent test [ 9415.128343] BTRFS critical (device (efault)): regular/prealloc extent found for non-regular inode 256 [ 9415.131428] BTRFS: selftest: fs/btrfs/tests/inode-tests.c:904 expected a real extent, got 0
This happens because the test inodes are created without ever initializing the i_mode field of the inode, and neither VFS's new_inode() nor the btrfs callback btrfs_alloc_inode() initializes i_mode. Initialization of i_mode is done through the various callbacks used by the VFS to create new inodes (regular files, directories, symlinks, tmpfiles, etc), which all call btrfs_new_inode(), which in turn calls inode_init_owner(), which sets the inode's i_mode. Since the tests only use new_inode() to create the test inodes, i_mode was never initialized.
This always happens on a VM I used with kasan, slub_debug and many other debug facilities enabled. It also happened to someone who reported this on bugzilla (on a 5.3-rc).
Fix this by setting i_mode to S_IFREG at btrfs_new_test_inode().
Fixes: 6bf9e4bd6a2778 ("btrfs: inode: Verify inode mode to avoid NULL pointer dereference") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204397 Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/tests/btrfs-tests.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/tests/btrfs-tests.c b/fs/btrfs/tests/btrfs-tests.c index 1e3ba49493995..814a918998ece 100644 --- a/fs/btrfs/tests/btrfs-tests.c +++ b/fs/btrfs/tests/btrfs-tests.c @@ -51,7 +51,13 @@ static struct file_system_type test_type = {
struct inode *btrfs_new_test_inode(void) { - return new_inode(test_mnt->mnt_sb); + struct inode *inode; + + inode = new_inode(test_mnt->mnt_sb); + if (inode) + inode_init_owner(inode, NULL, S_IFREG); + + return inode; }
static int btrfs_init_test_fs(void)
From: Sean Christopherson sean.j.christopherson@intel.com
[ Upstream commit 567926cca99ba1750be8aae9c4178796bf9bb90b ]
Current versions of Intel's SDM incorrectly state that "bits 31:15 of the VM-Entry exception error-code field" must be zero. In reality, bits 31:16 must be zero, i.e. error codes are 16-bit values.
The bogus error code check manifests as an unexpected VM-Entry failure due to an invalid code field (error number 7) in L1, e.g. when injecting a #GP with error_code=0x9f00.
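The arithmetic behind the example value can be checked in userspace. GENMASK32() below is a local stand-in written for this sketch, not the kernel macro, and error_code_rejected() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for GENMASK(h, l) on 32-bit values. */
#define GENMASK32(h, l) \
	((uint32_t)((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h)))))

/* Non-zero if the error code has any bit set in the reserved mask. */
static int error_code_rejected(uint32_t error_code, uint32_t reserved)
{
	return (error_code & reserved) != 0;
}
```

0x9f00 has bit 15 set, so the old mask (bits 31:15, 0xffff8000) flags it, while the corrected mask (bits 31:16, 0xffff0000) accepts it. A value such as 0x10000 is still rejected by both.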
Nadav previously reported the bug[*], both to KVM and Intel, and fixed the associated kvm-unit-test.
[*] https://patchwork.kernel.org/patch/11124749/
Reported-by: Nadav Amit namit@vmware.com Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson sean.j.christopherson@intel.com Reviewed-by: Jim Mattson jmattson@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/vmx/nested.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index a3cba321b5c5d..61aa9421e27af 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -2584,7 +2584,7 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
/* VM-entry exception error code */ if (has_error_code && - vmcs12->vm_entry_exception_error_code & GENMASK(31, 15)) + vmcs12->vm_entry_exception_error_code & GENMASK(31, 16)) return -EINVAL;
/* VM-entry interruption-info field: reserved bits */
From: Balasubramani Vivekanandan balasubramani_vivekanandan@mentor.com
[ Upstream commit b9023b91dd020ad7e093baa5122b6968c48cc9e0 ]
When a CPU requests broadcasting, before starting the tick broadcast hrtimer, bc_set_next() checks if the timer callback (bc_handler) is active using hrtimer_try_to_cancel(). But hrtimer_try_to_cancel() does not provide the required synchronization when the callback is active on another core.
The callback could have already executed tick_handle_oneshot_broadcast() and could have also returned. But still there is a small time window where the hrtimer_try_to_cancel() returns -1. In that case bc_set_next() returns without doing anything, but the next_event of the tick broadcast clock device is already set to a timeout value.
In the race condition diagram below, CPU #1 is running the timer callback and CPU #2 is entering idle state and so calls bc_set_next().
In the worst case, the next_event will contain an expiry time, but the hrtimer will not be started, which happens when the racing callback returns HRTIMER_NORESTART. The hrtimer might never recover if all further requests from the CPUs to subscribe to tick broadcast have a timeout greater than the next_event of the tick broadcast clock device. This leads to cascading failures, which are finally noticed as RCU stall warnings.
Here is a depiction of the race condition
CPU #1 (Running timer callback)                 CPU #2 (Enter idle and subscribe
                                                to tick broadcast)
---------------------                           ---------------------

__run_hrtimer()                                 tick_broadcast_enter()

  bc_handler()                                    __tick_broadcast_oneshot_control()

    tick_handle_oneshot_broadcast()

      raw_spin_lock(&tick_broadcast_lock);

      dev->next_event = KTIME_MAX;                  //wait for tick_broadcast_lock
      //next_event for tick broadcast clock
      set to KTIME_MAX since no other cores
      subscribed to tick broadcasting

      raw_spin_unlock(&tick_broadcast_lock);

    if (dev->next_event == KTIME_MAX)
      return HRTIMER_NORESTART
    // callback function exits without
       restarting the hrtimer                       //tick_broadcast_lock acquired
                                                    raw_spin_lock(&tick_broadcast_lock);

                                                    tick_broadcast_set_event()

                                                      clockevents_program_event()

                                                        dev->next_event = expires;

                                                        bc_set_next()

                                                          hrtimer_try_to_cancel()
                                                          //returns -1 since the timer
                                                          callback is active. Exits without
                                                          restarting the timer
  cpu_base->running = NULL;
The comment that hrtimer cannot be armed from within the callback is wrong. It is fine to start the hrtimer from within the callback. Also it is safe to start the hrtimer from the enter/exit idle code while the broadcast handler is active. The enter/exit idle code and the broadcast handler are synchronized using tick_broadcast_lock. So there is no need for the existing try to cancel logic. All this can be removed which will eliminate the race condition as well.
Fixes: 5d1638acb9f6 ("tick: Introduce hrtimer based broadcast")
Originally-by: Thomas Gleixner tglx@linutronix.de
Signed-off-by: Balasubramani Vivekanandan balasubramani_vivekanandan@mentor.com
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190926135101.12102-2-balasubramani_vivekanandan@...
Signed-off-by: Sasha Levin sashal@kernel.org
---
 kernel/time/tick-broadcast-hrtimer.c | 57 ++++++++++++++--------------
 1 file changed, 29 insertions(+), 28 deletions(-)

diff --git a/kernel/time/tick-broadcast-hrtimer.c b/kernel/time/tick-broadcast-hrtimer.c
index 5be6154e2fd2c..99fbfb8d9117c 100644
--- a/kernel/time/tick-broadcast-hrtimer.c
+++ b/kernel/time/tick-broadcast-hrtimer.c
@@ -42,34 +42,39 @@ static int bc_shutdown(struct clock_event_device *evt)
  */
 static int bc_set_next(ktime_t expires, struct clock_event_device *bc)
 {
-	int bc_moved;
 	/*
-	 * We try to cancel the timer first. If the callback is on
-	 * flight on some other cpu then we let it handle it. If we
-	 * were able to cancel the timer nothing can rearm it as we
-	 * own broadcast_lock.
+	 * This is called either from enter/exit idle code or from the
+	 * broadcast handler. In all cases tick_broadcast_lock is held.
 	 *
-	 * However we can also be called from the event handler of
-	 * ce_broadcast_hrtimer itself when it expires. We cannot
-	 * restart the timer because we are in the callback, but we
-	 * can set the expiry time and let the callback return
-	 * HRTIMER_RESTART.
+	 * hrtimer_cancel() cannot be called here neither from the
+	 * broadcast handler nor from the enter/exit idle code. The idle
+	 * code can run into the problem described in bc_shutdown() and the
+	 * broadcast handler cannot wait for itself to complete for obvious
+	 * reasons.
 	 *
-	 * Since we are in the idle loop at this point and because
-	 * hrtimer_{start/cancel} functions call into tracing,
-	 * calls to these functions must be bound within RCU_NONIDLE.
+	 * Each caller tries to arm the hrtimer on its own CPU, but if the
+	 * hrtimer callbback function is currently running, then
+	 * hrtimer_start() cannot move it and the timer stays on the CPU on
+	 * which it is assigned at the moment.
+	 *
+	 * As this can be called from idle code, the hrtimer_start()
+	 * invocation has to be wrapped with RCU_NONIDLE() as
+	 * hrtimer_start() can call into tracing.
 	 */
-	RCU_NONIDLE({
-			bc_moved = hrtimer_try_to_cancel(&bctimer) >= 0;
-			if (bc_moved)
-				hrtimer_start(&bctimer, expires,
-					      HRTIMER_MODE_ABS_PINNED);});
-	if (bc_moved) {
-		/* Bind the "device" to the cpu */
-		bc->bound_on = smp_processor_id();
-	} else if (bc->bound_on == smp_processor_id()) {
-		hrtimer_set_expires(&bctimer, expires);
-	}
+	RCU_NONIDLE( {
+		hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED);
+		/*
+		 * The core tick broadcast mode expects bc->bound_on to be set
+		 * correctly to prevent a CPU which has the broadcast hrtimer
+		 * armed from going deep idle.
+		 *
+		 * As tick_broadcast_lock is held, nothing can change the cpu
+		 * base which was just established in hrtimer_start() above. So
+		 * the below access is safe even without holding the hrtimer
+		 * base lock.
+		 */
+		bc->bound_on = bctimer.base->cpu_base->cpu;
+	} );
 	return 0;
 }

@@ -95,10 +100,6 @@ static enum hrtimer_restart bc_handler(struct hrtimer *t)
 {
 	ce_broadcast_hrtimer.event_handler(&ce_broadcast_hrtimer);

-	if (clockevent_state_oneshot(&ce_broadcast_hrtimer))
-		if (ce_broadcast_hrtimer.next_event != KTIME_MAX)
-			return HRTIMER_RESTART;
-
 	return HRTIMER_NORESTART;
 }
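The interleaving described in the commit message can be replayed with a small single-threaded toy model. This is only a sketch under loud assumptions: every `toy_` name is hypothetical, there is no real hrtimer or locking here, and the "callback still running" window is reduced to a boolean. It only shows why giving up when the callback is active can leave next_event programmed with no timer armed, and why unconditionally starting the timer avoids that.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_KTIME_MAX INT64_MAX

/* Hypothetical model of the broadcast device state (not kernel types). */
struct toy_bc {
	int64_t next_event;    /* models dev->next_event */
	bool timer_armed;      /* models "the broadcast hrtimer is queued" */
	bool callback_running; /* models cpu_base->running still being set */
};

/* Old behaviour: back off when the callback is still marked as running. */
static void toy_set_next_old(struct toy_bc *bc, int64_t expires)
{
	bc->next_event = expires; /* done by clockevents_program_event() */
	if (bc->callback_running)
		return;           /* hrtimer_try_to_cancel() returned -1 */
	bc->timer_armed = true;
}

/* New behaviour: always start the timer. */
static void toy_set_next_new(struct toy_bc *bc, int64_t expires)
{
	bc->next_event = expires;
	bc->timer_armed = true;   /* unconditional hrtimer_start() */
}

/*
 * Replays the diagram's ordering and reports whether the timer ends up
 * armed once CPU #2 has programmed its event.
 */
static bool toy_replay(void (*set_next)(struct toy_bc *, int64_t))
{
	struct toy_bc bc = {
		.next_event = 100,
		.timer_armed = false,     /* timer is currently expiring */
		.callback_running = true, /* CPU #1 is inside the callback */
	};

	/* CPU #1: no subscribers, so the handler parks next_event. */
	bc.next_event = TOY_KTIME_MAX;
	/* CPU #1: callback decides NORESTART, but has not fully returned. */

	/* CPU #2: enters idle and programs a new expiry. */
	set_next(&bc, 500);

	/* CPU #1: callback finally completes (cpu_base->running = NULL). */
	bc.callback_running = false;

	return bc.timer_armed;
}
```

In this model `toy_replay(toy_set_next_old)` ends with an expiry programmed but no timer armed, while `toy_replay(toy_set_next_new)` ends with the timer armed.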
From: Srikar Dronamraju srikar@linux.vnet.ibm.com
[ Upstream commit b63fd11cced17fcb8e133def29001b0f6aaa5e06 ]
When using 'perf stat' with repeat and interval option, it shows wrong values for events.
The wrong values will be shown for the first interval on the second and subsequent repetitions.
Without the fix:
  # perf stat -r 3 -I 2000 -e faults -e sched:sched_switch -a sleep 5

     2.000282489                             53      faults
     2.000282489                            513      sched:sched_switch
     4.005478208                          3,721      faults
     4.005478208                          2,666      sched:sched_switch
     5.025470933                            395      faults
     5.025470933                          1,307      sched:sched_switch
     2.009602825  1,84,46,74,40,73,70,95,47,520      faults               <------
     2.009602825  1,84,46,74,40,73,70,95,49,568      sched:sched_switch   <------
     4.019612206                          4,730      faults
     4.019612206                          2,746      sched:sched_switch
     5.039615484                          3,953      faults
     5.039615484                          1,496      sched:sched_switch
     2.000274620  1,84,46,74,40,73,70,95,47,520      faults               <------
     2.000274620  1,84,46,74,40,73,70,95,47,520      sched:sched_switch   <------
     4.000480342                          4,282      faults
     4.000480342                          2,303      sched:sched_switch
     5.000916811                          1,322      faults
     5.000916811                          1,064      sched:sched_switch
  #
prev_raw_counts is allocated when using intervals. This is used when calculating the difference in the counts of events when using interval.
The current counts are stored in prev_raw_counts to calculate the differences in the next iteration.
On the first interval of the second and subsequent repetitions, prev_raw_counts would be the values stored in the last interval of the previous repetitions, while the current counts will only be for the first interval of the current repetition.
Hence there is a possibility of events showing up as a very large number.
Fix this by resetting prev_raw_counts whenever perf stat repeats the command.
With the fix:
  # perf stat -r 3 -I 2000 -e faults -e sched:sched_switch -a sleep 5

     2.019349347              2,597      faults
     2.019349347              2,753      sched:sched_switch
     4.019577372              3,098      faults
     4.019577372              2,532      sched:sched_switch
     5.019415481              1,879      faults
     5.019415481              1,356      sched:sched_switch
     2.000178813              8,468      faults
     2.000178813              2,254      sched:sched_switch
     4.000404621              7,440      faults
     4.000404621              1,266      sched:sched_switch
     5.040196079              2,458      faults
     5.040196079                556      sched:sched_switch
     2.000191939              6,870      faults
     2.000191939              1,170      sched:sched_switch
     4.000414103                541      faults
     4.000414103                902      sched:sched_switch
     5.000809863                450      faults
     5.000809863                364      sched:sched_switch
  #
Committer notes:
This was broken since the cset introducing the --interval feature, i.e. --repeat + --interval wasn't tested at that point, add the Fixes tag so that automatic scripts can pick this up.
Fixes: 13370a9b5bb8 ("perf stat: Add interval printing")
Signed-off-by: Srikar Dronamraju srikar@linux.vnet.ibm.com
Acked-by: Jiri Olsa jolsa@kernel.org
Tested-by: Arnaldo Carvalho de Melo acme@redhat.com
Tested-by: Ravi Bangoria ravi.bangoria@linux.ibm.com
Cc: Namhyung Kim namhyung@kernel.org
Cc: Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com
Cc: Stephane Eranian eranian@google.com
Cc: stable@vger.kernel.org # v3.9+
Link: http://lore.kernel.org/lkml/20190904094738.9558-2-srikar@linux.vnet.ibm.com
[ Fixed up conflicts with libperf, i.e. some perf_{evsel,evlist} lost the 'perf' prefix ]
Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 tools/perf/builtin-stat.c |  3 +++
 tools/perf/util/stat.c    | 17 +++++++++++++++++
 tools/perf/util/stat.h    |  1 +
 3 files changed, 21 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 3e13b231f2f56..8ec06bf3372c6 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1961,6 +1961,9 @@ int cmd_stat(int argc, const char **argv)
 			fprintf(output, "[ perf stat: executing run #%d ... ]\n",
 				run_idx + 1);

+		if (run_idx != 0)
+			perf_evlist__reset_prev_raw_counts(evsel_list);
+
 		status = run_perf_stat(argc, argv, run_idx);
 		if (forever && status != -1 && !interval) {
 			print_counters(NULL, argc, argv);
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index db8a6cf336bed..6ce66c2727474 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -155,6 +155,15 @@ static void perf_evsel__free_prev_raw_counts(struct perf_evsel *evsel)
 	evsel->prev_raw_counts = NULL;
 }

+static void perf_evsel__reset_prev_raw_counts(struct perf_evsel *evsel)
+{
+	if (evsel->prev_raw_counts) {
+		evsel->prev_raw_counts->aggr.val = 0;
+		evsel->prev_raw_counts->aggr.ena = 0;
+		evsel->prev_raw_counts->aggr.run = 0;
+	}
+}
+
 static int perf_evsel__alloc_stats(struct perf_evsel *evsel, bool alloc_raw)
 {
 	int ncpus = perf_evsel__nr_cpus(evsel);
@@ -205,6 +214,14 @@ void perf_evlist__reset_stats(struct perf_evlist *evlist)
 	}
 }

+void perf_evlist__reset_prev_raw_counts(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel;
+
+	evlist__for_each_entry(evlist, evsel)
+		perf_evsel__reset_prev_raw_counts(evsel);
+}
+
 static void zero_per_pkg(struct perf_evsel *counter)
 {
 	if (counter->per_pkg_mask)
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 7032dd1eeac2f..9cd0d9cff374a 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -194,6 +194,7 @@ void perf_stat__collect_metric_expr(struct perf_evlist *);
 int perf_evlist__alloc_stats(struct perf_evlist *evlist, bool alloc_raw);
 void perf_evlist__free_stats(struct perf_evlist *evlist);
 void perf_evlist__reset_stats(struct perf_evlist *evlist);
+void perf_evlist__reset_prev_raw_counts(struct perf_evlist *evlist);

 int perf_stat_process_counter(struct perf_stat_config *config,
 			      struct perf_evsel *counter);
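The effect described in the commit message (a stale previous count producing a huge value in the first interval of a repeated run) can be sketched in a few lines of user-space C. This is an illustrative assumption about the arithmetic, not perf's actual code: `toy_counts`, `toy_interval_delta()` and `toy_reset_prev()` are hypothetical names, and the model simply subtracts the stored previous count from the current one using unsigned 64-bit values.

```c
#include <stdint.h>

/* Hypothetical stand-in for the saved "previous raw count" of one event. */
struct toy_counts {
	uint64_t val;
};

/*
 * Returns the count for the current interval and remembers the current
 * raw count for the next interval. If prev->val is larger than cur (a
 * stale value from an earlier run), the subtraction wraps modulo 2^64.
 */
static uint64_t toy_interval_delta(struct toy_counts *prev, uint64_t cur)
{
	uint64_t delta = cur - prev->val;

	prev->val = cur;
	return delta;
}

/* Models resetting the saved count before a repeated run starts. */
static void toy_reset_prev(struct toy_counts *prev)
{
	prev->val = 0;
}
```

If a run ends with a saved count of 350 and the next run's first interval reads 40, the delta without a reset wraps to a value just below 2^64; calling `toy_reset_prev()` first yields 40.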
From: Vincent Chen vincent.chen@sifive.com
[ Upstream commit c82dd6d078a2bb29d41eda032bb96d05699a524d ]
When the handle_exception function handles an exception, interrupts are unconditionally enabled after the context save finishes. However, this may erroneously enable interrupts if they were disabled before entering handle_exception.
For example, one of the WARN_ON() conditions is satisfied in the scheduler while interrupts are disabled and rq.lock is held. The WARN_ON will trigger a break exception and the handle_exception function will enable interrupts before entering the do_trap_break function. If a timer interrupt is pending at that point, it will be taken as soon as interrupts are enabled. This may cause a deadlock if rq.lock is taken again in the timer ISR.
Hence, handle_exception() can only enable interrupts when sstatus.SPIE is 1.
This patch was tested on the HiFive Unleashed board.
Signed-off-by: Vincent Chen vincent.chen@sifive.com
Reviewed-by: Palmer Dabbelt palmer@sifive.com
[paul.walmsley@sifive.com: updated to apply]
Fixes: bcae803a21317 ("RISC-V: Enable IRQ during exception handling")
Cc: David Abdurachmanov david.abdurachmanov@sifive.com
Cc: stable@vger.kernel.org
Signed-off-by: Paul Walmsley paul.walmsley@sifive.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/riscv/kernel/entry.S | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index bc7a56e1ca6f4..9b60878a4469c 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -166,9 +166,13 @@ ENTRY(handle_exception)
 	move a0, sp /* pt_regs */
 	tail do_IRQ
 1:
-	/* Exceptions run with interrupts enabled */
+	/* Exceptions run with interrupts enabled or disabled
+	   depending on the state of sstatus.SR_SPIE */
+	andi t0, s1, SR_SPIE
+	beqz t0, 1f
 	csrs sstatus, SR_SIE

+1:
 	/* Handle syscalls */
 	li t0, EXC_SYSCALL
 	beq s4, t0, handle_syscall
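As a rough C rendering of the added assembly, the decision can be sketched as below. This is a model for illustration, not kernel code: the `TOY_` constants mirror the sstatus bit positions from the RISC-V privileged specification (SIE is bit 1, SPIE is bit 5), `toy_exception_entry()` is a hypothetical name, and the sketch assumes that register s1 in entry.S holds the sstatus value saved at trap entry.

```c
/* sstatus.SIE: supervisor interrupt enable (bit 1). */
#define TOY_SR_SIE  0x00000002UL
/* sstatus.SPIE: value SIE had before the trap was taken (bit 5). */
#define TOY_SR_SPIE 0x00000020UL

/*
 * Hypothetical model of the patched entry path: interrupts are enabled
 * in the live sstatus only if they were enabled before the exception,
 * as recorded in the SPIE bit of the saved sstatus.
 */
static unsigned long toy_exception_entry(unsigned long saved_sstatus,
					 unsigned long live_sstatus)
{
	if (saved_sstatus & TOY_SR_SPIE)
		live_sstatus |= TOY_SR_SIE; /* csrs sstatus, SR_SIE */

	return live_sstatus;
}
```

Before the patch, the equivalent of the `if` was missing, so SIE was set regardless of the saved state.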
From: Eric Sandeen sandeen@redhat.com
commit cc3a7bfe62b947b423fcb2cfe89fcba92bf48fa3 upstream.
Today, put_compat_statfs64() disallows nearly any field value over 2^32 if f_bsize is only 32 bits, but that makes no sense. compat_statfs64 is there for the explicit purpose of providing 64-bit fields for f_files, f_ffree, etc. And f_bsize is always only 32 bits.
As a result, 32-bit userspace gets -EOVERFLOW for e.g. large file counts even with -D_FILE_OFFSET_BITS=64 set.
In reality, only f_bsize and f_frsize can legitimately overflow (fields like f_type and f_namelen should never be large), so test only those fields.
This bug was discussed at length some time ago, and this is the proposal Al suggested at https://lkml.org/lkml/2018/8/6/640. It seemed to get dropped amid the discussion of other related changes, but this part seems obviously correct on its own, so I've picked it up and sent it, for expediency.
Fixes: 64d2ab32efe3 ("vfs: fix put_compat_statfs64() does not handle errors")
Signed-off-by: Eric Sandeen sandeen@redhat.com
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 fs/statfs.c | 17 ++++-------------
 1 file changed, 4 insertions(+), 13 deletions(-)

--- a/fs/statfs.c
+++ b/fs/statfs.c
@@ -318,19 +318,10 @@ COMPAT_SYSCALL_DEFINE2(fstatfs, unsigned
 static int put_compat_statfs64(struct compat_statfs64 __user *ubuf, struct kstatfs *kbuf)
 {
 	struct compat_statfs64 buf;
-	if (sizeof(ubuf->f_bsize) == 4) {
-		if ((kbuf->f_type | kbuf->f_bsize | kbuf->f_namelen |
-		     kbuf->f_frsize | kbuf->f_flags) & 0xffffffff00000000ULL)
-			return -EOVERFLOW;
-		/* f_files and f_ffree may be -1; it's okay
-		 * to stuff that into 32 bits */
-		if (kbuf->f_files != 0xffffffffffffffffULL
-		 && (kbuf->f_files & 0xffffffff00000000ULL))
-			return -EOVERFLOW;
-		if (kbuf->f_ffree != 0xffffffffffffffffULL
-		 && (kbuf->f_ffree & 0xffffffff00000000ULL))
-			return -EOVERFLOW;
-	}
+
+	if ((kbuf->f_bsize | kbuf->f_frsize) & 0xffffffff00000000ULL)
+		return -EOVERFLOW;
+
 	memset(&buf, 0, sizeof(struct compat_statfs64));
 	buf.f_type = kbuf->f_type;
 	buf.f_bsize = kbuf->f_bsize;
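A minimal user-space sketch of the new check follows, to make the behavioural change concrete. The struct and function names (`toy_kstatfs`, `toy_compat_statfs64_check()`) are hypothetical and only carry the fields needed here; the sketch assumes, as the commit message states, that only f_bsize and f_frsize are 32-bit in compat_statfs64.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical subset of the kernel-side statfs values. */
struct toy_kstatfs {
	uint64_t f_bsize;
	uint64_t f_frsize;
	uint64_t f_files;
	uint64_t f_ffree;
};

/*
 * Mirrors the patched test: only the two fields that are 32 bits wide
 * in the compat structure can overflow. 64-bit fields such as f_files
 * and f_ffree are no longer checked.
 */
static int toy_compat_statfs64_check(const struct toy_kstatfs *k)
{
	if ((k->f_bsize | k->f_frsize) & UINT64_C(0xffffffff00000000))
		return -EOVERFLOW;

	return 0;
}
```

With this check, a filesystem reporting more than 2^32 files passes, while a block size that does not fit in 32 bits still fails with -EOVERFLOW.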
From: Andrew Murray andrew.murray@arm.com
commit 1004ce4c255fc3eb3ad9145ddd53547d1b7ce327 upstream.
Synchronization is recommended before disabling the trace registers to prevent any start or stop points being speculative at the point of disabling the unit (section 7.3.77 of ARM IHI 0064D).
Synchronization is also recommended after programming the trace registers to ensure all updates are committed prior to normal code resuming (section 4.3.7 of ARM IHI 0064D).
Let's ensure these synchronization points are present in the code and clearly commented.
Note that we could rely on the barriers in CS_LOCK and coresight_disclaim_device_unlocked or the context switch to user space - however coresight may be of use in the kernel.
On armv8 the mb macro is defined as dsb(sy) - Given that the etm4x is only used on armv8 let's directly use dsb(sy) instead of mb(). This removes some ambiguity and makes it easier to correlate the code with the TRM.
Signed-off-by: Andrew Murray andrew.murray@arm.com
Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com
[Fixed capital letter for "use" in title]
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
Link: https://lore.kernel.org/r/20190829202842.580-11-mathieu.poirier@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/hwtracing/coresight/coresight-etm4x.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

--- a/drivers/hwtracing/coresight/coresight-etm4x.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x.c
@@ -188,6 +188,13 @@ static int etm4_enable_hw(struct etmv4_d
 		dev_err(etm_dev,
 			"timeout while waiting for Idle Trace Status\n");

+	/*
+	 * As recommended by section 4.3.7 ("Synchronization when using the
+	 * memory-mapped interface") of ARM IHI 0064D
+	 */
+	dsb(sy);
+	isb();
+
 done:
 	CS_LOCK(drvdata->base);

@@ -453,8 +460,12 @@ static void etm4_disable_hw(void *info)
 	/* EN, bit[0] Trace unit enable bit */
 	control &= ~0x1;

-	/* make sure everything completes before disabling */
-	mb();
+	/*
+	 * Make sure everything completes before disabling, as recommended
+	 * by section 7.3.77 ("TRCVICTLR, ViewInst Main Control Register,
+	 * SSTATUS") of ARM IHI 0064D
+	 */
+	dsb(sy);
 	isb();
 	writel_relaxed(control, drvdata->base + TRCPRGCTLR);
From: Gao Xiang gaoxiang25@huawei.com
commit acb383f1dcb4f1e79b66d4be3a0b6f519a957b0d upstream.
Richard observed an endless loop in erofs_read_raw_page() [1], which can be triggered by forcibly setting ->u.i_blkaddr to 0xdeadbeef (as I understand it, the block layer can handle access beyond the end of the device correctly).
After digging into it, the problem turned out to be closely related to directories, and I found that the root cause is improper error handling in erofs_readdir().
Let's fix it now.
[1] https://lore.kernel.org/r/1163995781.68824.1566084358245.JavaMail.zimbra@nod...
Reported-by: Richard Weinberger richard@nod.at
Fixes: 3aa8ec716e52 ("staging: erofs: add directory operations")
Cc: stable@vger.kernel.org # 4.19+
Reviewed-by: Chao Yu yuchao0@huawei.com
Signed-off-by: Gao Xiang gaoxiang25@huawei.com
Link: https://lore.kernel.org/r/20190818125457.25906-1-hsiangkao@aol.com
[ Gao Xiang: Since earlier kernels don't define EFSCORRUPTED, let's use original error code instead. ]
Signed-off-by: Gao Xiang gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/staging/erofs/dir.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

--- a/drivers/staging/erofs/dir.c
+++ b/drivers/staging/erofs/dir.c
@@ -99,8 +99,15 @@ static int erofs_readdir(struct file *f,
 		unsigned int nameoff, maxsize;

 		dentry_page = read_mapping_page(mapping, i, NULL);
-		if (IS_ERR(dentry_page))
-			continue;
+		if (dentry_page == ERR_PTR(-ENOMEM)) {
+			err = -ENOMEM;
+			break;
+		} else if (IS_ERR(dentry_page)) {
+			errln("fail to readdir of logical block %u of nid %llu",
+			      i, EROFS_V(dir)->nid);
+			err = PTR_ERR(dentry_page);
+			break;
+		}

 		de = (struct erofs_dirent *)kmap(dentry_page);
From: Gao Xiang gaoxiang25@huawei.com
commit ee45197c807895e156b2be0abcaebdfc116487c8 upstream.
As reported by the erofs-utils fuzzer, a logical page can belong to at most 2 compressed clusters. If one compressed cluster is corrupted but the other is already in the submission chain, the chain needs to be submitted anyway in order to keep the page working properly (page unlocked with PG_error set, PG_uptodate not set).
Let's fix it now.
Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support")
Cc: stable@vger.kernel.org # 4.19+
Signed-off-by: Gao Xiang gaoxiang25@huawei.com
Reviewed-by: Chao Yu yuchao0@huawei.com
Link: https://lore.kernel.org/r/20190819103426.87579-2-gaoxiang25@huawei.com
[ Gao Xiang: Manually backport to v5.3.y stable. ]
Signed-off-by: Gao Xiang gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/staging/erofs/unzip_vle.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

--- a/drivers/staging/erofs/unzip_vle.c
+++ b/drivers/staging/erofs/unzip_vle.c
@@ -1498,19 +1498,18 @@ static int z_erofs_vle_normalaccess_read
 	err = z_erofs_do_read_page(&f, page, &pagepool);
 	(void)z_erofs_vle_work_iter_end(&f.builder);

-	if (err) {
+	/* if some compressed cluster ready, need submit them anyway */
+	z_erofs_submit_and_unzip(&f, &pagepool, true);
+
+	if (err)
 		errln("%s, failed to read, err [%d]", __func__, err);
-		goto out;
-	}

-	z_erofs_submit_and_unzip(&f, &pagepool, true);
-out:
 	if (f.map.mpage)
 		put_page(f.map.mpage);

 	/* clean up the remaining free pages */
 	put_pages_list(&pagepool);
-	return 0;
+	return err;
 }

 static int z_erofs_vle_normalaccess_readpages(struct file *filp,
From: Gao Xiang gaoxiang25@huawei.com
commit 138e1a0990e80db486ab9f6c06bd5c01f9a97999 upstream.
As reported by the erofs-utils fuzzer, these error handling paths are entered when handling corrupted images.
The missing erofs_workgroup_put() calls cause unmounting to fail.
Fix these return values to EFSCORRUPTED as well.
Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support")
Cc: stable@vger.kernel.org # 4.19+
Signed-off-by: Gao Xiang gaoxiang25@huawei.com
Reviewed-by: Chao Yu yuchao0@huawei.com
Link: https://lore.kernel.org/r/20190819103426.87579-4-gaoxiang25@huawei.com
[ Gao Xiang: Older kernel versions don't have length validity check and EFSCORRUPTED, thus backport pageofs check for now. ]
Signed-off-by: Gao Xiang gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/staging/erofs/unzip_vle.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/drivers/staging/erofs/unzip_vle.c
+++ b/drivers/staging/erofs/unzip_vle.c
@@ -393,7 +393,11 @@ z_erofs_vle_work_lookup(const struct z_e
 	/* if multiref is disabled, `primary' is always true */
 	primary = true;

-	DBG_BUGON(work->pageofs != f->pageofs);
+	if (work->pageofs != f->pageofs) {
+		DBG_BUGON(1);
+		erofs_workgroup_put(egrp);
+		return ERR_PTR(-EIO);
+	}

 	/*
 	 * lock must be taken first to avoid grp->next == NIL between
From: Gao Xiang gaoxiang25@huawei.com
commit 598bb8913d015150b7734b55443c0e53e7189fc7 upstream.
As reported by the erofs-utils fuzzer, the lookback distance must be a positive number, so that the lookup actually moves back rather than spinning.
Fixes: 02827e1796b3 ("staging: erofs: add erofs_map_blocks_iter")
Cc: stable@vger.kernel.org # 4.19+
Signed-off-by: Gao Xiang gaoxiang25@huawei.com
Reviewed-by: Chao Yu yuchao0@huawei.com
Link: https://lore.kernel.org/r/20190819103426.87579-7-gaoxiang25@huawei.com
[ Gao Xiang: Since earlier kernels don't define EFSCORRUPTED, let's use EIO instead. ]
Signed-off-by: Gao Xiang gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/staging/erofs/zmap.c | 6 ++++++
 1 file changed, 6 insertions(+)

--- a/drivers/staging/erofs/zmap.c
+++ b/drivers/staging/erofs/zmap.c
@@ -350,6 +350,12 @@ static int vle_extent_lookback(struct z_

 	switch (m->type) {
 	case Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD:
+		if (!m->delta[0]) {
+			errln("invalid lookback distance 0 at nid %llu",
+			      vi->nid);
+			DBG_BUGON(1);
+			return -EIO;
+		}
 		return vle_extent_lookback(m, m->delta[0]);
 	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
 		map->m_flags &= ~EROFS_MAP_ZIPPED;
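Why a lookback distance of 0 spins can be seen in a toy version of the walk. This is a sketch under stated assumptions, not the EROFS implementation: `toy_cluster` and `toy_lookback()` are hypothetical, the walk is written as a loop over an in-memory array rather than on-disk indexes, and the extra `d > lcn` bound is added here only so the toy never indexes before the start of the array.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical in-memory stand-in for one logical cluster index. */
struct toy_cluster {
	bool is_head;        /* true for a head cluster, false for NONHEAD */
	unsigned int delta0; /* lookback distance, used for NONHEAD only */
};

/*
 * Walks back from cluster `lcn` until a head cluster is found and
 * returns its index. A NONHEAD cluster with distance 0 would make the
 * walk revisit the same cluster forever, so it is rejected with -EIO,
 * matching the error code used in the backport.
 */
static int toy_lookback(const struct toy_cluster *c, unsigned int lcn)
{
	while (!c[lcn].is_head) {
		unsigned int d = c[lcn].delta0;

		if (d == 0 || d > lcn) /* d > lcn: toy-only bounds guard */
			return -EIO;
		lcn -= d;
	}

	return (int)lcn;
}
```

For a well-formed layout such as head, NONHEAD(1), NONHEAD(2), the walk from index 2 lands on index 0; a NONHEAD entry with distance 0 returns -EIO instead of looping.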
From: Gao Xiang gaoxiang25@huawei.com
commit e12a0ce2fa69798194f3a8628baf6edfbd5c548f upstream.
As reported by the erofs-utils fuzzer, multiref (on-disk deduplication) is not supported yet, so we should forbid it properly.
Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support")
Cc: stable@vger.kernel.org # 4.19+
Signed-off-by: Gao Xiang gaoxiang25@huawei.com
Reviewed-by: Chao Yu yuchao0@huawei.com
Link: https://lore.kernel.org/r/20190821140152.229648-1-gaoxiang25@huawei.com
[ Gao Xiang: Since earlier kernels don't define EFSCORRUPTED, let's use EIO instead. ]
Signed-off-by: Gao Xiang gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/staging/erofs/unzip_vle.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

--- a/drivers/staging/erofs/unzip_vle.c
+++ b/drivers/staging/erofs/unzip_vle.c
@@ -943,6 +943,7 @@ repeat:
 	for (i = 0; i < nr_pages; ++i)
 		pages[i] = NULL;

+	err = 0;
 	z_erofs_pagevec_ctor_init(&ctor, Z_EROFS_NR_INLINE_PAGEVECS,
 				  work->pagevec, 0);

@@ -964,8 +965,17 @@ repeat:
 		pagenr = z_erofs_onlinepage_index(page);

 		DBG_BUGON(pagenr >= nr_pages);
-		DBG_BUGON(pages[pagenr]);

+		/*
+		 * currently EROFS doesn't support multiref(dedup),
+		 * so here erroring out one multiref page.
+		 */
+		if (pages[pagenr]) {
+			DBG_BUGON(1);
+			SetPageError(pages[pagenr]);
+			z_erofs_onlinepage_endio(pages[pagenr]);
+			err = -EIO;
+		}
 		pages[pagenr] = page;
 	}
 	sparsemem_pages = i;
@@ -975,7 +985,6 @@ repeat:
 	overlapped = false;
 	compressed_pages = grp->compressed_pages;

-	err = 0;
 	for (i = 0; i < clusterpages; ++i) {
 		unsigned int pagenr;

@@ -999,7 +1008,12 @@ repeat:
 		pagenr = z_erofs_onlinepage_index(page);

 		DBG_BUGON(pagenr >= nr_pages);
-		DBG_BUGON(pages[pagenr]);
+		if (pages[pagenr]) {
+			DBG_BUGON(1);
+			SetPageError(pages[pagenr]);
+			z_erofs_onlinepage_endio(pages[pagenr]);
+			err = -EIO;
+		}
 		++sparsemem_pages;
 		pages[pagenr] = page;
From: Dave Jiang dave.jiang@intel.com
[ Upstream commit 674f31a352da5e9f621f757b9a89262f486533a0 ]
Current implementation attempts to request keys from the keyring even when security is not enabled. Change behavior so when security is disabled it will skip key request.
Error messages seen when no keys are installed and libnvdimm is loaded:
request-key[4598]: Cannot find command to construct key 661489677
request-key[4606]: Cannot find command to construct key 34713726

Cc: stable@vger.kernel.org
Fixes: 4c6926a23b76 ("acpi/nfit, libnvdimm: Add unlock of nvdimm support for Intel DIMMs")
Signed-off-by: Dave Jiang dave.jiang@intel.com
Link: https://lore.kernel.org/r/156934642272.30222.5230162488753445916.stgit@djian...
Signed-off-by: Dan Williams dan.j.williams@intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/nvdimm/security.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/nvdimm/security.c b/drivers/nvdimm/security.c
index a570f2263a424..5b7ea93edb935 100644
--- a/drivers/nvdimm/security.c
+++ b/drivers/nvdimm/security.c
@@ -177,6 +177,10 @@ static int __nvdimm_security_unlock(struct nvdimm *nvdimm)
 			|| nvdimm->sec.state < 0)
 		return -EIO;

+	/* No need to go further if security is disabled */
+	if (nvdimm->sec.state == NVDIMM_SECURITY_DISABLED)
+		return 0;
+
 	if (test_bit(NDD_SECURITY_OVERWRITE, &nvdimm->flags)) {
 		dev_dbg(dev, "Security operation in progress.\n");
 		return -EBUSY;
stable-rc/linux-5.3.y boot: 130 boots: 4 failed, 116 passed with 10 offline (v5.3.5-149-ge863f125e178)
Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-5.3.y/kernel/v5.3.5...
Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-5.3.y/kernel/v5.3.5-149-ge...

Tree: stable-rc
Branch: linux-5.3.y
Git Describe: v5.3.5-149-ge863f125e178
Git Commit: e863f125e178f8d2edf7a3a03e7fc144d3af82c2
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Tested: 80 unique boards, 24 SoC families, 18 builds out of 208

Boot Failures Detected:

arm:
    imx_v6_v7_defconfig:
        gcc-8:
            imx53-qsrb: 1 failed lab

    multi_v7_defconfig:
        gcc-8:
            imx53-qsrb: 1 failed lab

i386:
    i386_defconfig:
        gcc-8:
            qemu_i386: 1 failed lab

arm64:
    defconfig:
        gcc-8:
            meson-gxl-s805x-libretech-ac: 1 failed lab

Offline Platforms:

arm:

    qcom_defconfig:
        gcc-8
            qcom-apq8064-cm-qs600: 1 offline lab
            qcom-apq8064-ifc6410: 1 offline lab

    davinci_all_defconfig:
        gcc-8
            dm365evm,legacy: 1 offline lab

    sunxi_defconfig:
        gcc-8
            sun5i-r8-chip: 1 offline lab
            sun7i-a20-bananapi: 1 offline lab

    multi_v7_defconfig:
        gcc-8
            qcom-apq8064-cm-qs600: 1 offline lab
            qcom-apq8064-ifc6410: 1 offline lab
            sun5i-r8-chip: 1 offline lab
            sun7i-a20-bananapi: 1 offline lab

arm64:

    defconfig:
        gcc-8
            apq8016-sbc: 1 offline lab

---
For more info write to info@kernelci.org
On Thu, 10 Oct 2019 at 14:09, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
> This is the start of the stable review cycle for the 5.3.6 release. There are 148 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
>
> Responses should be made by Sat 12 Oct 2019 08:29:51 AM UTC. Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
>     https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.3.6-rc1.g...
> or in the git tree and branch at:
>     git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.3.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Summary
------------------------------------------------------------------------

kernel: 5.3.6-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-5.3.y
git commit: e863f125e178f8d2edf7a3a03e7fc144d3af82c2
git describe: v5.3.5-149-ge863f125e178
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-5.3-oe/build/v5.3.5-149-g...
No regressions (compared to build v5.3.5)
No fixes (compared to build v5.3.5)
Ran 23503 total tests in the following environments and test suites.
Environments
--------------
- dragonboard-410c
- hi6220-hikey
- i386
- juno-r2
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15
- x86

Test Suites
-----------
* build
* install-android-platform-tools-r2600
* kselftest
* libgpiod
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* spectre-meltdown-checker-test
* ltp-syscalls-tests
* ltp-timers-tests
* network-basic-tests
* perf
* v4l2-compliance
* ltp-open-posix-tests
* kvm-unit-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none
* ssuite
On Thu, Oct 10, 2019 at 10:39:23PM +0530, Naresh Kamboju wrote:
> On Thu, 10 Oct 2019 at 14:09, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
> > This is the start of the stable review cycle for the 5.3.6 release. There are 148 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
> >
> > Responses should be made by Sat 12 Oct 2019 08:29:51 AM UTC. Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> >     https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.3.6-rc1.g...
> > or in the git tree and branch at:
> >     git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.3.y
> > and the diffstat can be found below.
> >
> > thanks,
> >
> > greg k-h
>
> Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Thanks for testing all 3 of these and letting me know.
greg k-h
On Thu, Oct 10, 2019 at 10:34:21AM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.3.6 release. There are 148 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
>
> Responses should be made by Sat 12 Oct 2019 08:29:51 AM UTC. Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
>     https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.3.6-rc1.g...
> or in the git tree and branch at:
>     git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.3.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
Compiled, booted, and no regressions found on my x86_64 system.
Thanks,
Didik Setiawan
On 10/10/19 1:34 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.3.6 release. There are 148 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
>
> Responses should be made by Sat 12 Oct 2019 08:29:51 AM UTC. Anything received after that time might be too late.

Build results:
    total: 158 pass: 158 fail: 0
Qemu test results:
    total: 391 pass: 391 fail: 0
Guenter
On Thu, Oct 10, 2019 at 03:19:54PM -0700, Guenter Roeck wrote:
Build results:
    total: 158 pass: 158 fail: 0
Qemu test results:
    total: 391 pass: 391 fail: 0
Great! Hopefully 4.14 now works for you, and thanks for testing all of these.
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks,
-- Shuah
On Thu, Oct 10, 2019 at 09:02:45PM -0600, shuah wrote:
Compiled and booted on my test system. No dmesg regressions.
Thanks for testing these and letting me know.
greg k-h
All tests passing for Tegra ...
Test results for stable-v5.3:
    12 builds: 12 pass, 0 fail
    22 boots: 22 pass, 0 fail
    38 tests: 38 pass, 0 fail
Linux version: 5.3.6-rc1-ge863f125e178
Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra20-ventana, tegra210-p2371-2180, tegra30-cardhu-a04
Cheers
Jon
On Fri, Oct 11, 2019 at 09:33:41AM +0100, Jon Hunter wrote:
All tests passing for Tegra ...
Test results for stable-v5.3:
    12 builds: 12 pass, 0 fail
    22 boots: 22 pass, 0 fail
    38 tests: 38 pass, 0 fail
Linux version: 5.3.6-rc1-ge863f125e178
Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra20-ventana, tegra210-p2371-2180, tegra30-cardhu-a04
Great, thanks for testing all of these and letting me know.
greg k-h