This is the start of the stable review cycle for the 5.1.4 release. There are 128 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed 22 May 2019 11:50:41 AM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.1.4-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.1.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.1.4-rc1
Theodore Ts'o tytso@mit.edu ext4: fix block validity checks for journal inodes using indirect blocks
Colin Ian King colin.king@canonical.com ext4: unsigned int compared against zero
Martin Schwidefsky schwidefsky@de.ibm.com s390/mm: convert to the generic get_user_pages_fast code
Martin Schwidefsky schwidefsky@de.ibm.com s390/mm: make the pxd_offset functions more robust
Dan Williams dan.j.williams@intel.com libnvdimm/namespace: Fix label tracking error
Christophe Leroy christophe.leroy@c-s.fr powerpc/32s: fix flush_hash_pages() on SMP
Roger Pau Monne roger.pau@citrix.com xen/pvh: correctly setup the PV EFI interface for dom0
Roger Pau Monne roger.pau@citrix.com xen/pvh: set xen_domain_type to HVM in xen_pvh_init
Masahiro Yamada yamada.masahiro@socionext.com kbuild: turn auto.conf.cmd into a mandatory include file
Steve French stfrench@microsoft.com smb3: display session id in debug data
Sean Christopherson sean.j.christopherson@intel.com KVM: lapic: Busy wait for timer to expire when using hv_timer
Sean Christopherson sean.j.christopherson@intel.com KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes
Peter Xu peterx@redhat.com KVM: Fix the bitmap range to copy during clear dirty
Sean Christopherson sean.j.christopherson@intel.com Revert "KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU"
Chengguang Xu cgxu519@gmail.com jbd2: fix potential double free
Michał Wadowski wadosm@gmail.com ALSA: hda/realtek - Fix for Lenovo B50-70 inverted internal microphone bug
Kailang Yang kailang@realtek.com ALSA: hda/realtek - Fixup headphone noise via runtime suspend
Jeremy Soller jeremy@system76.com ALSA: hda/realtek - Corrected fixup for System76 Gazelle (gaze14)
Jan Kara jack@suse.cz ext4: avoid panic during forced reboot due to aborted journal
Sahitya Tummala stummala@codeaurora.org ext4: fix use-after-free in dx_release()
Lukas Czerner lczerner@redhat.com ext4: fix data corruption caused by overlapping unaligned and aligned IO
Sriram Rajagopalan sriramr@arista.com ext4: zero out the unused memory region in the extent tree block
Anup Patel Anup.Patel@wdc.com tty: Don't force RISCV SBI console as preferred console
Jiufei Xue jiufei.xue@linux.alibaba.com fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
Mel Gorman mgorman@techsingularity.net mm/compaction.c: correct zone boundary handling when isolating pages from a pageblock
Fabio Estevam festevam@gmail.com ARM: dts: imx: Fix the AR803X phy-mode
Kamlakant Patel kamlakantp@marvell.com ipmi:ssif: compare block number correctly for multi-part return messages
Corey Minyard cminyard@mvista.com ipmi: Add the i2c-addr property for SSIF interfaces
Coly Li colyli@suse.de bcache: never set KEY_PTRS of journal key to 0 in journal_reclaim()
Liang Chen liangchen.linux@gmail.com bcache: fix a race between cache register and cacheset unregister
Filipe Manana fdmanana@suse.com Btrfs: fix race between send and deduplication that lead to failures and crashes
Filipe Manana fdmanana@suse.com Btrfs: do not start a transaction at iterate_extent_inodes()
Filipe Manana fdmanana@suse.com Btrfs: do not start a transaction during fiemap
Filipe Manana fdmanana@suse.com Btrfs: send, flush dellaloc in order to avoid data loss
Nikolay Borisov nborisov@suse.com btrfs: Honour FITRIM range constraints during free space trim
Nikolay Borisov nborisov@suse.com btrfs: Correctly free extent buffer in case btree_read_extent_buffer_pages fails
Qu Wenruo wqu@suse.com btrfs: Check the first key and level for cached extent buffer
Debabrata Banerjee dbanerje@akamai.com ext4: fix ext4_show_options for file systems w/o journal
Kirill Tkhai ktkhai@virtuozzo.com ext4: actually request zeroing of inode table after grow
Barret Rhoden brho@google.com ext4: fix use-after-free race with debug_want_extra_isize
Pan Bian bianpan2016@163.com ext4: avoid drop reference to iloc.bh twice
Theodore Ts'o tytso@mit.edu ext4: ignore e_value_offs for xattrs with value-in-ea-inode
Theodore Ts'o tytso@mit.edu ext4: protect journal inode's blocks using block_validity
Jan Kara jack@suse.cz ext4: make sanity check in mballoc more strict
Jiufei Xue jiufei.xue@linux.alibaba.com jbd2: check superblock mapped prior to committing
Sergei Trofimovich slyfox@gentoo.org tty/vt: fix write/write race in ioctl(KDSKBSENT) handler
Yifeng Li tomli@tomli.me tty: vt.c: Fix TIOCL_BLANKSCREEN console blanking if blankinterval == 0
Chris Packham chris.packham@alliedtelesis.co.nz mtd: maps: Allow MTD_PHYSMAP with MTD_RAM
Chris Packham chris.packham@alliedtelesis.co.nz mtd: maps: physmap: Store gpio_values correctly
Alexander Sverdlin alexander.sverdlin@nokia.com mtd: spi-nor: intel-spi: Avoid crossing 4K address boundary on read/write
Dmitry Osipenko digetx@gmail.com mfd: max77620: Fix swapped FPS_PERIOD_MAX_US values
Steve Twiss stwiss.opensource@diasemi.com mfd: da9063: Fix OTP control register names to match datasheets for DA9063/63L
Erik Schmauss erik.schmauss@intel.com ACPICA: Linux: move ACPI_DEBUG_DEFAULT flag out of ifndef
Rajat Jain rajatja@google.com ACPI: PM: Set enable_for_wake for wakeup GPEs during suspend-to-idle
Andrea Arcangeli aarcange@redhat.com userfaultfd: use RCU to free the task struct when fork fails
Shuning Zhang sunny.s.zhang@oracle.com ocfs2: fix ocfs2 read inode data panic in ocfs2_iget
Mike Kravetz mike.kravetz@oracle.com hugetlb: use same fault hash key for shared and private mappings
Kai Shen shenkai8@huawei.com mm/hugetlb.c: don't put_page in lock of hugetlb_lock
Dan Williams dan.j.williams@intel.com mm/huge_memory: fix vmf_insert_pfn_{pmd, pud}() crash, handle unaligned addresses
Jiri Kosina jkosina@suse.cz mm/mincore.c: make mincore() more conservative
Ofir Drang ofir.drang@arm.com crypto: ccree - handle tee fips error during power management resume
Ofir Drang ofir.drang@arm.com crypto: ccree - add function to handle cryptocell tee fips error
Ofir Drang ofir.drang@arm.com crypto: ccree - HOST_POWER_DOWN_EN should be the last CC access during suspend
Ofir Drang ofir.drang@arm.com crypto: ccree - pm resume first enable the source clk
Gilad Ben-Yossef gilad@benyossef.com crypto: ccree - don't map AEAD key and IV on stack
Gilad Ben-Yossef gilad@benyossef.com crypto: ccree - use correct internal state sizes for export
Gilad Ben-Yossef gilad@benyossef.com crypto: ccree - don't map MAC key on stack
Gilad Ben-Yossef gilad@benyossef.com crypto: ccree - fix mem leak on error path
Gilad Ben-Yossef gilad@benyossef.com crypto: ccree - remove special handling of chained sg
Daniel Borkmann daniel@iogearbox.net bpf: fix out of bounds backwards jmps due to dead code removal
Daniel Borkmann daniel@iogearbox.net bpf, arm64: remove prefetch insn in xadd mapping
Libin Yang libin.yang@intel.com ASoC: codec: hdac_hdmi add device_link to card device
S.j. Wang shengjiu.wang@nxp.com ASoC: fsl_esai: Fix missing break in switch statement
Curtis Malainey cujomalainey@chromium.org ASoC: RT5677-SPI: Disable 16Bit SPI Transfers
Jon Hunter jonathanh@nvidia.com ASoC: max98090: Fix restore of DAPM Muxes
Jeremy Soller jeremy@system76.com ALSA: hdea/realtek - Headset fixup for System76 Gazelle (gaze14)
Kailang Yang kailang@realtek.com ALSA: hda/realtek - EAPD turn on later
Hui Wang hui.wang@canonical.com ALSA: hda/hdmi - Consider eld_valid when reporting jack event
Hui Wang hui.wang@canonical.com ALSA: hda/hdmi - Read the pin sense from register when repolling
Wenwen Wang wang6495@umn.edu ALSA: usb-audio: Fix a memory leak bug
Takashi Iwai tiwai@suse.de ALSA: line6: toneport: Fix broken usage of timer for delayed execution
Adrian Hunter adrian.hunter@intel.com mmc: sdhci-pci: Fix BYT OCP setting
Raul E Rangel rrangel@chromium.org mmc: core: Fix tag set memory leak
Sowjanya Komatineni skomatineni@nvidia.com mmc: tegra: fix ddr signaling for non-ddr modes
Christoph Muellner christoph.muellner@theobroma-systems.com dt-bindings: mmc: Add disable-cqe-dcmd property.
Aneesh Kumar K.V aneesh.kumar@linux.ibm.com drivers/dax: Allow to include DEV_DAX_PMEM as builtin
Eric Biggers ebiggers@google.com crypto: arm64/aes-neonbs - don't access already-freed walk.iv
Eric Biggers ebiggers@google.com crypto: arm/aes-neonbs - don't access already-freed walk.iv
Horia Geantă horia.geanta@nxp.com crypto: caam/qi2 - generate hash keys in-place
Horia Geantă horia.geanta@nxp.com crypto: caam/qi2 - fix DMA mapping of stack memory
Horia Geantă horia.geanta@nxp.com crypto: caam/qi2 - fix zero-length buffer DMA mapping
Zhang Zhijie zhangzj@rock-chips.com crypto: rockchip - update IV buffer to contain the next IV
Eric Biggers ebiggers@google.com crypto: gcm - fix incompatibility between "gcm" and "gcm_base"
Eric Biggers ebiggers@google.com crypto: arm64/gcm-aes-ce - fix no-NEON fallback code
Eric Biggers ebiggers@google.com crypto: x86/crct10dif-pcl - fix use via crypto_shash_digest()
Eric Biggers ebiggers@google.com crypto: crct10dif-generic - fix use via crypto_shash_digest()
Eric Biggers ebiggers@google.com crypto: skcipher - don't WARN on unprocessed data after slow walk step
Daniel Axtens dja@axtens.net crypto: vmx - fix copy-paste error in CTR mode
Singh, Brijesh brijesh.singh@amd.com crypto: ccp - Do not free psp_master when PLATFORM_INIT fails
Eric Biggers ebiggers@google.com crypto: ccm - fix incompatibility between "ccm" and "ccm_base"
Eric Biggers ebiggers@google.com crypto: chacha20poly1305 - set cra_name correctly
Eric Biggers ebiggers@google.com crypto: chacha-generic - fix use as arm64 no-NEON fallback
Eric Biggers ebiggers@google.com crypto: lrw - don't access already-freed walk.iv
Eric Biggers ebiggers@google.com crypto: salsa20 - don't access already-freed walk.iv
Christian Lamparter chunkeey@gmail.com crypto: crypto4xx - fix cfb and ofb "overran dst buffer" issues
Christian Lamparter chunkeey@gmail.com crypto: crypto4xx - fix ctr-aes missing output IV
Yazen Ghannam yazen.ghannam@amd.com x86/MCE/AMD: Don't report L1 BTB MCA errors on some family 17h models
Yazen Ghannam yazen.ghannam@amd.com x86/MCE: Add an MCE-record filtering function
Peter Zijlstra peterz@infradead.org sched/x86: Save [ER]FLAGS on context switch
Jean-Philippe Brucker jean-philippe.brucker@arm.com arm64: Save and restore OSDLR_EL1 across suspend/resume
Jean-Philippe Brucker jean-philippe.brucker@arm.com arm64: Clear OSDLR_EL1 on CPU boot
Vincenzo Frascino vincenzo.frascino@arm.com arm64: compat: Reduce address limit
Will Deacon will.deacon@arm.com arm64: arch_timer: Ensure counter register reads occur with seqlock held
Boyang Zhou zhouby_cn@126.com arm64: mmap: Ensure file offset is treated as unsigned
Hans de Goede hdegoede@redhat.com power: supply: axp288_fuel_gauge: Add ACEPC T8 and T11 mini PCs to the blacklist
Gustavo A. R. Silva gustavo@embeddedor.com power: supply: axp288_charger: Fix unchecked return value
Wen Yang wen.yang99@zte.com.cn ARM: exynos: Fix a leaked reference by adding missing of_node_put
Christoph Muellner christoph.muellner@theobroma-systems.com mmc: sdhci-of-arasan: Add DTS property to disable DCMDs.
Sylwester Nawrocki s.nawrocki@samsung.com ARM: dts: exynos: Fix audio (microphone) routing on Odroid XU3
Sylwester Nawrocki s.nawrocki@samsung.com ARM: dts: exynos: Fix audio routing on Odroid XU3
Stuart Menefy stuart.menefy@mathembedded.com ARM: dts: exynos: Fix interrupt for shared EINTs on Exynos5260
Christian Lamparter chunkeey@gmail.com ARM: dts: qcom: ipq4019: enlarge PCIe BAR range
Christoph Muellner christoph.muellner@theobroma-systems.com arm64: dts: rockchip: Disable DCMDs on RK3399's eMMC controller.
Katsuhiro Suzuki katsuhiro@katsuster.net arm64: dts: rockchip: fix IO domain voltage setting of APIO5 on rockpro64
Josh Poimboeuf jpoimboe@redhat.com objtool: Fix function fallthrough detection
Andy Lutomirski luto@kernel.org x86/speculation/mds: Improve CPU buffer clear documentation
Andy Lutomirski luto@kernel.org x86/speculation/mds: Revert CPU buffer clear on double fault exit
Waiman Long longman@redhat.com locking/rwsem: Prevent decrement of reader count before increment
-------------
Diffstat:
Documentation/devicetree/bindings/mmc/mmc.txt | 2 + Documentation/x86/mds.rst | 44 +-- Makefile | 6 +- arch/arm/boot/dts/exynos5260.dtsi | 2 +- arch/arm/boot/dts/exynos5422-odroidxu3-audio.dtsi | 5 +- arch/arm/boot/dts/imx6-logicpd-baseboard.dtsi | 2 +- arch/arm/boot/dts/imx6dl-riotboard.dts | 2 +- arch/arm/boot/dts/imx6q-ba16.dtsi | 2 +- arch/arm/boot/dts/imx6q-marsboard.dts | 2 +- arch/arm/boot/dts/imx6q-tbs2910.dts | 2 +- arch/arm/boot/dts/imx6qdl-apf6.dtsi | 2 +- arch/arm/boot/dts/imx6qdl-sabreauto.dtsi | 2 +- arch/arm/boot/dts/imx6qdl-sabresd.dtsi | 2 +- arch/arm/boot/dts/imx6qdl-sr-som.dtsi | 2 +- arch/arm/boot/dts/imx6qdl-wandboard.dtsi | 2 +- arch/arm/boot/dts/imx6sx-sabreauto.dts | 2 +- arch/arm/boot/dts/imx6sx-sdb.dtsi | 2 +- arch/arm/boot/dts/imx7d-pico.dtsi | 2 +- arch/arm/boot/dts/qcom-ipq4019.dtsi | 4 +- arch/arm/crypto/aes-neonbs-glue.c | 2 + arch/arm/mach-exynos/firmware.c | 1 + arch/arm/mach-exynos/suspend.c | 2 + arch/arm64/boot/dts/rockchip/rk3399-rockpro64.dts | 2 +- arch/arm64/boot/dts/rockchip/rk3399.dtsi | 1 + arch/arm64/crypto/aes-neonbs-glue.c | 2 + arch/arm64/crypto/ghash-ce-glue.c | 10 +- arch/arm64/include/asm/arch_timer.h | 33 ++- arch/arm64/include/asm/processor.h | 8 + arch/arm64/kernel/debug-monitors.c | 1 + arch/arm64/kernel/sys.c | 2 +- arch/arm64/kernel/vdso/gettimeofday.S | 15 +- arch/arm64/mm/proc.S | 34 +-- arch/arm64/net/bpf_jit.h | 6 - arch/arm64/net/bpf_jit_comp.c | 1 - arch/powerpc/mm/hash_low_32.S | 3 +- arch/s390/Kconfig | 1 + arch/s390/include/asm/pgtable.h | 79 ++++-- arch/s390/mm/Makefile | 2 +- arch/s390/mm/gup.c | 300 --------------------- arch/x86/crypto/crct10dif-pclmul_glue.c | 13 +- arch/x86/entry/entry_32.S | 2 + arch/x86/entry/entry_64.S | 2 + arch/x86/include/asm/switch_to.h | 1 + arch/x86/kernel/cpu/mce/amd.c | 52 +++- arch/x86/kernel/cpu/mce/core.c | 8 + arch/x86/kernel/cpu/mce/genpool.c | 3 + arch/x86/kernel/cpu/mce/internal.h | 9 + arch/x86/kernel/process_32.c | 7 + arch/x86/kernel/process_64.c | 8 + arch/x86/kernel/traps.c | 8 - arch/x86/kvm/lapic.c | 2 +- arch/x86/kvm/vmx/vmx.c | 25 -- arch/x86/kvm/x86.c | 37 ++- arch/x86/platform/pvh/enlighten.c | 8 +- arch/x86/xen/efi.c | 12 +- arch/x86/xen/enlighten_pv.c | 2 +- arch/x86/xen/enlighten_pvh.c | 7 +- arch/x86/xen/xen-ops.h | 4 +- crypto/ccm.c | 44 ++- crypto/chacha20poly1305.c | 4 +- crypto/chacha_generic.c | 2 +- crypto/crct10dif_generic.c | 11 +- crypto/gcm.c | 34 +-- crypto/lrw.c | 4 +- crypto/salsa20_generic.c | 2 +- crypto/skcipher.c | 9 +- drivers/acpi/sleep.c | 4 + drivers/char/ipmi/ipmi_dmi.c | 2 + drivers/char/ipmi/ipmi_plat_data.c | 27 +- drivers/char/ipmi/ipmi_plat_data.h | 3 + drivers/char/ipmi/ipmi_si_hardcode.c | 1 + drivers/char/ipmi/ipmi_si_hotmod.c | 1 + drivers/char/ipmi/ipmi_ssif.c | 6 +- drivers/crypto/amcc/crypto4xx_alg.c | 12 +- drivers/crypto/amcc/crypto4xx_core.c | 31 ++- drivers/crypto/caam/caamalg_qi2.c | 177 ++++++------ drivers/crypto/caam/caamalg_qi2.h | 2 - drivers/crypto/ccp/psp-dev.c | 2 +- drivers/crypto/ccree/cc_aead.c | 11 +- drivers/crypto/ccree/cc_buffer_mgr.c | 113 +++----- drivers/crypto/ccree/cc_driver.h | 1 + drivers/crypto/ccree/cc_fips.c | 23 +- drivers/crypto/ccree/cc_fips.h | 2 + drivers/crypto/ccree/cc_hash.c | 28 +- drivers/crypto/ccree/cc_ivgen.c | 9 +- drivers/crypto/ccree/cc_pm.c | 9 +- drivers/crypto/rockchip/rk3288_crypto_ablkcipher.c | 25 +- drivers/crypto/vmx/aesp8-ppc.pl | 4 +- drivers/dax/Kconfig | 3 +- drivers/dax/device.c | 6 +- drivers/edac/mce_amd.c | 4 +- drivers/md/bcache/journal.c | 11 +- drivers/md/bcache/super.c | 2 +- drivers/mmc/core/queue.c | 1 + drivers/mmc/host/Kconfig | 1 + drivers/mmc/host/sdhci-of-arasan.c | 5 +- drivers/mmc/host/sdhci-pci-core.c | 96 +++++++ drivers/mmc/host/sdhci-tegra.c | 1 + drivers/mtd/maps/Kconfig | 2 +- drivers/mtd/maps/physmap-core.c | 2 + drivers/mtd/spi-nor/intel-spi.c | 8 + drivers/nvdimm/label.c | 29 +- drivers/nvdimm/namespace_devs.c | 15 ++ drivers/nvdimm/nd.h | 4 + drivers/power/supply/axp288_charger.c | 4 + drivers/power/supply/axp288_fuel_gauge.c | 20 ++ drivers/tty/hvc/hvc_riscv_sbi.c | 1 - drivers/tty/vt/keyboard.c | 33 ++- drivers/tty/vt/vt.c | 2 - fs/btrfs/backref.c | 34 ++- fs/btrfs/ctree.c | 10 + fs/btrfs/ctree.h | 6 + fs/btrfs/disk-io.c | 27 +- fs/btrfs/disk-io.h | 3 + fs/btrfs/extent-tree.c | 25 +- fs/btrfs/ioctl.c | 19 +- fs/btrfs/send.c | 62 +++++ fs/cifs/cifs_debug.c | 2 + fs/dax.c | 6 +- fs/ext4/block_validity.c | 54 ++++ fs/ext4/extents.c | 17 +- fs/ext4/file.c | 7 + fs/ext4/inode.c | 4 + fs/ext4/ioctl.c | 2 +- fs/ext4/mballoc.c | 2 +- fs/ext4/namei.c | 5 +- fs/ext4/resize.c | 1 + fs/ext4/super.c | 62 +++-- fs/ext4/xattr.c | 2 +- fs/fs-writeback.c | 11 +- fs/hugetlbfs/inode.c | 7 +- fs/jbd2/journal.c | 53 ++-- fs/jbd2/revoke.c | 32 ++- fs/jbd2/transaction.c | 8 +- fs/ocfs2/export.c | 30 ++- include/acpi/platform/aclinux.h | 10 +- include/linux/huge_mm.h | 6 +- include/linux/hugetlb.h | 4 +- include/linux/jbd2.h | 8 +- include/linux/mfd/da9063/registers.h | 6 +- include/linux/mfd/max77620.h | 4 +- kernel/bpf/core.c | 4 +- kernel/fork.c | 31 ++- kernel/locking/rwsem-xadd.c | 44 ++- mm/compaction.c | 4 +- mm/huge_memory.c | 16 +- mm/hugetlb.c | 25 +- mm/mincore.c | 23 +- mm/userfaultfd.c | 3 +- sound/pci/hda/patch_hdmi.c | 11 +- sound/pci/hda/patch_realtek.c | 68 +++-- sound/soc/codecs/hdac_hdmi.c | 11 + sound/soc/codecs/max98090.c | 12 +- sound/soc/codecs/rt5677-spi.c | 35 ++- sound/soc/fsl/fsl_esai.c | 2 +- sound/usb/line6/toneport.c | 16 +- sound/usb/mixer.c | 2 + tools/objtool/check.c | 3 +- virt/kvm/kvm_main.c | 2 +- 159 files changed, 1419 insertions(+), 1079 deletions(-)
[ Upstream commit a9e9bcb45b1525ba7aea26ed9441e8632aeeda58 ]
During my rwsem testing, it was found that after a down_read(), the reader count may occasionally become 0 or even negative. Consequently, a writer may steal the lock at that time and execute with the reader in parallel thus breaking the mutual exclusion guarantee of the write lock. In other words, both readers and writer can become rwsem owners simultaneously.
The current reader wakeup code does it in one pass to clear waiter->task and put them into wake_q before fully incrementing the reader count. Once waiter->task is cleared, the corresponding reader may see it, finish the critical section and do unlock to decrement the count before the count is incremented. This is not a problem if there is only one reader to wake up as the count has been pre-incremented by 1. It is a problem if there are more than one readers to be woken up and writer can steal the lock.
The wakeup was actually done in 2 passes before the following v4.9 commit:
70800c3c0cc5 ("locking/rwsem: Scan the wait_list for readers only once")
To fix this problem, the wakeup is now done in two passes again. In the first pass, we collect the readers and count them. The reader count is then fully incremented. In the second pass, the waiter->task is then cleared and they are put into wake_q to be woken up later.
Signed-off-by: Waiman Long longman@redhat.com Acked-by: Linus Torvalds torvalds@linux-foundation.org Cc: Borislav Petkov bp@alien8.de Cc: Davidlohr Bueso dave@stgolabs.net Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Will Deacon will.deacon@arm.com Cc: huang ying huang.ying.caritas@gmail.com Fixes: 70800c3c0cc5 ("locking/rwsem: Scan the wait_list for readers only once") Link: http://lkml.kernel.org/r/20190428212557.13482-2-longman@redhat.com Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/locking/rwsem-xadd.c | 44 +++++++++++++++++++++++++------------ 1 file changed, 30 insertions(+), 14 deletions(-)
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index fbe96341beeed..59b801de8dd5c 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -130,6 +130,7 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem, { struct rwsem_waiter *waiter, *tmp; long oldcount, woken = 0, adjustment = 0; + struct list_head wlist;
/* * Take a peek at the queue head waiter such that we can determine @@ -188,18 +189,42 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem, * of the queue. We know that woken will be at least 1 as we accounted * for above. Note we increment the 'active part' of the count by the * number of readers before waking any processes up. + * + * We have to do wakeup in 2 passes to prevent the possibility that + * the reader count may be decremented before it is incremented. It + * is because the to-be-woken waiter may not have slept yet. So it + * may see waiter->task got cleared, finish its critical section and + * do an unlock before the reader count increment. + * + * 1) Collect the read-waiters in a separate list, count them and + * fully increment the reader count in rwsem. + * 2) For each waiters in the new list, clear waiter->task and + * put them into wake_q to be woken up later. */ - list_for_each_entry_safe(waiter, tmp, &sem->wait_list, list) { - struct task_struct *tsk; - + list_for_each_entry(waiter, &sem->wait_list, list) { if (waiter->type == RWSEM_WAITING_FOR_WRITE) break;
woken++; - tsk = waiter->task; + } + list_cut_before(&wlist, &sem->wait_list, &waiter->list); + + adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment; + if (list_empty(&sem->wait_list)) { + /* hit end of list above */ + adjustment -= RWSEM_WAITING_BIAS; + } + + if (adjustment) + atomic_long_add(adjustment, &sem->count); + + /* 2nd pass */ + list_for_each_entry_safe(waiter, tmp, &wlist, list) { + struct task_struct *tsk;
+ tsk = waiter->task; get_task_struct(tsk); - list_del(&waiter->list); + /* * Ensure calling get_task_struct() before setting the reader * waiter to nil such that rwsem_down_read_failed() cannot @@ -213,15 +238,6 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem, */ wake_q_add_safe(wake_q, tsk); } - - adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment; - if (list_empty(&sem->wait_list)) { - /* hit end of list above */ - adjustment -= RWSEM_WAITING_BIAS; - } - - if (adjustment) - atomic_long_add(adjustment, &sem->count); }
/*
From: Andy Lutomirski luto@kernel.org
commit 88640e1dcd089879530a49a8d212d1814678dfe7 upstream.
The double fault ESPFIX path doesn't return to user mode at all -- it returns back to the kernel by simulating a #GP fault. prepare_exit_to_usermode() will run on the way out of general_protection before running user code.
Signed-off-by: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bp@suse.de Cc: Frederic Weisbecker frederic@kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Jon Masters jcm@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Fixes: 04dcbdb80578 ("x86/speculation/mds: Clear CPU buffers on exit to user") Link: http://lkml.kernel.org/r/ac97612445c0a44ee10374f6ea79c222fe22a5c4.1557865329... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- Documentation/x86/mds.rst | 7 ------- arch/x86/kernel/traps.c | 8 -------- 2 files changed, 15 deletions(-)
--- a/Documentation/x86/mds.rst +++ b/Documentation/x86/mds.rst @@ -158,13 +158,6 @@ Mitigation points mitigated on the return from do_nmi() to provide almost complete coverage.
- - Double fault (#DF): - - A double fault is usually fatal, but the ESPFIX workaround, which can - be triggered from user space through modify_ldt(2) is a recoverable - double fault. #DF uses the paranoid exit path, so explicit mitigation - in the double fault handler is required. - - Machine Check Exception (#MC):
Another corner case is a #MC which hits between the CPU buffer clear --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -58,7 +58,6 @@ #include <asm/alternative.h> #include <asm/fpu/xstate.h> #include <asm/trace/mpx.h> -#include <asm/nospec-branch.h> #include <asm/mpx.h> #include <asm/vm86.h> #include <asm/umip.h> @@ -368,13 +367,6 @@ dotraplinkage void do_double_fault(struc regs->ip = (unsigned long)general_protection; regs->sp = (unsigned long)&gpregs->orig_ax;
- /* - * This situation can be triggered by userspace via - * modify_ldt(2) and the return does not take the regular - * user space exit, so a CPU buffer clear is required when - * MDS mitigation is enabled. - */ - mds_user_clear_cpu_buffers(); return; } #endif
From: Andy Lutomirski luto@kernel.org
commit 9d8d0294e78a164d407133dea05caf4b84247d6a upstream.
On x86_64, all returns to usermode go through prepare_exit_to_usermode(), with the sole exception of do_nmi(). This even includes machine checks -- this was added several years ago to support MCE recovery. Update the documentation.
Signed-off-by: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bp@suse.de Cc: Frederic Weisbecker frederic@kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Jon Masters jcm@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Fixes: 04dcbdb80578 ("x86/speculation/mds: Clear CPU buffers on exit to user") Link: http://lkml.kernel.org/r/999fa9e126ba6a48e9d214d2f18dbde5c62ac55c.1557865329... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- Documentation/x86/mds.rst | 39 +++++++-------------------------------- 1 file changed, 7 insertions(+), 32 deletions(-)
--- a/Documentation/x86/mds.rst +++ b/Documentation/x86/mds.rst @@ -142,38 +142,13 @@ Mitigation points mds_user_clear.
The mitigation is invoked in prepare_exit_to_usermode() which covers - most of the kernel to user space transitions. There are a few exceptions - which are not invoking prepare_exit_to_usermode() on return to user - space. These exceptions use the paranoid exit code. - - - Non Maskable Interrupt (NMI): - - Access to sensible data like keys, credentials in the NMI context is - mostly theoretical: The CPU can do prefetching or execute a - misspeculated code path and thereby fetching data which might end up - leaking through a buffer. - - But for mounting other attacks the kernel stack address of the task is - already valuable information. So in full mitigation mode, the NMI is - mitigated on the return from do_nmi() to provide almost complete - coverage. - - - Machine Check Exception (#MC): - - Another corner case is a #MC which hits between the CPU buffer clear - invocation and the actual return to user. As this still is in kernel - space it takes the paranoid exit path which does not clear the CPU - buffers. So the #MC handler repopulates the buffers to some - extent. Machine checks are not reliably controllable and the window is - extremly small so mitigation would just tick a checkbox that this - theoretical corner case is covered. To keep the amount of special - cases small, ignore #MC. - - - Debug Exception (#DB): - - This takes the paranoid exit path only when the INT1 breakpoint is in - kernel space. #DB on a user space address takes the regular exit path, - so no extra mitigation required. + all but one of the kernel to user space transitions. The exception + is when we return from a Non Maskable Interrupt (NMI), which is + handled directly in do_nmi(). + + (The reason that NMI is special is that prepare_exit_to_usermode() can + enable IRQs. In NMI context, NMIs are blocked, and we don't want to + enable IRQs with NMIs blocked.)
2. C-State transition
From: Josh Poimboeuf jpoimboe@redhat.com
commit e6f393bc939d566ce3def71232d8013de9aaadde upstream.
When a function falls through to the next function due to a compiler bug, objtool prints some obscure warnings. For example:
drivers/regulator/core.o: warning: objtool: regulator_count_voltages()+0x95: return with modified stack frame drivers/regulator/core.o: warning: objtool: regulator_count_voltages()+0x0: stack state mismatch: cfa1=7+32 cfa2=7+8
Instead it should be printing:
drivers/regulator/core.o: warning: objtool: regulator_supply_is_couple() falls through to next function regulator_count_voltages()
This used to work, but was broken by the following commit:
13810435b9a7 ("objtool: Support GCC 8's cold subfunctions")
The padding nops at the end of a function aren't actually part of the function, as defined by the symbol table. So the 'func' variable in validate_branch() is getting cleared to NULL when a padding nop is encountered, breaking the fallthrough detection.
If the current instruction doesn't have a function associated with it, just consider it to be part of the previously detected function by not overwriting the previous value of 'func'.
Reported-by: kbuild test robot lkp@intel.com Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Fixes: 13810435b9a7 ("objtool: Support GCC 8's cold subfunctions") Link: http://lkml.kernel.org/r/546d143820cd08a46624ae8440d093dd6c902cae.1557766718... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- tools/objtool/check.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -1832,7 +1832,8 @@ static int validate_branch(struct objtoo return 1; }
- func = insn->func ? insn->func->pfunc : NULL; + if (insn->func) + func = insn->func->pfunc;
if (func && insn->ignore) { WARN_FUNC("BUG: why am I validating an ignored function?",
From: Katsuhiro Suzuki katsuhiro@katsuster.net
commit 798689e45190756c2eca6656ee4c624370a5012a upstream.
This patch fixes IO domain voltage setting that is related to audio_gpio3d4a_ms (bit 1) of GRF_IO_VSEL.
This is because RockPro64 schematics P.16 says that regulator supplies 3.0V power to APIO5_VDD. So audio_gpio3d4a_ms bit should be clear (means 3.0V). Power domain map is saying different thing (supplies 1.8V) but I believe P.16 is actual connectings.
Fixes: e4f3fb490967 ("arm64: dts: rockchip: add initial dts support for Rockpro64") Cc: stable@vger.kernel.org Suggested-by: Robin Murphy robin.murphy@arm.com Signed-off-by: Katsuhiro Suzuki katsuhiro@katsuster.net Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/boot/dts/rockchip/rk3399-rockpro64.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm64/boot/dts/rockchip/rk3399-rockpro64.dts +++ b/arch/arm64/boot/dts/rockchip/rk3399-rockpro64.dts @@ -504,7 +504,7 @@ status = "okay";
bt656-supply = <&vcc1v8_dvp>; - audio-supply = <&vcca1v8_codec>; + audio-supply = <&vcc_3v0>; sdmmc-supply = <&vcc_sdio>; gpio1830-supply = <&vcc_3v0>; };
From: Christoph Muellner christoph.muellner@theobroma-systems.com
commit a3eec13b8fd2b9791a21fa16e38dfea8111579bf upstream.
When using direct commands (DCMDs) on an RK3399, we get spurious CQE completion interrupts for the DCMD transaction slot (#31):
[ 931.196520] ------------[ cut here ]------------ [ 931.201702] mmc1: cqhci: spurious TCN for tag 31 [ 931.206906] WARNING: CPU: 0 PID: 1433 at /usr/src/kernel/drivers/mmc/host/cqhci.c:725 cqhci_irq+0x2e4/0x490 [ 931.206909] Modules linked in: [ 931.206918] CPU: 0 PID: 1433 Comm: irq/29-mmc1 Not tainted 4.19.8-rt6-funkadelic #1 [ 931.206920] Hardware name: Theobroma Systems RK3399-Q7 SoM (DT) [ 931.206924] pstate: 40000005 (nZcv daif -PAN -UAO) [ 931.206927] pc : cqhci_irq+0x2e4/0x490 [ 931.206931] lr : cqhci_irq+0x2e4/0x490 [ 931.206933] sp : ffff00000e54bc80 [ 931.206934] x29: ffff00000e54bc80 x28: 0000000000000000 [ 931.206939] x27: 0000000000000001 x26: ffff000008f217e8 [ 931.206944] x25: ffff8000f02ef030 x24: ffff0000091417b0 [ 931.206948] x23: ffff0000090aa000 x22: ffff8000f008b000 [ 931.206953] x21: 0000000000000002 x20: 000000000000001f [ 931.206957] x19: ffff8000f02ef018 x18: ffffffffffffffff [ 931.206961] x17: 0000000000000000 x16: 0000000000000000 [ 931.206966] x15: ffff0000090aa6c8 x14: 0720072007200720 [ 931.206970] x13: 0720072007200720 x12: 0720072007200720 [ 931.206975] x11: 0720072007200720 x10: 0720072007200720 [ 931.206980] x9 : 0720072007200720 x8 : 0720072007200720 [ 931.206984] x7 : 0720073107330720 x6 : 00000000000005a0 [ 931.206988] x5 : ffff00000860d4b0 x4 : 0000000000000000 [ 931.206993] x3 : 0000000000000001 x2 : 0000000000000001 [ 931.206997] x1 : 1bde3a91b0d4d900 x0 : 0000000000000000 [ 931.207001] Call trace: [ 931.207005] cqhci_irq+0x2e4/0x490 [ 931.207009] sdhci_arasan_cqhci_irq+0x5c/0x90 [ 931.207013] sdhci_irq+0x98/0x930 [ 931.207019] irq_forced_thread_fn+0x2c/0xa0 [ 931.207023] irq_thread+0x114/0x1c0 [ 931.207027] kthread+0x128/0x130 [ 931.207032] ret_from_fork+0x10/0x20 [ 931.207035] ---[ end trace 0000000000000002 ]---
The driver shows this message only for the first spurious interrupt by using WARN_ONCE(). Changing this to WARN() shows, that this is happening quite frequently (up to once a second).
Since the eMMC 5.1 specification, where CQE and CQHCI are specified, does not mention that spurious TCN interrupts for DCMDs can be simply ignored, we must assume that using this feature is not working reliably.
The current implementation uses DCMD for REQ_OP_FLUSH only, and I could not see any performance/power impact when disabling this optional feature for RK3399.
Therefore this patch disables DCMDs for RK3399.
Signed-off-by: Christoph Muellner christoph.muellner@theobroma-systems.com Signed-off-by: Philipp Tomsich philipp.tomsich@theobroma-systems.com Fixes: 84362d79f436 ("mmc: sdhci-of-arasan: Add CQHCI support for arasan,sdhci-5.1") Cc: stable@vger.kernel.org [the corresponding code changes are queued for 5.2 so doing that as well] Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/boot/dts/rockchip/rk3399.dtsi | 1 + 1 file changed, 1 insertion(+)
--- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi @@ -333,6 +333,7 @@ phys = <&emmc_phy>; phy-names = "phy_arasan"; power-domains = <&power RK3399_PD_EMMC>; + disable-cqe-dcmd; status = "disabled"; };
From: Christian Lamparter chunkeey@gmail.com
commit f3e35357cd460a8aeb48b8113dc4b761a7d5c828 upstream.
David Bauer reported that the VDSL modem (attached via PCIe) on his AVM Fritz!Box 7530 was complaining about not having enough space in the BAR. A closer inspection of the old qcom-ipq40xx.dtsi pulled from the GL-iNet repository listed:
| qcom,pcie@80000 { | compatible = "qcom,msm_pcie"; | reg = <0x80000 0x2000>, | <0x99000 0x800>, | <0x40000000 0xf1d>, | <0x40000f20 0xa8>, | <0x40100000 0x1000>, | <0x40200000 0x100000>, | <0x40300000 0xd00000>; | reg-names = "parf", "phy", "dm_core", "elbi", | "conf", "io", "bars";
Matching the reg-names with the listed reg leads to <0xd00000> as the size for the "bars".
Cc: stable@vger.kernel.org BugLink: https://www.mail-archive.com/openwrt-devel@lists.openwrt.org/msg45212.html Reported-by: David Bauer mail@david-bauer.net Signed-off-by: Christian Lamparter chunkeey@gmail.com Signed-off-by: Andy Gross agross@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/boot/dts/qcom-ipq4019.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm/boot/dts/qcom-ipq4019.dtsi +++ b/arch/arm/boot/dts/qcom-ipq4019.dtsi @@ -400,8 +400,8 @@ #address-cells = <3>; #size-cells = <2>;
- ranges = <0x81000000 0 0x40200000 0x40200000 0 0x00100000 - 0x82000000 0 0x40300000 0x40300000 0 0x400000>; + ranges = <0x81000000 0 0x40200000 0x40200000 0 0x00100000>, + <0x82000000 0 0x40300000 0x40300000 0 0x00d00000>;
interrupts = <GIC_SPI 141 IRQ_TYPE_LEVEL_HIGH>; interrupt-names = "msi";
From: Stuart Menefy stuart.menefy@mathembedded.com
commit b7ed69d67ff0788d8463e599dd5dd1b45c701a7e upstream.
Fix the interrupt information for the GPIO lines with a shared EINT interrupt.
Fixes: 16d7ff2642e7 ("ARM: dts: add dts files for exynos5260 SoC") Cc: stable@vger.kernel.org Signed-off-by: Stuart Menefy stuart.menefy@mathembedded.com Signed-off-by: Krzysztof Kozlowski krzk@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/boot/dts/exynos5260.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm/boot/dts/exynos5260.dtsi +++ b/arch/arm/boot/dts/exynos5260.dtsi @@ -223,7 +223,7 @@ wakeup-interrupt-controller { compatible = "samsung,exynos4210-wakeup-eint"; interrupt-parent = <&gic>; - interrupts = <GIC_SPI 32 IRQ_TYPE_LEVEL_HIGH>; + interrupts = <GIC_SPI 48 IRQ_TYPE_LEVEL_HIGH>; }; };
From: Sylwester Nawrocki s.nawrocki@samsung.com
commit 34dc82257488ccbdfb6ecdd087b3c8b371e03ee3 upstream.
Add missing audio routing entry for the capture stream, this change is required to fix audio recording on Odroid XU3/XU3-Lite.
Fixes: 885b005d232c ("ARM: dts: exynos: Add support for secondary DAI to Odroid XU3") Cc: stable@vger.kernel.org Signed-off-by: Sylwester Nawrocki s.nawrocki@samsung.com Signed-off-by: Krzysztof Kozlowski krzk@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/boot/dts/exynos5422-odroidxu3-audio.dtsi | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/arm/boot/dts/exynos5422-odroidxu3-audio.dtsi +++ b/arch/arm/boot/dts/exynos5422-odroidxu3-audio.dtsi @@ -26,7 +26,8 @@ "Speakers", "SPKL", "Speakers", "SPKR", "I2S Playback", "Mixer DAI TX", - "HiFi Playback", "Mixer DAI TX"; + "HiFi Playback", "Mixer DAI TX", + "Mixer DAI RX", "HiFi Capture";
assigned-clocks = <&clock CLK_MOUT_EPLL>, <&clock CLK_MOUT_MAU_EPLL>,
From: Sylwester Nawrocki s.nawrocki@samsung.com
commit 9b23e1a3e8fde76e8cc0e366ab1ed4ffb4440feb upstream.
The name of CODEC input widget to which microphone is connected through the "Headphone" jack is "IN12" not "IN1". This fixes microphone support on Odroid XU3.
Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Sylwester Nawrocki s.nawrocki@samsung.com Signed-off-by: Krzysztof Kozlowski krzk@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/boot/dts/exynos5422-odroidxu3-audio.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm/boot/dts/exynos5422-odroidxu3-audio.dtsi +++ b/arch/arm/boot/dts/exynos5422-odroidxu3-audio.dtsi @@ -22,7 +22,7 @@ "Headphone Jack", "HPL", "Headphone Jack", "HPR", "Headphone Jack", "MICBIAS", - "IN1", "Headphone Jack", + "IN12", "Headphone Jack", "Speakers", "SPKL", "Speakers", "SPKR", "I2S Playback", "Mixer DAI TX",
From: Christoph Muellner christoph.muellner@theobroma-systems.com
commit 7bda9482e7ed4d27d83c1f9cb5cbe3b34ddac3e8 upstream.
Direct commands (DCMDs) are an optional feature of eMMC 5.1's command queue engine (CQE). The Arasan eMMC 5.1 controller uses the CQHCI, which exposes a control register bit to enable the feature. The current implementation sets this bit unconditionally.
This patch allows to suppress the feature activation, by specifying the property disable-cqe-dcmd.
Signed-off-by: Christoph Muellner christoph.muellner@theobroma-systems.com Signed-off-by: Philipp Tomsich philipp.tomsich@theobroma-systems.com Acked-by: Adrian Hunter adrian.hunter@intel.com Fixes: 84362d79f436 ("mmc: sdhci-of-arasan: Add CQHCI support for arasan,sdhci-5.1") Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mmc/host/sdhci-of-arasan.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/mmc/host/sdhci-of-arasan.c +++ b/drivers/mmc/host/sdhci-of-arasan.c @@ -832,7 +832,10 @@ static int sdhci_arasan_probe(struct pla host->mmc_host_ops.start_signal_voltage_switch = sdhci_arasan_voltage_switch; sdhci_arasan->has_cqe = true; - host->mmc->caps2 |= MMC_CAP2_CQE | MMC_CAP2_CQE_DCMD; + host->mmc->caps2 |= MMC_CAP2_CQE; + + if (!of_property_read_bool(np, "disable-cqe-dcmd")) + host->mmc->caps2 |= MMC_CAP2_CQE_DCMD; }
ret = sdhci_arasan_add_host(sdhci_arasan);
From: Wen Yang wen.yang99@zte.com.cn
commit 629266bf7229cd6a550075f5961f95607b823b59 upstream.
The call to of_get_next_child returns a node pointer with refcount incremented thus it must be explicitly decremented after the last usage.
Detected by coccinelle with warnings like: arch/arm/mach-exynos/firmware.c:201:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 193, but without a corresponding object release within this function.
Cc: stable@vger.kernel.org Signed-off-by: Wen Yang wen.yang99@zte.com.cn Signed-off-by: Krzysztof Kozlowski krzk@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/mach-exynos/firmware.c | 1 + arch/arm/mach-exynos/suspend.c | 2 ++ 2 files changed, 3 insertions(+)
--- a/arch/arm/mach-exynos/firmware.c +++ b/arch/arm/mach-exynos/firmware.c @@ -196,6 +196,7 @@ bool __init exynos_secure_firmware_avail return false;
addr = of_get_address(nd, 0, NULL, NULL); + of_node_put(nd); if (!addr) { pr_err("%s: No address specified.\n", __func__); return false; --- a/arch/arm/mach-exynos/suspend.c +++ b/arch/arm/mach-exynos/suspend.c @@ -639,8 +639,10 @@ void __init exynos_pm_init(void)
if (WARN_ON(!of_find_property(np, "interrupt-controller", NULL))) { pr_warn("Outdated DT detected, suspend/resume will NOT work\n"); + of_node_put(np); return; } + of_node_put(np);
pm_data = (const struct exynos_pm_data *) match->data;
From: Gustavo A. R. Silva gustavo@embeddedor.com
commit c3422ad5f84a66739ec6a37251ca27638c85b6be upstream.
Currently there is no check on platform_get_irq() return value in case it fails, hence never actually reporting any errors and causing unexpected behavior when using such value as argument for function regmap_irq_get_virq().
Fix this by adding a proper check, a message reporting any errors and returning *pirq*
Addresses-Coverity-ID: 1443940 ("Improper use of negative value") Fixes: 843735b788a4 ("power: axp288_charger: axp288 charger driver") Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva gustavo@embeddedor.com Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/power/supply/axp288_charger.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/power/supply/axp288_charger.c +++ b/drivers/power/supply/axp288_charger.c @@ -833,6 +833,10 @@ static int axp288_charger_probe(struct p /* Register charger interrupts */ for (i = 0; i < CHRG_INTR_END; i++) { pirq = platform_get_irq(info->pdev, i); + if (pirq < 0) { + dev_err(&pdev->dev, "Failed to get IRQ: %d\n", pirq); + return pirq; + } info->irq[i] = regmap_irq_get_virq(info->regmap_irqc, pirq); if (info->irq[i] < 0) { dev_warn(&info->pdev->dev,
From: Hans de Goede hdegoede@redhat.com
commit 9274c78305e12c5f461bec15f49c38e0f32ca705 upstream.
The ACEPC T8 and T11 Cherry Trail Z8350 mini PCs use an AXP288 and as PCs, rather then portables, they does not have a battery. Still for some reason the AXP288 not only thinks there is a battery, it actually thinks it is discharging while the PC is running, slowly going to 0% full, causing userspace to shutdown the system due to the battery being critically low after a while.
This commit adds the ACEPC T8 and T11 to the axp288 fuel-gauge driver blacklist, so that we stop reporting bogus battery readings on this device.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1690852 Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/power/supply/axp288_fuel_gauge.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
--- a/drivers/power/supply/axp288_fuel_gauge.c +++ b/drivers/power/supply/axp288_fuel_gauge.c @@ -686,6 +686,26 @@ intr_failed: */ static const struct dmi_system_id axp288_fuel_gauge_blacklist[] = { { + /* ACEPC T8 Cherry Trail Z8350 mini PC */ + .matches = { + DMI_EXACT_MATCH(DMI_BOARD_VENDOR, "To be filled by O.E.M."), + DMI_EXACT_MATCH(DMI_BOARD_NAME, "Cherry Trail CR"), + DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "T8"), + /* also match on somewhat unique bios-version */ + DMI_EXACT_MATCH(DMI_BIOS_VERSION, "1.000"), + }, + }, + { + /* ACEPC T11 Cherry Trail Z8350 mini PC */ + .matches = { + DMI_EXACT_MATCH(DMI_BOARD_VENDOR, "To be filled by O.E.M."), + DMI_EXACT_MATCH(DMI_BOARD_NAME, "Cherry Trail CR"), + DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "T11"), + /* also match on somewhat unique bios-version */ + DMI_EXACT_MATCH(DMI_BIOS_VERSION, "1.000"), + }, + }, + { /* Intel Cherry Trail Compute Stick, Windows version */ .matches = { DMI_MATCH(DMI_SYS_VENDOR, "Intel Corporation"),
From: Boyang Zhou zhouby_cn@126.com
commit f08cae2f28db24d95be5204046b60618d8de4ddc upstream.
The file offset argument to the arm64 sys_mmap() implementation is scaled from bytes to pages by shifting right by PAGE_SHIFT. Unfortunately, the offset is passed in as a signed 'off_t' type and therefore large offsets (i.e. with the top bit set) are incorrectly sign-extended by the shift. This has been observed to cause false mmap() failures when mapping GPU doorbells on an arm64 server part.
Change the type of the file offset argument to sys_mmap() from 'off_t' to 'unsigned long' so that the shifting scales the value as expected.
Cc: stable@vger.kernel.org Signed-off-by: Boyang Zhou zhouby_cn@126.com [will: rewrote commit message] Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/kernel/sys.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm64/kernel/sys.c +++ b/arch/arm64/kernel/sys.c @@ -31,7 +31,7 @@
SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len, unsigned long, prot, unsigned long, flags, - unsigned long, fd, off_t, off) + unsigned long, fd, unsigned long, off) { if (offset_in_page(off) != 0) return -EINVAL;
From: Will Deacon will.deacon@arm.com
commit 75a19a0202db21638a1c2b424afb867e1f9a2376 upstream.
When executing clock_gettime(), either in the vDSO or via a system call, we need to ensure that the read of the counter register occurs within the seqlock reader critical section. This ensures that updates to the clocksource parameters (e.g. the multiplier) are consistent with the counter value and therefore avoids the situation where time appears to go backwards across multiple reads.
Extend the vDSO logic so that the seqlock critical section covers the read of the counter register as well as accesses to the data page. Since reads of the counter system registers are not ordered by memory barrier instructions, introduce dependency ordering from the counter read to a subsequent memory access so that the seqlock memory barriers apply to the counter access in both the vDSO and the system call paths.
Cc: stable@vger.kernel.org Cc: Marc Zyngier marc.zyngier@arm.com Tested-by: Vincenzo Frascino vincenzo.frascino@arm.com Link: https://lore.kernel.org/linux-arm-kernel/alpine.DEB.2.21.1902081950260.1662@... Reported-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/include/asm/arch_timer.h | 33 +++++++++++++++++++++++++++++++-- arch/arm64/kernel/vdso/gettimeofday.S | 15 +++++++++++---- 2 files changed, 42 insertions(+), 6 deletions(-)
--- a/arch/arm64/include/asm/arch_timer.h +++ b/arch/arm64/include/asm/arch_timer.h @@ -148,18 +148,47 @@ static inline void arch_timer_set_cntkct isb(); }
+/* + * Ensure that reads of the counter are treated the same as memory reads + * for the purposes of ordering by subsequent memory barriers. + * + * This insanity brought to you by speculative system register reads, + * out-of-order memory accesses, sequence locks and Thomas Gleixner. + * + * http://lists.infradead.org/pipermail/linux-arm-kernel/2019-February/631195.h... + */ +#define arch_counter_enforce_ordering(val) do { \ + u64 tmp, _val = (val); \ + \ + asm volatile( \ + " eor %0, %1, %1\n" \ + " add %0, sp, %0\n" \ + " ldr xzr, [%0]" \ + : "=r" (tmp) : "r" (_val)); \ +} while (0) + static inline u64 arch_counter_get_cntpct(void) { + u64 cnt; + isb(); - return arch_timer_reg_read_stable(cntpct_el0); + cnt = arch_timer_reg_read_stable(cntpct_el0); + arch_counter_enforce_ordering(cnt); + return cnt; }
static inline u64 arch_counter_get_cntvct(void) { + u64 cnt; + isb(); - return arch_timer_reg_read_stable(cntvct_el0); + cnt = arch_timer_reg_read_stable(cntvct_el0); + arch_counter_enforce_ordering(cnt); + return cnt; }
+#undef arch_counter_enforce_ordering + static inline int arch_timer_arch_init(void) { return 0; --- a/arch/arm64/kernel/vdso/gettimeofday.S +++ b/arch/arm64/kernel/vdso/gettimeofday.S @@ -73,6 +73,13 @@ x_tmp .req x8 movn x_tmp, #0xff00, lsl #48 and \res, x_tmp, \res mul \res, \res, \mult + /* + * Fake address dependency from the value computed from the counter + * register to subsequent data page accesses so that the sequence + * locking also orders the read of the counter. + */ + and x_tmp, \res, xzr + add vdso_data, vdso_data, x_tmp .endm
/* @@ -147,12 +154,12 @@ ENTRY(__kernel_gettimeofday) /* w11 = cs_mono_mult, w12 = cs_shift */ ldp w11, w12, [vdso_data, #VDSO_CS_MONO_MULT] ldp x13, x14, [vdso_data, #VDSO_XTIME_CLK_SEC] - seqcnt_check fail=1b
get_nsec_per_sec res=x9 lsl x9, x9, x12
get_clock_shifted_nsec res=x15, cycle_last=x10, mult=x11 + seqcnt_check fail=1b get_ts_realtime res_sec=x10, res_nsec=x11, \ clock_nsec=x15, xtime_sec=x13, xtime_nsec=x14, nsec_to_sec=x9
@@ -211,13 +218,13 @@ realtime: /* w11 = cs_mono_mult, w12 = cs_shift */ ldp w11, w12, [vdso_data, #VDSO_CS_MONO_MULT] ldp x13, x14, [vdso_data, #VDSO_XTIME_CLK_SEC] - seqcnt_check fail=realtime
/* All computations are done with left-shifted nsecs. */ get_nsec_per_sec res=x9 lsl x9, x9, x12
get_clock_shifted_nsec res=x15, cycle_last=x10, mult=x11 + seqcnt_check fail=realtime get_ts_realtime res_sec=x10, res_nsec=x11, \ clock_nsec=x15, xtime_sec=x13, xtime_nsec=x14, nsec_to_sec=x9 clock_gettime_return, shift=1 @@ -231,7 +238,6 @@ monotonic: ldp w11, w12, [vdso_data, #VDSO_CS_MONO_MULT] ldp x13, x14, [vdso_data, #VDSO_XTIME_CLK_SEC] ldp x3, x4, [vdso_data, #VDSO_WTM_CLK_SEC] - seqcnt_check fail=monotonic
/* All computations are done with left-shifted nsecs. */ lsl x4, x4, x12 @@ -239,6 +245,7 @@ monotonic: lsl x9, x9, x12
get_clock_shifted_nsec res=x15, cycle_last=x10, mult=x11 + seqcnt_check fail=monotonic get_ts_realtime res_sec=x10, res_nsec=x11, \ clock_nsec=x15, xtime_sec=x13, xtime_nsec=x14, nsec_to_sec=x9
@@ -253,13 +260,13 @@ monotonic_raw: /* w11 = cs_raw_mult, w12 = cs_shift */ ldp w12, w11, [vdso_data, #VDSO_CS_SHIFT] ldp x13, x14, [vdso_data, #VDSO_RAW_TIME_SEC] - seqcnt_check fail=monotonic_raw
/* All computations are done with left-shifted nsecs. */ get_nsec_per_sec res=x9 lsl x9, x9, x12
get_clock_shifted_nsec res=x15, cycle_last=x10, mult=x11 + seqcnt_check fail=monotonic_raw get_ts_clock_raw res_sec=x10, res_nsec=x11, \ clock_nsec=x15, nsec_to_sec=x9
From: Vincenzo Frascino vincenzo.frascino@arm.com
commit d263119387de9975d2acba1dfd3392f7c5979c18 upstream.
Currently, compat tasks running on arm64 can allocate memory up to TASK_SIZE_32 (UL(0x100000000)).
This means that mmap() allocations, if we treat them as returning an array, are not compliant with the sections 6.5.8 of the C standard (C99) which states that: "If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P".
Redefine TASK_SIZE_32 to address the issue.
Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will.deacon@arm.com Cc: Jann Horn jannh@google.com Cc: stable@vger.kernel.org Reported-by: Jann Horn jannh@google.com Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com [will: fixed typo in comment] Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/include/asm/processor.h | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -57,7 +57,15 @@ #define TASK_SIZE_64 (UL(1) << vabits_user)
#ifdef CONFIG_COMPAT +#ifdef CONFIG_ARM64_64K_PAGES +/* + * With CONFIG_ARM64_64K_PAGES enabled, the last page is occupied + * by the compat vectors page. + */ #define TASK_SIZE_32 UL(0x100000000) +#else +#define TASK_SIZE_32 (UL(0x100000000) - PAGE_SIZE) +#endif /* CONFIG_ARM64_64K_PAGES */ #define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \ TASK_SIZE_32 : TASK_SIZE_64) #define TASK_SIZE_OF(tsk) (test_tsk_thread_flag(tsk, TIF_32BIT) ? \
From: Jean-Philippe Brucker jean-philippe.brucker@arm.com
commit 6fda41bf12615ee7c3ddac88155099b1a8cf8d00 upstream.
Some firmwares may reboot CPUs with OS Double Lock set. Make sure that it is unlocked, in order to use debug exceptions.
Cc: stable@vger.kernel.org Signed-off-by: Jean-Philippe Brucker jean-philippe.brucker@arm.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/kernel/debug-monitors.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/arm64/kernel/debug-monitors.c +++ b/arch/arm64/kernel/debug-monitors.c @@ -135,6 +135,7 @@ NOKPROBE_SYMBOL(disable_debug_monitors); */ static int clear_os_lock(unsigned int cpu) { + write_sysreg(0, osdlr_el1); write_sysreg(0, oslar_el1); isb(); return 0;
From: Jean-Philippe Brucker jean-philippe.brucker@arm.com
commit 827a108e354db633698f0b4a10c1ffd2b1f8d1d0 upstream.
When the CPU comes out of suspend, the firmware may have modified the OS Double Lock Register. Save it in an unused slot of cpu_suspend_ctx, and restore it on resume.
Cc: stable@vger.kernel.org Signed-off-by: Jean-Philippe Brucker jean-philippe.brucker@arm.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/mm/proc.S | 34 ++++++++++++++++++---------------- 1 file changed, 18 insertions(+), 16 deletions(-)
--- a/arch/arm64/mm/proc.S +++ b/arch/arm64/mm/proc.S @@ -65,24 +65,25 @@ ENTRY(cpu_do_suspend) mrs x2, tpidr_el0 mrs x3, tpidrro_el0 mrs x4, contextidr_el1 - mrs x5, cpacr_el1 - mrs x6, tcr_el1 - mrs x7, vbar_el1 - mrs x8, mdscr_el1 - mrs x9, oslsr_el1 - mrs x10, sctlr_el1 + mrs x5, osdlr_el1 + mrs x6, cpacr_el1 + mrs x7, tcr_el1 + mrs x8, vbar_el1 + mrs x9, mdscr_el1 + mrs x10, oslsr_el1 + mrs x11, sctlr_el1 alternative_if_not ARM64_HAS_VIRT_HOST_EXTN - mrs x11, tpidr_el1 + mrs x12, tpidr_el1 alternative_else - mrs x11, tpidr_el2 + mrs x12, tpidr_el2 alternative_endif - mrs x12, sp_el0 + mrs x13, sp_el0 stp x2, x3, [x0] - stp x4, xzr, [x0, #16] - stp x5, x6, [x0, #32] - stp x7, x8, [x0, #48] - stp x9, x10, [x0, #64] - stp x11, x12, [x0, #80] + stp x4, x5, [x0, #16] + stp x6, x7, [x0, #32] + stp x8, x9, [x0, #48] + stp x10, x11, [x0, #64] + stp x12, x13, [x0, #80] ret ENDPROC(cpu_do_suspend)
@@ -105,8 +106,8 @@ ENTRY(cpu_do_resume) msr cpacr_el1, x6
/* Don't change t0sz here, mask those bits when restoring */ - mrs x5, tcr_el1 - bfi x8, x5, TCR_T0SZ_OFFSET, TCR_TxSZ_WIDTH + mrs x7, tcr_el1 + bfi x8, x7, TCR_T0SZ_OFFSET, TCR_TxSZ_WIDTH
msr tcr_el1, x8 msr vbar_el1, x9 @@ -130,6 +131,7 @@ alternative_endif /* * Restore oslsr_el1 by writing oslar_el1 */ + msr osdlr_el1, x5 ubfx x11, x11, #1, #1 msr oslar_el1, x11 reset_pmuserenr_el0 x0 // Disable PMU access from EL0
From: Peter Zijlstra peterz@infradead.org
commit 6690e86be83ac75832e461c141055b5d601c0a6d upstream.
Effectively reverts commit:
2c7577a75837 ("sched/x86_64: Don't save flags on context switch")
Specifically because SMAP uses FLAGS.AC which invalidates the claim that the kernel has clean flags.
In particular; while preemption from interrupt return is fine (the IRET frame on the exception stack contains FLAGS) it breaks any code that does synchonous scheduling, including preempt_enable().
This has become a significant issue ever since commit:
5b24a7a2aa20 ("Add 'unsafe' user access functions for batched accesses")
provided for means of having 'normal' C code between STAC / CLAC, exposing the FLAGS.AC state. So far this hasn't led to trouble, however fix it before it comes apart.
Reported-by: Julien Thierry julien.thierry@arm.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Acked-by: Andy Lutomirski luto@amacapital.net Cc: Borislav Petkov bp@alien8.de Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: stable@kernel.org Fixes: 5b24a7a2aa20 ("Add 'unsafe' user access functions for batched accesses") Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/entry/entry_32.S | 2 ++ arch/x86/entry/entry_64.S | 2 ++ arch/x86/include/asm/switch_to.h | 1 + arch/x86/kernel/process_32.c | 7 +++++++ arch/x86/kernel/process_64.c | 8 ++++++++ 5 files changed, 20 insertions(+)
--- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -650,6 +650,7 @@ ENTRY(__switch_to_asm) pushl %ebx pushl %edi pushl %esi + pushfl
/* switch stack */ movl %esp, TASK_threadsp(%eax) @@ -672,6 +673,7 @@ ENTRY(__switch_to_asm) #endif
/* restore callee-saved registers */ + popfl popl %esi popl %edi popl %ebx --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -291,6 +291,7 @@ ENTRY(__switch_to_asm) pushq %r13 pushq %r14 pushq %r15 + pushfq
/* switch stack */ movq %rsp, TASK_threadsp(%rdi) @@ -313,6 +314,7 @@ ENTRY(__switch_to_asm) #endif
/* restore callee-saved registers */ + popfq popq %r15 popq %r14 popq %r13 --- a/arch/x86/include/asm/switch_to.h +++ b/arch/x86/include/asm/switch_to.h @@ -40,6 +40,7 @@ asmlinkage void ret_from_fork(void); * order of the fields must match the code in __switch_to_asm(). */ struct inactive_task_frame { + unsigned long flags; #ifdef CONFIG_X86_64 unsigned long r15; unsigned long r14; --- a/arch/x86/kernel/process_32.c +++ b/arch/x86/kernel/process_32.c @@ -127,6 +127,13 @@ int copy_thread_tls(unsigned long clone_ struct task_struct *tsk; int err;
+ /* + * For a new task use the RESET flags value since there is no before. + * All the status flags are zero; DF and all the system flags must also + * be 0, specifically IF must be 0 because we context switch to the new + * task with interrupts disabled. + */ + frame->flags = X86_EFLAGS_FIXED; frame->bp = 0; frame->ret_addr = (unsigned long) ret_from_fork; p->thread.sp = (unsigned long) fork_frame; --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -392,6 +392,14 @@ int copy_thread_tls(unsigned long clone_ childregs = task_pt_regs(p); fork_frame = container_of(childregs, struct fork_frame, regs); frame = &fork_frame->frame; + + /* + * For a new task use the RESET flags value since there is no before. + * All the status flags are zero; DF and all the system flags must also + * be 0, specifically IF must be 0 because we context switch to the new + * task with interrupts disabled. + */ + frame->flags = X86_EFLAGS_FIXED; frame->bp = 0; frame->ret_addr = (unsigned long) ret_from_fork; p->thread.sp = (unsigned long) fork_frame;
From: Yazen Ghannam yazen.ghannam@amd.com
commit 45d4b7b9cb88526f6d5bd4c03efab88d75d10e4f upstream.
Some systems may report spurious MCA errors. In general, spurious MCA errors may be disabled by clearing a particular bit in MCA_CTL. However, clearing a bit in MCA_CTL may not be recommended for some errors, so the only option is to ignore them.
An MCA error is printed and handled after it has been added to the MCE event pool. So an MCA error can be ignored by not adding it to that pool in the first place.
Add such a filtering function.
[ bp: Move function prototype to the internal header and massage. ]
Signed-off-by: Yazen Ghannam yazen.ghannam@amd.com Signed-off-by: Borislav Petkov bp@suse.de Cc: Arnd Bergmann arnd@arndb.de Cc: "clemej@gmail.com" clemej@gmail.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Ingo Molnar mingo@redhat.com Cc: Pu Wen puwen@hygon.cn Cc: Qiuxu Zhuo qiuxu.zhuo@intel.com Cc: "rafal@milecki.pl" rafal@milecki.pl Cc: Shirish S Shirish.S@amd.com Cc: stable@vger.kernel.org # 5.0.x Cc: Thomas Gleixner tglx@linutronix.de Cc: Tony Luck tony.luck@intel.com Cc: Vishal Verma vishal.l.verma@intel.com Cc: x86-ml x86@kernel.org Link: https://lkml.kernel.org/r/20190325163410.171021-1-Yazen.Ghannam@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/cpu/mce/core.c | 5 +++++ arch/x86/kernel/cpu/mce/genpool.c | 3 +++ arch/x86/kernel/cpu/mce/internal.h | 3 +++ 3 files changed, 11 insertions(+)
--- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -1771,6 +1771,11 @@ static void __mcheck_cpu_init_timer(void mce_start_timer(t); }
+bool filter_mce(struct mce *m) +{ + return false; +} + /* Handle unconfigured int18 (should never happen) */ static void unexpected_machine_check(struct pt_regs *regs, long error_code) { --- a/arch/x86/kernel/cpu/mce/genpool.c +++ b/arch/x86/kernel/cpu/mce/genpool.c @@ -99,6 +99,9 @@ int mce_gen_pool_add(struct mce *mce) { struct mce_evt_llist *node;
+ if (filter_mce(mce)) + return -EINVAL; + if (!mce_evt_pool) return -EINVAL;
--- a/arch/x86/kernel/cpu/mce/internal.h +++ b/arch/x86/kernel/cpu/mce/internal.h @@ -173,4 +173,7 @@ struct mca_msr_regs {
extern struct mca_msr_regs msr_ops;
+/* Decide whether to add MCE record to MCE event pool or filter it out. */ +extern bool filter_mce(struct mce *m); + #endif /* __X86_MCE_INTERNAL_H__ */
From: Yazen Ghannam yazen.ghannam@amd.com
commit 71a84402b93e5fbd8f817f40059c137e10171788 upstream.
AMD family 17h Models 10h-2Fh may report a high number of L1 BTB MCA errors under certain conditions. The errors are benign and can safely be ignored. However, the high error rate may cause the MCA threshold counter to overflow causing a high rate of thresholding interrupts.
In addition, users may see the errors reported through the AMD MCE decoder module, even with the interrupt disabled, due to MCA polling.
Clear the "Counter Present" bit in the Instruction Fetch bank's MCA_MISC0 register. This will prevent enabling MCA thresholding on this bank which will prevent the high interrupt rate due to this error.
Define an AMD-specific function to filter these errors from the MCE event pool so that they don't get reported during early boot.
Rename filter function in EDAC/mce_amd to avoid a naming conflict, while at it.
[ bp: Move function prototype to the internal header and massage/cleanup, fix typos. ]
Reported-by: Rafał Miłecki rafal@milecki.pl Signed-off-by: Yazen Ghannam yazen.ghannam@amd.com Signed-off-by: Borislav Petkov bp@suse.de Cc: "H. Peter Anvin" hpa@zytor.com Cc: "clemej@gmail.com" clemej@gmail.com Cc: Arnd Bergmann arnd@arndb.de Cc: Ingo Molnar mingo@redhat.com Cc: James Morse james.morse@arm.com Cc: Kees Cook keescook@chromium.org Cc: Mauro Carvalho Chehab mchehab@kernel.org Cc: Pu Wen puwen@hygon.cn Cc: Qiuxu Zhuo qiuxu.zhuo@intel.com Cc: Shirish S Shirish.S@amd.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Tony Luck tony.luck@intel.com Cc: Vishal Verma vishal.l.verma@intel.com Cc: linux-edac linux-edac@vger.kernel.org Cc: x86-ml x86@kernel.org Cc: stable@vger.kernel.org # 5.0.x: c95b323dcd35: x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models Cc: stable@vger.kernel.org # 5.0.x: 30aa3d26edb0: x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk Cc: stable@vger.kernel.org # 5.0.x: 9308fd407455: x86/MCE: Group AMD function prototypes in <asm/mce.h> Cc: stable@vger.kernel.org # 5.0.x Link: https://lkml.kernel.org/r/20190325163410.171021-2-Yazen.Ghannam@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/cpu/mce/amd.c | 52 +++++++++++++++++++++++++++---------- arch/x86/kernel/cpu/mce/core.c | 3 ++ arch/x86/kernel/cpu/mce/internal.h | 6 ++++ drivers/edac/mce_amd.c | 4 +- 4 files changed, 50 insertions(+), 15 deletions(-)
--- a/arch/x86/kernel/cpu/mce/amd.c +++ b/arch/x86/kernel/cpu/mce/amd.c @@ -563,33 +563,59 @@ out: return offset; }
+bool amd_filter_mce(struct mce *m) +{ + enum smca_bank_types bank_type = smca_get_bank_type(m->bank); + struct cpuinfo_x86 *c = &boot_cpu_data; + u8 xec = (m->status >> 16) & 0x3F; + + /* See Family 17h Models 10h-2Fh Erratum #1114. */ + if (c->x86 == 0x17 && + c->x86_model >= 0x10 && c->x86_model <= 0x2F && + bank_type == SMCA_IF && xec == 10) + return true; + + return false; +} + /* - * Turn off MC4_MISC thresholding banks on all family 0x15 models since - * they're not supported there. + * Turn off thresholding banks for the following conditions: + * - MC4_MISC thresholding is not supported on Family 0x15. + * - Prevent possible spurious interrupts from the IF bank on Family 0x17 + * Models 0x10-0x2F due to Erratum #1114. */ -void disable_err_thresholding(struct cpuinfo_x86 *c) +void disable_err_thresholding(struct cpuinfo_x86 *c, unsigned int bank) { - int i; + int i, num_msrs; u64 hwcr; bool need_toggle; - u32 msrs[] = { - 0x00000413, /* MC4_MISC0 */ - 0xc0000408, /* MC4_MISC1 */ - }; + u32 msrs[NR_BLOCKS];
- if (c->x86 != 0x15) + if (c->x86 == 0x15 && bank == 4) { + msrs[0] = 0x00000413; /* MC4_MISC0 */ + msrs[1] = 0xc0000408; /* MC4_MISC1 */ + num_msrs = 2; + } else if (c->x86 == 0x17 && + (c->x86_model >= 0x10 && c->x86_model <= 0x2F)) { + + if (smca_get_bank_type(bank) != SMCA_IF) + return; + + msrs[0] = MSR_AMD64_SMCA_MCx_MISC(bank); + num_msrs = 1; + } else { return; + }
rdmsrl(MSR_K7_HWCR, hwcr);
/* McStatusWrEn has to be set */ need_toggle = !(hwcr & BIT(18)); - if (need_toggle) wrmsrl(MSR_K7_HWCR, hwcr | BIT(18));
/* Clear CntP bit safely */ - for (i = 0; i < ARRAY_SIZE(msrs); i++) + for (i = 0; i < num_msrs; i++) msr_clear_bit(msrs[i], 62);
/* restore old settings */ @@ -604,12 +630,12 @@ void mce_amd_feature_init(struct cpuinfo unsigned int bank, block, cpu = smp_processor_id(); int offset = -1;
- disable_err_thresholding(c); - for (bank = 0; bank < mca_cfg.banks; ++bank) { if (mce_flags.smca) smca_configure(bank, cpu);
+ disable_err_thresholding(c, bank); + for (block = 0; block < NR_BLOCKS; ++block) { address = get_block_address(address, low, high, bank, block); if (!address) --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -1773,6 +1773,9 @@ static void __mcheck_cpu_init_timer(void
bool filter_mce(struct mce *m) { + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) + return amd_filter_mce(m); + return false; }
--- a/arch/x86/kernel/cpu/mce/internal.h +++ b/arch/x86/kernel/cpu/mce/internal.h @@ -176,4 +176,10 @@ extern struct mca_msr_regs msr_ops; /* Decide whether to add MCE record to MCE event pool or filter it out. */ extern bool filter_mce(struct mce *m);
+#ifdef CONFIG_X86_MCE_AMD +extern bool amd_filter_mce(struct mce *m); +#else +static inline bool amd_filter_mce(struct mce *m) { return false; }; +#endif + #endif /* __X86_MCE_INTERNAL_H__ */ --- a/drivers/edac/mce_amd.c +++ b/drivers/edac/mce_amd.c @@ -1004,7 +1004,7 @@ static inline void amd_decode_err_code(u /* * Filter out unwanted MCE signatures here. */ -static bool amd_filter_mce(struct mce *m) +static bool ignore_mce(struct mce *m) { /* * NB GART TLB error reporting is disabled by default. @@ -1038,7 +1038,7 @@ amd_decode_mce(struct notifier_block *nb unsigned int fam = x86_family(m->cpuid); int ecc;
- if (amd_filter_mce(m)) + if (ignore_mce(m)) return NOTIFY_STOP;
pr_emerg(HW_ERR "%s\n", decode_error_status(m));
From: Christian Lamparter chunkeey@gmail.com
commit 25baaf8e2c93197d063b372ef7b62f2767c7ac0b upstream.
Commit 8efd972ef96a ("crypto: testmgr - support checking skcipher output IV") caused the crypto4xx driver to produce the following error:
| ctr-aes-ppc4xx encryption test failed (wrong output IV) | on test vector 0, cfg="in-place"
This patch fixes this by reworking the crypto4xx_setkey_aes() function to:
- not save the iv for ECB (as per 18.2.38 CRYP0_SA_CMD_0: "This bit mut be cleared for DES ECB mode or AES ECB mode, when no IV is used.")
- instruct the hardware to save the generated IV for all other modes of operations that have IV and then supply it back to the callee in pretty much the same way as we do it for cbc-aes already.
- make it clear that the DIR_(IN|OUT)BOUND is the important bit that tells the hardware to encrypt or decrypt the data. (this is cosmetic - but it hopefully prevents me from getting confused again).
- don't load any bogus hash when we don't use any hash operation to begin with.
Cc: stable@vger.kernel.org Fixes: f2a13e7cba9e ("crypto: crypto4xx - enable AES RFC3686, ECB, CFB and OFB offloads") Signed-off-by: Christian Lamparter chunkeey@gmail.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/amcc/crypto4xx_alg.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
--- a/drivers/crypto/amcc/crypto4xx_alg.c +++ b/drivers/crypto/amcc/crypto4xx_alg.c @@ -141,9 +141,10 @@ static int crypto4xx_setkey_aes(struct c /* Setup SA */ sa = ctx->sa_in;
- set_dynamic_sa_command_0(sa, SA_NOT_SAVE_HASH, (cm == CRYPTO_MODE_CBC ? - SA_SAVE_IV : SA_NOT_SAVE_IV), - SA_LOAD_HASH_FROM_SA, SA_LOAD_IV_FROM_STATE, + set_dynamic_sa_command_0(sa, SA_NOT_SAVE_HASH, (cm == CRYPTO_MODE_ECB ? + SA_NOT_SAVE_IV : SA_SAVE_IV), + SA_NOT_LOAD_HASH, (cm == CRYPTO_MODE_ECB ? + SA_LOAD_IV_FROM_SA : SA_LOAD_IV_FROM_STATE), SA_NO_HEADER_PROC, SA_HASH_ALG_NULL, SA_CIPHER_ALG_AES, SA_PAD_TYPE_ZERO, SA_OP_GROUP_BASIC, SA_OPCODE_DECRYPT, @@ -162,6 +163,11 @@ static int crypto4xx_setkey_aes(struct c memcpy(ctx->sa_out, ctx->sa_in, ctx->sa_len * 4); sa = ctx->sa_out; sa->sa_command_0.bf.dir = DIR_OUTBOUND; + /* + * SA_OPCODE_ENCRYPT is the same value as SA_OPCODE_DECRYPT. + * it's the DIR_(IN|OUT)BOUND that matters + */ + sa->sa_command_0.bf.opcode = SA_OPCODE_ENCRYPT;
return 0; }
From: Christian Lamparter chunkeey@gmail.com
commit 7e92e1717e3eaf6b322c252947c696b3059f05be upstream.
Currently, crypto4xx CFB and OFB AES ciphers are failing testmgr's test vectors.
|cfb-aes-ppc4xx encryption overran dst buffer on test vector 3, cfg="in-place" |ofb-aes-ppc4xx encryption overran dst buffer on test vector 1, cfg="in-place"
This is because of a very subtile "bug" in the hardware that gets indirectly mentioned in 18.1.3.5 Encryption/Decryption of the hardware spec:
the OFB and CFB modes for AES are listed there as operation modes for >>> "Block ciphers" <<<. Which kind of makes sense, but we would like them to be considered as stream ciphers just like the CTR mode.
To workaround this issue and stop the hardware from causing "overran dst buffer" on crypttexts that are not a multiple of 16 (AES_BLOCK_SIZE), we force the driver to use the scatter buffers as the go-between.
As a bonus this patch also kills redundant pd_uinfo->num_gd and pd_uinfo->num_sd setters since the value has already been set before.
Cc: stable@vger.kernel.org Fixes: f2a13e7cba9e ("crypto: crypto4xx - enable AES RFC3686, ECB, CFB and OFB offloads") Signed-off-by: Christian Lamparter chunkeey@gmail.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/amcc/crypto4xx_core.c | 31 +++++++++++++++++++++---------- 1 file changed, 21 insertions(+), 10 deletions(-)
--- a/drivers/crypto/amcc/crypto4xx_core.c +++ b/drivers/crypto/amcc/crypto4xx_core.c @@ -714,7 +714,23 @@ int crypto4xx_build_pd(struct crypto_asy size_t offset_to_sr_ptr; u32 gd_idx = 0; int tmp; - bool is_busy; + bool is_busy, force_sd; + + /* + * There's a very subtile/disguised "bug" in the hardware that + * gets indirectly mentioned in 18.1.3.5 Encryption/Decryption + * of the hardware spec: + * *drum roll* the AES/(T)DES OFB and CFB modes are listed as + * operation modes for >>> "Block ciphers" <<<. + * + * To workaround this issue and stop the hardware from causing + * "overran dst buffer" on crypttexts that are not a multiple + * of 16 (AES_BLOCK_SIZE), we force the driver to use the + * scatter buffers. + */ + force_sd = (req_sa->sa_command_1.bf.crypto_mode9_8 == CRYPTO_MODE_CFB + || req_sa->sa_command_1.bf.crypto_mode9_8 == CRYPTO_MODE_OFB) + && (datalen % AES_BLOCK_SIZE);
/* figure how many gd are needed */ tmp = sg_nents_for_len(src, assoclen + datalen); @@ -732,7 +748,7 @@ int crypto4xx_build_pd(struct crypto_asy }
/* figure how many sd are needed */ - if (sg_is_last(dst)) { + if (sg_is_last(dst) && force_sd == false) { num_sd = 0; } else { if (datalen > PPC4XX_SD_BUFFER_SIZE) { @@ -807,9 +823,10 @@ int crypto4xx_build_pd(struct crypto_asy pd->sa_len = sa_len;
pd_uinfo = &dev->pdr_uinfo[pd_entry]; - pd_uinfo->async_req = req; pd_uinfo->num_gd = num_gd; pd_uinfo->num_sd = num_sd; + pd_uinfo->dest_va = dst; + pd_uinfo->async_req = req;
if (iv_len) memcpy(pd_uinfo->sr_va->save_iv, iv, iv_len); @@ -828,7 +845,6 @@ int crypto4xx_build_pd(struct crypto_asy /* get first gd we are going to use */ gd_idx = fst_gd; pd_uinfo->first_gd = fst_gd; - pd_uinfo->num_gd = num_gd; gd = crypto4xx_get_gdp(dev, &gd_dma, gd_idx); pd->src = gd_dma; /* enable gather */ @@ -865,17 +881,14 @@ int crypto4xx_build_pd(struct crypto_asy * Indicate gather array is not used */ pd_uinfo->first_gd = 0xffffffff; - pd_uinfo->num_gd = 0; } - if (sg_is_last(dst)) { + if (!num_sd) { /* * we know application give us dst a whole piece of memory * no need to use scatter ring. */ pd_uinfo->using_sd = 0; pd_uinfo->first_sd = 0xffffffff; - pd_uinfo->num_sd = 0; - pd_uinfo->dest_va = dst; sa->sa_command_0.bf.scatter = 0; pd->dest = (u32)dma_map_page(dev->core_dev->device, sg_page(dst), dst->offset, @@ -889,9 +902,7 @@ int crypto4xx_build_pd(struct crypto_asy nbytes = datalen; sa->sa_command_0.bf.scatter = 1; pd_uinfo->using_sd = 1; - pd_uinfo->dest_va = dst; pd_uinfo->first_sd = fst_sd; - pd_uinfo->num_sd = num_sd; sd = crypto4xx_get_sdp(dev, &sd_dma, sd_idx); pd->dest = sd_dma; /* setup scatter descriptor */
From: Eric Biggers ebiggers@google.com
commit edaf28e996af69222b2cb40455dbb5459c2b875a upstream.
If the user-provided IV needs to be aligned to the algorithm's alignmask, then skcipher_walk_virt() copies the IV into a new aligned buffer walk.iv. But skcipher_walk_virt() can fail afterwards, and then if the caller unconditionally accesses walk.iv, it's a use-after-free.
salsa20-generic doesn't set an alignmask, so currently it isn't affected by this despite unconditionally accessing walk.iv. However this is more subtle than desired, and it was actually broken prior to the alignmask being removed by commit b62b3db76f73 ("crypto: salsa20-generic - cleanup and convert to skcipher API").
Since salsa20-generic does not update the IV and does not need any IV alignment, update it to use req->iv instead of walk.iv.
Fixes: 2407d60872dd ("[CRYPTO] salsa20: Salsa20 stream cipher") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- crypto/salsa20_generic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/crypto/salsa20_generic.c +++ b/crypto/salsa20_generic.c @@ -161,7 +161,7 @@ static int salsa20_crypt(struct skcipher
err = skcipher_walk_virt(&walk, req, false);
- salsa20_init(state, ctx, walk.iv); + salsa20_init(state, ctx, req->iv);
while (walk.nbytes > 0) { unsigned int nbytes = walk.nbytes;
From: Eric Biggers ebiggers@google.com
commit aec286cd36eacfd797e3d5dab8d5d23c15d1bb5e upstream.
If the user-provided IV needs to be aligned to the algorithm's alignmask, then skcipher_walk_virt() copies the IV into a new aligned buffer walk.iv. But skcipher_walk_virt() can fail afterwards, and then if the caller unconditionally accesses walk.iv, it's a use-after-free.
Fix this in the LRW template by checking the return value of skcipher_walk_virt().
This bug was detected by my patches that improve testmgr to fuzz algorithms against their generic implementation. When the extra self-tests were run on a KASAN-enabled kernel, a KASAN use-after-free splat occured during lrw(aes) testing.
Fixes: c778f96bf347 ("crypto: lrw - Optimize tweak computation") Cc: stable@vger.kernel.org # v4.20+ Cc: Ondrej Mosnacek omosnace@redhat.com Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- crypto/lrw.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/crypto/lrw.c +++ b/crypto/lrw.c @@ -162,8 +162,10 @@ static int xor_tweak(struct skcipher_req }
err = skcipher_walk_virt(&w, req, false); - iv = (__be32 *)w.iv; + if (err) + return err;
+ iv = (__be32 *)w.iv; counter[0] = be32_to_cpu(iv[3]); counter[1] = be32_to_cpu(iv[2]); counter[2] = be32_to_cpu(iv[1]);
From: Eric Biggers ebiggers@google.com
commit 7aceaaef04eaaf6019ca159bc354d800559bba1d upstream.
The arm64 implementations of ChaCha and XChaCha are failing the extra crypto self-tests following my patches to test the !may_use_simd() code paths, which previously were untested. The problem is as follows:
When !may_use_simd(), the arm64 NEON implementations fall back to the generic implementation, which uses the skcipher_walk API to iterate through the src/dst scatterlists. Due to how the skcipher_walk API works, walk.stride is set from the skcipher_alg actually being used, which in this case is the arm64 NEON algorithm. Thus walk.stride is 5*CHACHA_BLOCK_SIZE, not CHACHA_BLOCK_SIZE.
This unnecessarily large stride shouldn't cause an actual problem. However, the generic implementation computes round_down(nbytes, walk.stride). round_down() assumes the round amount is a power of 2, which 5*CHACHA_BLOCK_SIZE is not, so it gives the wrong result.
This causes the following case in skcipher_walk_done() to be hit, causing a WARN() and failing the encryption operation:
if (WARN_ON(err)) { /* unexpected case; didn't process all bytes */ err = -EINVAL; goto finish; }
Fix it by rounding down to CHACHA_BLOCK_SIZE instead of walk.stride.
(Or we could replace round_down() with rounddown(), but that would add a slow division operation every time, which I think we should avoid.)
Fixes: 2fe55987b262 ("crypto: arm64/chacha - use combined SIMD/ALU routine for more speed") Cc: stable@vger.kernel.org # v5.0+ Signed-off-by: Eric Biggers ebiggers@google.com Reviewed-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- crypto/chacha_generic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/crypto/chacha_generic.c +++ b/crypto/chacha_generic.c @@ -52,7 +52,7 @@ static int chacha_stream_xor(struct skci unsigned int nbytes = walk.nbytes;
if (nbytes < walk.total) - nbytes = round_down(nbytes, walk.stride); + nbytes = round_down(nbytes, CHACHA_BLOCK_SIZE);
chacha_docrypt(state, walk.dst.virt.addr, walk.src.virt.addr, nbytes, ctx->nrounds);
From: Eric Biggers ebiggers@google.com
commit 5e27f38f1f3f45a0c938299c3a34a2d2db77165a upstream.
If the rfc7539 template is instantiated with specific implementations, e.g. "rfc7539(chacha20-generic,poly1305-generic)" rather than "rfc7539(chacha20,poly1305)", then the implementation names end up included in the instance's cra_name. This is incorrect because it then prevents all users from allocating "rfc7539(chacha20,poly1305)", if the highest priority implementations of chacha20 and poly1305 were selected. Also, the self-tests aren't run on an instance allocated in this way.
Fix it by setting the instance's cra_name from the underlying algorithms' actual cra_names, rather than from the requested names. This matches what other templates do.
Fixes: 71ebc4d1b27d ("crypto: chacha20poly1305 - Add a ChaCha20-Poly1305 AEAD construction, RFC7539") Cc: stable@vger.kernel.org # v4.2+ Cc: Martin Willi martin@strongswan.org Signed-off-by: Eric Biggers ebiggers@google.com Reviewed-by: Martin Willi martin@strongswan.org Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- crypto/chacha20poly1305.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/crypto/chacha20poly1305.c +++ b/crypto/chacha20poly1305.c @@ -645,8 +645,8 @@ static int chachapoly_create(struct cryp
err = -ENAMETOOLONG; if (snprintf(inst->alg.base.cra_name, CRYPTO_MAX_ALG_NAME, - "%s(%s,%s)", name, chacha_name, - poly_name) >= CRYPTO_MAX_ALG_NAME) + "%s(%s,%s)", name, chacha->base.cra_name, + poly->cra_name) >= CRYPTO_MAX_ALG_NAME) goto out_drop_chacha; if (snprintf(inst->alg.base.cra_driver_name, CRYPTO_MAX_ALG_NAME, "%s(%s,%s)", name, chacha->base.cra_driver_name,
From: Eric Biggers ebiggers@google.com
commit 6a1faa4a43f5fabf9cbeaa742d916e7b5e73120f upstream.
CCM instances can be created by either the "ccm" template, which only allows choosing the block cipher, e.g. "ccm(aes)"; or by "ccm_base", which allows choosing the ctr and cbcmac implementations, e.g. "ccm_base(ctr(aes-generic),cbcmac(aes-generic))".
However, a "ccm_base" instance prevents a "ccm" instance from being registered using the same implementations. Nor will the instance be found by lookups of "ccm". This can be used as a denial of service. Moreover, "ccm_base" instances are never tested by the crypto self-tests, even if there are compatible "ccm" tests.
The root cause of these problems is that instances of the two templates use different cra_names. Therefore, fix these problems by making "ccm_base" instances set the same cra_name as "ccm" instances, e.g. "ccm(aes)" instead of "ccm_base(ctr(aes-generic),cbcmac(aes-generic))".
This requires extracting the block cipher name from the name of the ctr and cbcmac algorithms. It also requires starting to verify that the algorithms are really ctr and cbcmac using the same block cipher, not something else entirely. But it would be bizarre if anyone were actually using non-ccm-compatible algorithms with ccm_base, so this shouldn't break anyone in practice.
Fixes: 4a49b499dfa0 ("[CRYPTO] ccm: Added CCM mode") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- crypto/ccm.c | 44 ++++++++++++++++++-------------------------- 1 file changed, 18 insertions(+), 26 deletions(-)
--- a/crypto/ccm.c +++ b/crypto/ccm.c @@ -458,7 +458,6 @@ static void crypto_ccm_free(struct aead_
static int crypto_ccm_create_common(struct crypto_template *tmpl, struct rtattr **tb, - const char *full_name, const char *ctr_name, const char *mac_name) { @@ -486,7 +485,8 @@ static int crypto_ccm_create_common(stru
mac = __crypto_hash_alg_common(mac_alg); err = -EINVAL; - if (mac->digestsize != 16) + if (strncmp(mac->base.cra_name, "cbcmac(", 7) != 0 || + mac->digestsize != 16) goto out_put_mac;
inst = kzalloc(sizeof(*inst) + sizeof(*ictx), GFP_KERNEL); @@ -509,23 +509,27 @@ static int crypto_ccm_create_common(stru
ctr = crypto_spawn_skcipher_alg(&ictx->ctr);
- /* Not a stream cipher? */ + /* The skcipher algorithm must be CTR mode, using 16-byte blocks. */ err = -EINVAL; - if (ctr->base.cra_blocksize != 1) + if (strncmp(ctr->base.cra_name, "ctr(", 4) != 0 || + crypto_skcipher_alg_ivsize(ctr) != 16 || + ctr->base.cra_blocksize != 1) goto err_drop_ctr;
- /* We want the real thing! */ - if (crypto_skcipher_alg_ivsize(ctr) != 16) + /* ctr and cbcmac must use the same underlying block cipher. */ + if (strcmp(ctr->base.cra_name + 4, mac->base.cra_name + 7) != 0) goto err_drop_ctr;
err = -ENAMETOOLONG; + if (snprintf(inst->alg.base.cra_name, CRYPTO_MAX_ALG_NAME, + "ccm(%s", ctr->base.cra_name + 4) >= CRYPTO_MAX_ALG_NAME) + goto err_drop_ctr; + if (snprintf(inst->alg.base.cra_driver_name, CRYPTO_MAX_ALG_NAME, "ccm_base(%s,%s)", ctr->base.cra_driver_name, mac->base.cra_driver_name) >= CRYPTO_MAX_ALG_NAME) goto err_drop_ctr;
- memcpy(inst->alg.base.cra_name, full_name, CRYPTO_MAX_ALG_NAME); - inst->alg.base.cra_flags = ctr->base.cra_flags & CRYPTO_ALG_ASYNC; inst->alg.base.cra_priority = (mac->base.cra_priority + ctr->base.cra_priority) / 2; @@ -567,7 +571,6 @@ static int crypto_ccm_create(struct cryp const char *cipher_name; char ctr_name[CRYPTO_MAX_ALG_NAME]; char mac_name[CRYPTO_MAX_ALG_NAME]; - char full_name[CRYPTO_MAX_ALG_NAME];
cipher_name = crypto_attr_alg_name(tb[1]); if (IS_ERR(cipher_name)) @@ -581,35 +584,24 @@ static int crypto_ccm_create(struct cryp cipher_name) >= CRYPTO_MAX_ALG_NAME) return -ENAMETOOLONG;
- if (snprintf(full_name, CRYPTO_MAX_ALG_NAME, "ccm(%s)", cipher_name) >= - CRYPTO_MAX_ALG_NAME) - return -ENAMETOOLONG; - - return crypto_ccm_create_common(tmpl, tb, full_name, ctr_name, - mac_name); + return crypto_ccm_create_common(tmpl, tb, ctr_name, mac_name); }
static int crypto_ccm_base_create(struct crypto_template *tmpl, struct rtattr **tb) { const char *ctr_name; - const char *cipher_name; - char full_name[CRYPTO_MAX_ALG_NAME]; + const char *mac_name;
ctr_name = crypto_attr_alg_name(tb[1]); if (IS_ERR(ctr_name)) return PTR_ERR(ctr_name);
- cipher_name = crypto_attr_alg_name(tb[2]); - if (IS_ERR(cipher_name)) - return PTR_ERR(cipher_name); - - if (snprintf(full_name, CRYPTO_MAX_ALG_NAME, "ccm_base(%s,%s)", - ctr_name, cipher_name) >= CRYPTO_MAX_ALG_NAME) - return -ENAMETOOLONG; + mac_name = crypto_attr_alg_name(tb[2]); + if (IS_ERR(mac_name)) + return PTR_ERR(mac_name);
- return crypto_ccm_create_common(tmpl, tb, full_name, ctr_name, - cipher_name); + return crypto_ccm_create_common(tmpl, tb, ctr_name, mac_name); }
static int crypto_rfc4309_setkey(struct crypto_aead *parent, const u8 *key,
From: Singh, Brijesh brijesh.singh@amd.com
commit f5a2aeb8b254c764772729a6e48d4e0c914bb56a upstream.
Currently, we free the psp_master if the PLATFORM_INIT fails during the SEV FW probe. If psp_master is freed then driver does not invoke the PSP FW. As per SEV FW spec, there are several commands (PLATFORM_RESET, PLATFORM_STATUS, GET_ID etc) which can be executed in the UNINIT state We should not free the psp_master when PLATFORM_INIT fails.
Fixes: 200664d5237f ("crypto: ccp: Add SEV support") Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Herbert Xu herbert@gondor.apana.org.au Cc: Gary Hook gary.hook@amd.com Cc: stable@vger.kernel.org # 4.19.y Signed-off-by: Brijesh Singh brijesh.singh@amd.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/ccp/psp-dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/crypto/ccp/psp-dev.c +++ b/drivers/crypto/ccp/psp-dev.c @@ -997,7 +997,7 @@ void psp_pci_init(void) rc = sev_platform_init(&error); if (rc) { dev_err(sp->dev, "SEV: failed to INIT error %#x\n", error); - goto err; + return; }
dev_info(sp->dev, "SEV API:%d.%d build:%d\n", psp_master->api_major,
From: Daniel Axtens dja@axtens.net
commit dcf7b48212c0fab7df69e84fab22d6cb7c8c0fb9 upstream.
The original assembly imported from OpenSSL has two copy-paste errors in handling CTR mode. When dealing with a 2 or 3 block tail, the code branches to the CBC decryption exit path, rather than to the CTR exit path.
This leads to corruption of the IV, which leads to subsequent blocks being corrupted.
This can be detected with libkcapi test suite, which is available at https://github.com/smuellerDD/libkcapi
Reported-by: Ondrej Mosnáček omosnacek@gmail.com Fixes: 5c380d623ed3 ("crypto: vmx - Add support for VMS instructions by ASM") Cc: stable@vger.kernel.org Signed-off-by: Daniel Axtens dja@axtens.net Tested-by: Michael Ellerman mpe@ellerman.id.au Tested-by: Ondrej Mosnacek omosnacek@gmail.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/vmx/aesp8-ppc.pl | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/crypto/vmx/aesp8-ppc.pl +++ b/drivers/crypto/vmx/aesp8-ppc.pl @@ -1854,7 +1854,7 @@ Lctr32_enc8x_three: stvx_u $out1,$x10,$out stvx_u $out2,$x20,$out addi $out,$out,0x30 - b Lcbc_dec8x_done + b Lctr32_enc8x_done
.align 5 Lctr32_enc8x_two: @@ -1866,7 +1866,7 @@ Lctr32_enc8x_two: stvx_u $out0,$x00,$out stvx_u $out1,$x10,$out addi $out,$out,0x20 - b Lcbc_dec8x_done + b Lctr32_enc8x_done
.align 5 Lctr32_enc8x_one:
From: Eric Biggers ebiggers@google.com
commit dcaca01a42cc2c425154a13412b4124293a6e11e upstream.
skcipher_walk_done() assumes it's a bug if, after the "slow" path is executed where the next chunk of data is processed via a bounce buffer, the algorithm says it didn't process all bytes. Thus it WARNs on this.
However, this can happen legitimately when the message needs to be evenly divisible into "blocks" but isn't, and the algorithm has a 'walksize' greater than the block size. For example, ecb-aes-neonbs sets 'walksize' to 128 bytes and only supports messages evenly divisible into 16-byte blocks. If, say, 17 message bytes remain but they straddle scatterlist elements, the skcipher_walk code will take the "slow" path and pass the algorithm all 17 bytes in the bounce buffer. But the algorithm will only be able to process 16 bytes, triggering the WARN.
Fix this by just removing the WARN_ON(). Returning -EINVAL, as the code already does, is the right behavior.
This bug was detected by my patches that improve testmgr to fuzz algorithms against their generic implementation.
Fixes: b286d8b1a690 ("crypto: skcipher - Add skcipher walk interface") Cc: stable@vger.kernel.org # v4.10+ Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- crypto/skcipher.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/crypto/skcipher.c +++ b/crypto/skcipher.c @@ -131,8 +131,13 @@ unmap_src: memcpy(walk->dst.virt.addr, walk->page, n); skcipher_unmap_dst(walk); } else if (unlikely(walk->flags & SKCIPHER_WALK_SLOW)) { - if (WARN_ON(err)) { - /* unexpected case; didn't process all bytes */ + if (err) { + /* + * Didn't process all bytes. Either the algorithm is + * broken, or this was the last step and it turned out + * the message wasn't evenly divisible into blocks but + * the algorithm requires it. + */ err = -EINVAL; goto finish; }
From: Eric Biggers ebiggers@google.com
commit 307508d1072979f4435416f87936f87eaeb82054 upstream.
The ->digest() method of crct10dif-generic reads the current CRC value from the shash_desc context. But this value is uninitialized, causing crypto_shash_digest() to compute the wrong result. Fix it.
Probably this wasn't noticed before because lib/crc-t10dif.c only uses crypto_shash_update(), not crypto_shash_digest(). Likewise, crypto_shash_digest() is not yet tested by the crypto self-tests because those only test the ahash API which only uses shash init/update/final.
This bug was detected by my patches that improve testmgr to fuzz algorithms against their generic implementation.
Fixes: 2d31e518a428 ("crypto: crct10dif - Wrap crc_t10dif function all to use crypto transform framework") Cc: stable@vger.kernel.org # v3.11+ Cc: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- crypto/crct10dif_generic.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-)
--- a/crypto/crct10dif_generic.c +++ b/crypto/crct10dif_generic.c @@ -65,10 +65,9 @@ static int chksum_final(struct shash_des return 0; }
-static int __chksum_finup(__u16 *crcp, const u8 *data, unsigned int len, - u8 *out) +static int __chksum_finup(__u16 crc, const u8 *data, unsigned int len, u8 *out) { - *(__u16 *)out = crc_t10dif_generic(*crcp, data, len); + *(__u16 *)out = crc_t10dif_generic(crc, data, len); return 0; }
@@ -77,15 +76,13 @@ static int chksum_finup(struct shash_des { struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
- return __chksum_finup(&ctx->crc, data, len, out); + return __chksum_finup(ctx->crc, data, len, out); }
static int chksum_digest(struct shash_desc *desc, const u8 *data, unsigned int length, u8 *out) { - struct chksum_desc_ctx *ctx = shash_desc_ctx(desc); - - return __chksum_finup(&ctx->crc, data, length, out); + return __chksum_finup(0, data, length, out); }
static struct shash_alg alg = {
From: Eric Biggers ebiggers@google.com
commit dec3d0b1071a0f3194e66a83d26ecf4aa8c5910e upstream.
The ->digest() method of crct10dif-pclmul reads the current CRC value from the shash_desc context. But this value is uninitialized, causing crypto_shash_digest() to compute the wrong result. Fix it.
Probably this wasn't noticed before because lib/crc-t10dif.c only uses crypto_shash_update(), not crypto_shash_digest(). Likewise, crypto_shash_digest() is not yet tested by the crypto self-tests because those only test the ahash API which only uses shash init/update/final.
Fixes: 0b95a7f85718 ("crypto: crct10dif - Glue code to cast accelerated CRCT10DIF assembly as a crypto transform") Cc: stable@vger.kernel.org # v3.11+ Cc: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/crypto/crct10dif-pclmul_glue.c | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-)
--- a/arch/x86/crypto/crct10dif-pclmul_glue.c +++ b/arch/x86/crypto/crct10dif-pclmul_glue.c @@ -70,15 +70,14 @@ static int chksum_final(struct shash_des return 0; }
-static int __chksum_finup(__u16 *crcp, const u8 *data, unsigned int len, - u8 *out) +static int __chksum_finup(__u16 crc, const u8 *data, unsigned int len, u8 *out) { if (len >= 16 && irq_fpu_usable()) { kernel_fpu_begin(); - *(__u16 *)out = crc_t10dif_pcl(*crcp, data, len); + *(__u16 *)out = crc_t10dif_pcl(crc, data, len); kernel_fpu_end(); } else - *(__u16 *)out = crc_t10dif_generic(*crcp, data, len); + *(__u16 *)out = crc_t10dif_generic(crc, data, len); return 0; }
@@ -87,15 +86,13 @@ static int chksum_finup(struct shash_des { struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
- return __chksum_finup(&ctx->crc, data, len, out); + return __chksum_finup(ctx->crc, data, len, out); }
static int chksum_digest(struct shash_desc *desc, const u8 *data, unsigned int length, u8 *out) { - struct chksum_desc_ctx *ctx = shash_desc_ctx(desc); - - return __chksum_finup(&ctx->crc, data, length, out); + return __chksum_finup(0, data, length, out); }
static struct shash_alg alg = {
From: Eric Biggers ebiggers@google.com
commit 580e295178402d14bbf598a5702f8e01fc59dbaa upstream.
The arm64 gcm-aes-ce algorithm is failing the extra crypto self-tests following my patches to test the !may_use_simd() code paths, which previously were untested. The problem is that in the !may_use_simd() case, an odd number of AES blocks can be processed within each step of the skcipher_walk. However, the skcipher_walk is being done with a "stride" of 2 blocks and is advanced by an even number of blocks after each step. This causes the encryption to produce the wrong ciphertext and authentication tag, and causes the decryption to incorrectly fail.
Fix it by only processing an even number of blocks per step.
Fixes: c2b24c36e0a3 ("crypto: arm64/aes-gcm-ce - fix scatterwalk API violation") Fixes: 71e52c278c54 ("crypto: arm64/aes-ce-gcm - operate on two input blocks at a time") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Eric Biggers ebiggers@google.com Reviewed-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/crypto/ghash-ce-glue.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
--- a/arch/arm64/crypto/ghash-ce-glue.c +++ b/arch/arm64/crypto/ghash-ce-glue.c @@ -473,9 +473,11 @@ static int gcm_encrypt(struct aead_reque put_unaligned_be32(2, iv + GCM_IV_SIZE);
while (walk.nbytes >= (2 * AES_BLOCK_SIZE)) { - int blocks = walk.nbytes / AES_BLOCK_SIZE; + const int blocks = + walk.nbytes / (2 * AES_BLOCK_SIZE) * 2; u8 *dst = walk.dst.virt.addr; u8 *src = walk.src.virt.addr; + int remaining = blocks;
do { __aes_arm64_encrypt(ctx->aes_key.key_enc, @@ -485,9 +487,9 @@ static int gcm_encrypt(struct aead_reque
dst += AES_BLOCK_SIZE; src += AES_BLOCK_SIZE; - } while (--blocks > 0); + } while (--remaining > 0);
- ghash_do_update(walk.nbytes / AES_BLOCK_SIZE, dg, + ghash_do_update(blocks, dg, walk.dst.virt.addr, &ctx->ghash_key, NULL, pmull_ghash_update_p64);
@@ -609,7 +611,7 @@ static int gcm_decrypt(struct aead_reque put_unaligned_be32(2, iv + GCM_IV_SIZE);
while (walk.nbytes >= (2 * AES_BLOCK_SIZE)) { - int blocks = walk.nbytes / AES_BLOCK_SIZE; + int blocks = walk.nbytes / (2 * AES_BLOCK_SIZE) * 2; u8 *dst = walk.dst.virt.addr; u8 *src = walk.src.virt.addr;
From: Eric Biggers ebiggers@google.com
commit f699594d436960160f6d5ba84ed4a222f20d11cd upstream.
GCM instances can be created by either the "gcm" template, which only allows choosing the block cipher, e.g. "gcm(aes)"; or by "gcm_base", which allows choosing the ctr and ghash implementations, e.g. "gcm_base(ctr(aes-generic),ghash-generic)".
However, a "gcm_base" instance prevents a "gcm" instance from being registered using the same implementations. Nor will the instance be found by lookups of "gcm". This can be used as a denial of service. Moreover, "gcm_base" instances are never tested by the crypto self-tests, even if there are compatible "gcm" tests.
The root cause of these problems is that instances of the two templates use different cra_names. Therefore, fix these problems by making "gcm_base" instances set the same cra_name as "gcm" instances, e.g. "gcm(aes)" instead of "gcm_base(ctr(aes-generic),ghash-generic)".
This requires extracting the block cipher name from the name of the ctr algorithm. It also requires starting to verify that the algorithms are really ctr and ghash, not something else entirely. But it would be bizarre if anyone were actually using non-gcm-compatible algorithms with gcm_base, so this shouldn't break anyone in practice.
Fixes: d00aa19b507b ("[CRYPTO] gcm: Allow block cipher parameter") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- crypto/gcm.c | 34 +++++++++++----------------------- 1 file changed, 11 insertions(+), 23 deletions(-)
--- a/crypto/gcm.c +++ b/crypto/gcm.c @@ -597,7 +597,6 @@ static void crypto_gcm_free(struct aead_
static int crypto_gcm_create_common(struct crypto_template *tmpl, struct rtattr **tb, - const char *full_name, const char *ctr_name, const char *ghash_name) { @@ -638,7 +637,8 @@ static int crypto_gcm_create_common(stru goto err_free_inst;
err = -EINVAL; - if (ghash->digestsize != 16) + if (strcmp(ghash->base.cra_name, "ghash") != 0 || + ghash->digestsize != 16) goto err_drop_ghash;
crypto_set_skcipher_spawn(&ctx->ctr, aead_crypto_instance(inst)); @@ -650,24 +650,24 @@ static int crypto_gcm_create_common(stru
ctr = crypto_spawn_skcipher_alg(&ctx->ctr);
- /* We only support 16-byte blocks. */ + /* The skcipher algorithm must be CTR mode, using 16-byte blocks. */ err = -EINVAL; - if (crypto_skcipher_alg_ivsize(ctr) != 16) + if (strncmp(ctr->base.cra_name, "ctr(", 4) != 0 || + crypto_skcipher_alg_ivsize(ctr) != 16 || + ctr->base.cra_blocksize != 1) goto out_put_ctr;
- /* Not a stream cipher? */ - if (ctr->base.cra_blocksize != 1) + err = -ENAMETOOLONG; + if (snprintf(inst->alg.base.cra_name, CRYPTO_MAX_ALG_NAME, + "gcm(%s", ctr->base.cra_name + 4) >= CRYPTO_MAX_ALG_NAME) goto out_put_ctr;
- err = -ENAMETOOLONG; if (snprintf(inst->alg.base.cra_driver_name, CRYPTO_MAX_ALG_NAME, "gcm_base(%s,%s)", ctr->base.cra_driver_name, ghash_alg->cra_driver_name) >= CRYPTO_MAX_ALG_NAME) goto out_put_ctr;
- memcpy(inst->alg.base.cra_name, full_name, CRYPTO_MAX_ALG_NAME); - inst->alg.base.cra_flags = (ghash->base.cra_flags | ctr->base.cra_flags) & CRYPTO_ALG_ASYNC; inst->alg.base.cra_priority = (ghash->base.cra_priority + @@ -709,7 +709,6 @@ static int crypto_gcm_create(struct cryp { const char *cipher_name; char ctr_name[CRYPTO_MAX_ALG_NAME]; - char full_name[CRYPTO_MAX_ALG_NAME];
cipher_name = crypto_attr_alg_name(tb[1]); if (IS_ERR(cipher_name)) @@ -719,12 +718,7 @@ static int crypto_gcm_create(struct cryp CRYPTO_MAX_ALG_NAME) return -ENAMETOOLONG;
- if (snprintf(full_name, CRYPTO_MAX_ALG_NAME, "gcm(%s)", cipher_name) >= - CRYPTO_MAX_ALG_NAME) - return -ENAMETOOLONG; - - return crypto_gcm_create_common(tmpl, tb, full_name, - ctr_name, "ghash"); + return crypto_gcm_create_common(tmpl, tb, ctr_name, "ghash"); }
static int crypto_gcm_base_create(struct crypto_template *tmpl, @@ -732,7 +726,6 @@ static int crypto_gcm_base_create(struct { const char *ctr_name; const char *ghash_name; - char full_name[CRYPTO_MAX_ALG_NAME];
ctr_name = crypto_attr_alg_name(tb[1]); if (IS_ERR(ctr_name)) @@ -742,12 +735,7 @@ static int crypto_gcm_base_create(struct if (IS_ERR(ghash_name)) return PTR_ERR(ghash_name);
- if (snprintf(full_name, CRYPTO_MAX_ALG_NAME, "gcm_base(%s,%s)", - ctr_name, ghash_name) >= CRYPTO_MAX_ALG_NAME) - return -ENAMETOOLONG; - - return crypto_gcm_create_common(tmpl, tb, full_name, - ctr_name, ghash_name); + return crypto_gcm_create_common(tmpl, tb, ctr_name, ghash_name); }
static int crypto_rfc4106_setkey(struct crypto_aead *parent, const u8 *key,
From: Zhang Zhijie zhangzj@rock-chips.com
commit f0cfd57b43fec65761ca61d3892b983a71515f23 upstream.
The Kernel Crypto API request output the next IV data to IV buffer for CBC implementation. So the last block data of ciphertext should be copid into assigned IV buffer.
Reported-by: Eric Biggers ebiggers@google.com Fixes: 433cd2c617bf ("crypto: rockchip - add crypto driver for rk3288") Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Zhang Zhijie zhangzj@rock-chips.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/rockchip/rk3288_crypto_ablkcipher.c | 25 +++++++++++++++------ 1 file changed, 18 insertions(+), 7 deletions(-)
--- a/drivers/crypto/rockchip/rk3288_crypto_ablkcipher.c +++ b/drivers/crypto/rockchip/rk3288_crypto_ablkcipher.c @@ -250,9 +250,14 @@ static int rk_set_data_start(struct rk_c u8 *src_last_blk = page_address(sg_page(dev->sg_src)) + dev->sg_src->offset + dev->sg_src->length - ivsize;
- /* store the iv that need to be updated in chain mode */ - if (ctx->mode & RK_CRYPTO_DEC) + /* Store the iv that need to be updated in chain mode. + * And update the IV buffer to contain the next IV for decryption mode. + */ + if (ctx->mode & RK_CRYPTO_DEC) { memcpy(ctx->iv, src_last_blk, ivsize); + sg_pcopy_to_buffer(dev->first, dev->src_nents, req->info, + ivsize, dev->total - ivsize); + }
err = dev->load_data(dev, dev->sg_src, dev->sg_dst); if (!err) @@ -288,13 +293,19 @@ static void rk_iv_copyback(struct rk_cry struct ablkcipher_request *req = ablkcipher_request_cast(dev->async_req); struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(req); + struct rk_cipher_ctx *ctx = crypto_ablkcipher_ctx(tfm); u32 ivsize = crypto_ablkcipher_ivsize(tfm);
- if (ivsize == DES_BLOCK_SIZE) - memcpy_fromio(req->info, dev->reg + RK_CRYPTO_TDES_IV_0, - ivsize); - else if (ivsize == AES_BLOCK_SIZE) - memcpy_fromio(req->info, dev->reg + RK_CRYPTO_AES_IV_0, ivsize); + /* Update the IV buffer to contain the next IV for encryption mode. */ + if (!(ctx->mode & RK_CRYPTO_DEC)) { + if (dev->aligned) { + memcpy(req->info, sg_virt(dev->sg_dst) + + dev->sg_dst->length - ivsize, ivsize); + } else { + memcpy(req->info, dev->addr_vir + + dev->count - ivsize, ivsize); + } + } }
static void rk_update_iv(struct rk_crypto_info *dev)
From: Horia Geantă horia.geanta@nxp.com
commit 07586d3ddf284dd7a1a6579579d8efa7296fe60f upstream.
Commit 04e6d25c5bb2 ("crypto: caam - fix zero-length buffer DMA mapping") fixed an issue in caam/jr driver where ahash implementation was DMA mapping a zero-length buffer.
Current commit applies a similar fix for caam/qi2 driver.
Cc: stable@vger.kernel.org # v4.20+ Fixes: 3f16f6c9d632 ("crypto: caam/qi2 - add support for ahash algorithms") Signed-off-by: Horia Geantă horia.geanta@nxp.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/caam/caamalg_qi2.c | 25 ++++++++++++++++++------- 1 file changed, 18 insertions(+), 7 deletions(-)
--- a/drivers/crypto/caam/caamalg_qi2.c +++ b/drivers/crypto/caam/caamalg_qi2.c @@ -3755,10 +3755,13 @@ static int ahash_final_no_ctx(struct aha if (!edesc) return ret;
- state->buf_dma = dma_map_single(ctx->dev, buf, buflen, DMA_TO_DEVICE); - if (dma_mapping_error(ctx->dev, state->buf_dma)) { - dev_err(ctx->dev, "unable to map src\n"); - goto unmap; + if (buflen) { + state->buf_dma = dma_map_single(ctx->dev, buf, buflen, + DMA_TO_DEVICE); + if (dma_mapping_error(ctx->dev, state->buf_dma)) { + dev_err(ctx->dev, "unable to map src\n"); + goto unmap; + } }
edesc->dst_dma = dma_map_single(ctx->dev, req->result, digestsize, @@ -3771,9 +3774,17 @@ static int ahash_final_no_ctx(struct aha
memset(&req_ctx->fd_flt, 0, sizeof(req_ctx->fd_flt)); dpaa2_fl_set_final(in_fle, true); - dpaa2_fl_set_format(in_fle, dpaa2_fl_single); - dpaa2_fl_set_addr(in_fle, state->buf_dma); - dpaa2_fl_set_len(in_fle, buflen); + /* + * crypto engine requires the input entry to be present when + * "frame list" FD is used. + * Since engine does not support FMT=2'b11 (unused entry type), leaving + * in_fle zeroized (except for "Final" flag) is the best option. + */ + if (buflen) { + dpaa2_fl_set_format(in_fle, dpaa2_fl_single); + dpaa2_fl_set_addr(in_fle, state->buf_dma); + dpaa2_fl_set_len(in_fle, buflen); + } dpaa2_fl_set_format(out_fle, dpaa2_fl_single); dpaa2_fl_set_addr(out_fle, edesc->dst_dma); dpaa2_fl_set_len(out_fle, digestsize);
From: Horia Geantă horia.geanta@nxp.com
commit 5965dc745287bebf7a2eba91a66f017537fa4c54 upstream.
Commits c19650d6ea99 ("crypto: caam - fix DMA mapping of stack memory") and 65055e210884 ("crypto: caam - fix hash context DMA unmap size") fixed the ahash implementation in caam/jr driver such that req->result is not DMA-mapped (since it's not guaranteed to be DMA-able).
Apply a similar fix for ahash implementation in caam/qi2 driver.
Cc: stable@vger.kernel.org # v4.20+ Fixes: 3f16f6c9d632 ("crypto: caam/qi2 - add support for ahash algorithms") Signed-off-by: Horia Geantă horia.geanta@nxp.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/caam/caamalg_qi2.c | 111 +++++++++++++++----------------------- drivers/crypto/caam/caamalg_qi2.h | 2 2 files changed, 45 insertions(+), 68 deletions(-)
--- a/drivers/crypto/caam/caamalg_qi2.c +++ b/drivers/crypto/caam/caamalg_qi2.c @@ -2854,6 +2854,7 @@ struct caam_hash_state { struct caam_request caam_req; dma_addr_t buf_dma; dma_addr_t ctx_dma; + int ctx_dma_len; u8 buf_0[CAAM_MAX_HASH_BLOCK_SIZE] ____cacheline_aligned; int buflen_0; u8 buf_1[CAAM_MAX_HASH_BLOCK_SIZE] ____cacheline_aligned; @@ -2927,6 +2928,7 @@ static inline int ctx_map_to_qm_sg(struc struct caam_hash_state *state, int ctx_len, struct dpaa2_sg_entry *qm_sg, u32 flag) { + state->ctx_dma_len = ctx_len; state->ctx_dma = dma_map_single(dev, state->caam_ctx, ctx_len, flag); if (dma_mapping_error(dev, state->ctx_dma)) { dev_err(dev, "unable to map ctx\n"); @@ -3165,14 +3167,12 @@ bad_free_key: }
static inline void ahash_unmap(struct device *dev, struct ahash_edesc *edesc, - struct ahash_request *req, int dst_len) + struct ahash_request *req) { struct caam_hash_state *state = ahash_request_ctx(req);
if (edesc->src_nents) dma_unmap_sg(dev, req->src, edesc->src_nents, DMA_TO_DEVICE); - if (edesc->dst_dma) - dma_unmap_single(dev, edesc->dst_dma, dst_len, DMA_FROM_DEVICE);
if (edesc->qm_sg_bytes) dma_unmap_single(dev, edesc->qm_sg_dma, edesc->qm_sg_bytes, @@ -3187,18 +3187,15 @@ static inline void ahash_unmap(struct de
static inline void ahash_unmap_ctx(struct device *dev, struct ahash_edesc *edesc, - struct ahash_request *req, int dst_len, - u32 flag) + struct ahash_request *req, u32 flag) { - struct crypto_ahash *ahash = crypto_ahash_reqtfm(req); - struct caam_hash_ctx *ctx = crypto_ahash_ctx(ahash); struct caam_hash_state *state = ahash_request_ctx(req);
if (state->ctx_dma) { - dma_unmap_single(dev, state->ctx_dma, ctx->ctx_len, flag); + dma_unmap_single(dev, state->ctx_dma, state->ctx_dma_len, flag); state->ctx_dma = 0; } - ahash_unmap(dev, edesc, req, dst_len); + ahash_unmap(dev, edesc, req); }
static void ahash_done(void *cbk_ctx, u32 status) @@ -3219,16 +3216,13 @@ static void ahash_done(void *cbk_ctx, u3 ecode = -EIO; }
- ahash_unmap(ctx->dev, edesc, req, digestsize); + ahash_unmap_ctx(ctx->dev, edesc, req, DMA_FROM_DEVICE); + memcpy(req->result, state->caam_ctx, digestsize); qi_cache_free(edesc);
print_hex_dump_debug("ctx@" __stringify(__LINE__)": ", DUMP_PREFIX_ADDRESS, 16, 4, state->caam_ctx, ctx->ctx_len, 1); - if (req->result) - print_hex_dump_debug("result@" __stringify(__LINE__)": ", - DUMP_PREFIX_ADDRESS, 16, 4, req->result, - digestsize, 1);
req->base.complete(&req->base, ecode); } @@ -3250,7 +3244,7 @@ static void ahash_done_bi(void *cbk_ctx, ecode = -EIO; }
- ahash_unmap_ctx(ctx->dev, edesc, req, ctx->ctx_len, DMA_BIDIRECTIONAL); + ahash_unmap_ctx(ctx->dev, edesc, req, DMA_BIDIRECTIONAL); switch_buf(state); qi_cache_free(edesc);
@@ -3283,16 +3277,13 @@ static void ahash_done_ctx_src(void *cbk ecode = -EIO; }
- ahash_unmap_ctx(ctx->dev, edesc, req, digestsize, DMA_TO_DEVICE); + ahash_unmap_ctx(ctx->dev, edesc, req, DMA_BIDIRECTIONAL); + memcpy(req->result, state->caam_ctx, digestsize); qi_cache_free(edesc);
print_hex_dump_debug("ctx@" __stringify(__LINE__)": ", DUMP_PREFIX_ADDRESS, 16, 4, state->caam_ctx, ctx->ctx_len, 1); - if (req->result) - print_hex_dump_debug("result@" __stringify(__LINE__)": ", - DUMP_PREFIX_ADDRESS, 16, 4, req->result, - digestsize, 1);
req->base.complete(&req->base, ecode); } @@ -3314,7 +3305,7 @@ static void ahash_done_ctx_dst(void *cbk ecode = -EIO; }
- ahash_unmap_ctx(ctx->dev, edesc, req, ctx->ctx_len, DMA_FROM_DEVICE); + ahash_unmap_ctx(ctx->dev, edesc, req, DMA_FROM_DEVICE); switch_buf(state); qi_cache_free(edesc);
@@ -3452,7 +3443,7 @@ static int ahash_update_ctx(struct ahash
return ret; unmap_ctx: - ahash_unmap_ctx(ctx->dev, edesc, req, ctx->ctx_len, DMA_BIDIRECTIONAL); + ahash_unmap_ctx(ctx->dev, edesc, req, DMA_BIDIRECTIONAL); qi_cache_free(edesc); return ret; } @@ -3484,7 +3475,7 @@ static int ahash_final_ctx(struct ahash_ sg_table = &edesc->sgt[0];
ret = ctx_map_to_qm_sg(ctx->dev, state, ctx->ctx_len, sg_table, - DMA_TO_DEVICE); + DMA_BIDIRECTIONAL); if (ret) goto unmap_ctx;
@@ -3503,22 +3494,13 @@ static int ahash_final_ctx(struct ahash_ } edesc->qm_sg_bytes = qm_sg_bytes;
- edesc->dst_dma = dma_map_single(ctx->dev, req->result, digestsize, - DMA_FROM_DEVICE); - if (dma_mapping_error(ctx->dev, edesc->dst_dma)) { - dev_err(ctx->dev, "unable to map dst\n"); - edesc->dst_dma = 0; - ret = -ENOMEM; - goto unmap_ctx; - } - memset(&req_ctx->fd_flt, 0, sizeof(req_ctx->fd_flt)); dpaa2_fl_set_final(in_fle, true); dpaa2_fl_set_format(in_fle, dpaa2_fl_sg); dpaa2_fl_set_addr(in_fle, edesc->qm_sg_dma); dpaa2_fl_set_len(in_fle, ctx->ctx_len + buflen); dpaa2_fl_set_format(out_fle, dpaa2_fl_single); - dpaa2_fl_set_addr(out_fle, edesc->dst_dma); + dpaa2_fl_set_addr(out_fle, state->ctx_dma); dpaa2_fl_set_len(out_fle, digestsize);
req_ctx->flc = &ctx->flc[FINALIZE]; @@ -3533,7 +3515,7 @@ static int ahash_final_ctx(struct ahash_ return ret;
unmap_ctx: - ahash_unmap_ctx(ctx->dev, edesc, req, digestsize, DMA_FROM_DEVICE); + ahash_unmap_ctx(ctx->dev, edesc, req, DMA_BIDIRECTIONAL); qi_cache_free(edesc); return ret; } @@ -3586,7 +3568,7 @@ static int ahash_finup_ctx(struct ahash_ sg_table = &edesc->sgt[0];
ret = ctx_map_to_qm_sg(ctx->dev, state, ctx->ctx_len, sg_table, - DMA_TO_DEVICE); + DMA_BIDIRECTIONAL); if (ret) goto unmap_ctx;
@@ -3605,22 +3587,13 @@ static int ahash_finup_ctx(struct ahash_ } edesc->qm_sg_bytes = qm_sg_bytes;
- edesc->dst_dma = dma_map_single(ctx->dev, req->result, digestsize, - DMA_FROM_DEVICE); - if (dma_mapping_error(ctx->dev, edesc->dst_dma)) { - dev_err(ctx->dev, "unable to map dst\n"); - edesc->dst_dma = 0; - ret = -ENOMEM; - goto unmap_ctx; - } - memset(&req_ctx->fd_flt, 0, sizeof(req_ctx->fd_flt)); dpaa2_fl_set_final(in_fle, true); dpaa2_fl_set_format(in_fle, dpaa2_fl_sg); dpaa2_fl_set_addr(in_fle, edesc->qm_sg_dma); dpaa2_fl_set_len(in_fle, ctx->ctx_len + buflen + req->nbytes); dpaa2_fl_set_format(out_fle, dpaa2_fl_single); - dpaa2_fl_set_addr(out_fle, edesc->dst_dma); + dpaa2_fl_set_addr(out_fle, state->ctx_dma); dpaa2_fl_set_len(out_fle, digestsize);
req_ctx->flc = &ctx->flc[FINALIZE]; @@ -3635,7 +3608,7 @@ static int ahash_finup_ctx(struct ahash_ return ret;
unmap_ctx: - ahash_unmap_ctx(ctx->dev, edesc, req, digestsize, DMA_FROM_DEVICE); + ahash_unmap_ctx(ctx->dev, edesc, req, DMA_BIDIRECTIONAL); qi_cache_free(edesc); return ret; } @@ -3704,18 +3677,19 @@ static int ahash_digest(struct ahash_req dpaa2_fl_set_addr(in_fle, sg_dma_address(req->src)); }
- edesc->dst_dma = dma_map_single(ctx->dev, req->result, digestsize, + state->ctx_dma_len = digestsize; + state->ctx_dma = dma_map_single(ctx->dev, state->caam_ctx, digestsize, DMA_FROM_DEVICE); - if (dma_mapping_error(ctx->dev, edesc->dst_dma)) { - dev_err(ctx->dev, "unable to map dst\n"); - edesc->dst_dma = 0; + if (dma_mapping_error(ctx->dev, state->ctx_dma)) { + dev_err(ctx->dev, "unable to map ctx\n"); + state->ctx_dma = 0; goto unmap; }
dpaa2_fl_set_final(in_fle, true); dpaa2_fl_set_len(in_fle, req->nbytes); dpaa2_fl_set_format(out_fle, dpaa2_fl_single); - dpaa2_fl_set_addr(out_fle, edesc->dst_dma); + dpaa2_fl_set_addr(out_fle, state->ctx_dma); dpaa2_fl_set_len(out_fle, digestsize);
req_ctx->flc = &ctx->flc[DIGEST]; @@ -3729,7 +3703,7 @@ static int ahash_digest(struct ahash_req return ret;
unmap: - ahash_unmap(ctx->dev, edesc, req, digestsize); + ahash_unmap_ctx(ctx->dev, edesc, req, DMA_FROM_DEVICE); qi_cache_free(edesc); return ret; } @@ -3764,11 +3738,12 @@ static int ahash_final_no_ctx(struct aha } }
- edesc->dst_dma = dma_map_single(ctx->dev, req->result, digestsize, + state->ctx_dma_len = digestsize; + state->ctx_dma = dma_map_single(ctx->dev, state->caam_ctx, digestsize, DMA_FROM_DEVICE); - if (dma_mapping_error(ctx->dev, edesc->dst_dma)) { - dev_err(ctx->dev, "unable to map dst\n"); - edesc->dst_dma = 0; + if (dma_mapping_error(ctx->dev, state->ctx_dma)) { + dev_err(ctx->dev, "unable to map ctx\n"); + state->ctx_dma = 0; goto unmap; }
@@ -3786,7 +3761,7 @@ static int ahash_final_no_ctx(struct aha dpaa2_fl_set_len(in_fle, buflen); } dpaa2_fl_set_format(out_fle, dpaa2_fl_single); - dpaa2_fl_set_addr(out_fle, edesc->dst_dma); + dpaa2_fl_set_addr(out_fle, state->ctx_dma); dpaa2_fl_set_len(out_fle, digestsize);
req_ctx->flc = &ctx->flc[DIGEST]; @@ -3801,7 +3776,7 @@ static int ahash_final_no_ctx(struct aha return ret;
unmap: - ahash_unmap(ctx->dev, edesc, req, digestsize); + ahash_unmap_ctx(ctx->dev, edesc, req, DMA_FROM_DEVICE); qi_cache_free(edesc); return ret; } @@ -3881,6 +3856,7 @@ static int ahash_update_no_ctx(struct ah } edesc->qm_sg_bytes = qm_sg_bytes;
+ state->ctx_dma_len = ctx->ctx_len; state->ctx_dma = dma_map_single(ctx->dev, state->caam_ctx, ctx->ctx_len, DMA_FROM_DEVICE); if (dma_mapping_error(ctx->dev, state->ctx_dma)) { @@ -3929,7 +3905,7 @@ static int ahash_update_no_ctx(struct ah
return ret; unmap_ctx: - ahash_unmap_ctx(ctx->dev, edesc, req, ctx->ctx_len, DMA_TO_DEVICE); + ahash_unmap_ctx(ctx->dev, edesc, req, DMA_TO_DEVICE); qi_cache_free(edesc); return ret; } @@ -3994,11 +3970,12 @@ static int ahash_finup_no_ctx(struct aha } edesc->qm_sg_bytes = qm_sg_bytes;
- edesc->dst_dma = dma_map_single(ctx->dev, req->result, digestsize, + state->ctx_dma_len = digestsize; + state->ctx_dma = dma_map_single(ctx->dev, state->caam_ctx, digestsize, DMA_FROM_DEVICE); - if (dma_mapping_error(ctx->dev, edesc->dst_dma)) { - dev_err(ctx->dev, "unable to map dst\n"); - edesc->dst_dma = 0; + if (dma_mapping_error(ctx->dev, state->ctx_dma)) { + dev_err(ctx->dev, "unable to map ctx\n"); + state->ctx_dma = 0; ret = -ENOMEM; goto unmap; } @@ -4009,7 +3986,7 @@ static int ahash_finup_no_ctx(struct aha dpaa2_fl_set_addr(in_fle, edesc->qm_sg_dma); dpaa2_fl_set_len(in_fle, buflen + req->nbytes); dpaa2_fl_set_format(out_fle, dpaa2_fl_single); - dpaa2_fl_set_addr(out_fle, edesc->dst_dma); + dpaa2_fl_set_addr(out_fle, state->ctx_dma); dpaa2_fl_set_len(out_fle, digestsize);
req_ctx->flc = &ctx->flc[DIGEST]; @@ -4024,7 +4001,7 @@ static int ahash_finup_no_ctx(struct aha
return ret; unmap: - ahash_unmap(ctx->dev, edesc, req, digestsize); + ahash_unmap_ctx(ctx->dev, edesc, req, DMA_FROM_DEVICE); qi_cache_free(edesc); return -ENOMEM; } @@ -4111,6 +4088,7 @@ static int ahash_update_first(struct aha scatterwalk_map_and_copy(next_buf, req->src, to_hash, *next_buflen, 0);
+ state->ctx_dma_len = ctx->ctx_len; state->ctx_dma = dma_map_single(ctx->dev, state->caam_ctx, ctx->ctx_len, DMA_FROM_DEVICE); if (dma_mapping_error(ctx->dev, state->ctx_dma)) { @@ -4154,7 +4132,7 @@ static int ahash_update_first(struct aha
return ret; unmap_ctx: - ahash_unmap_ctx(ctx->dev, edesc, req, ctx->ctx_len, DMA_TO_DEVICE); + ahash_unmap_ctx(ctx->dev, edesc, req, DMA_TO_DEVICE); qi_cache_free(edesc); return ret; } @@ -4173,6 +4151,7 @@ static int ahash_init(struct ahash_reque state->final = ahash_final_no_ctx;
state->ctx_dma = 0; + state->ctx_dma_len = 0; state->current_buf = 0; state->buf_dma = 0; state->buflen_0 = 0; --- a/drivers/crypto/caam/caamalg_qi2.h +++ b/drivers/crypto/caam/caamalg_qi2.h @@ -162,14 +162,12 @@ struct skcipher_edesc {
/* * ahash_edesc - s/w-extended ahash descriptor - * @dst_dma: I/O virtual address of req->result * @qm_sg_dma: I/O virtual address of h/w link table * @src_nents: number of segments in input scatterlist * @qm_sg_bytes: length of dma mapped qm_sg space * @sgt: pointer to h/w link table */ struct ahash_edesc { - dma_addr_t dst_dma; dma_addr_t qm_sg_dma; int src_nents; int qm_sg_bytes;
From: Horia Geantă horia.geanta@nxp.com
commit 418cd20e4dcdca97e6f6d59e6336228dacf2e45d upstream.
Commit 307244452d3d ("crypto: caam - generate hash keys in-place") fixed ahash implementation in caam/jr driver such that user-provided key buffer is not DMA mapped, since it's not guaranteed to be DMAable.
Apply a similar fix for caam/qi2 driver.
Cc: stable@vger.kernel.org # v4.20+ Fixes: 3f16f6c9d632 ("crypto: caam/qi2 - add support for ahash algorithms") Signed-off-by: Horia Geantă horia.geanta@nxp.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/caam/caamalg_qi2.c | 41 +++++++++++++------------------------- 1 file changed, 15 insertions(+), 26 deletions(-)
--- a/drivers/crypto/caam/caamalg_qi2.c +++ b/drivers/crypto/caam/caamalg_qi2.c @@ -3020,13 +3020,13 @@ static void split_key_sh_done(void *cbk_ }
/* Digest hash size if it is too large */ -static int hash_digest_key(struct caam_hash_ctx *ctx, const u8 *key_in, - u32 *keylen, u8 *key_out, u32 digestsize) +static int hash_digest_key(struct caam_hash_ctx *ctx, u32 *keylen, u8 *key, + u32 digestsize) { struct caam_request *req_ctx; u32 *desc; struct split_key_sh_result result; - dma_addr_t src_dma, dst_dma; + dma_addr_t key_dma; struct caam_flc *flc; dma_addr_t flc_dma; int ret = -ENOMEM; @@ -3043,17 +3043,10 @@ static int hash_digest_key(struct caam_h if (!flc) goto err_flc;
- src_dma = dma_map_single(ctx->dev, (void *)key_in, *keylen, - DMA_TO_DEVICE); - if (dma_mapping_error(ctx->dev, src_dma)) { - dev_err(ctx->dev, "unable to map key input memory\n"); - goto err_src_dma; - } - dst_dma = dma_map_single(ctx->dev, (void *)key_out, digestsize, - DMA_FROM_DEVICE); - if (dma_mapping_error(ctx->dev, dst_dma)) { - dev_err(ctx->dev, "unable to map key output memory\n"); - goto err_dst_dma; + key_dma = dma_map_single(ctx->dev, key, *keylen, DMA_BIDIRECTIONAL); + if (dma_mapping_error(ctx->dev, key_dma)) { + dev_err(ctx->dev, "unable to map key memory\n"); + goto err_key_dma; }
desc = flc->sh_desc; @@ -3078,14 +3071,14 @@ static int hash_digest_key(struct caam_h
dpaa2_fl_set_final(in_fle, true); dpaa2_fl_set_format(in_fle, dpaa2_fl_single); - dpaa2_fl_set_addr(in_fle, src_dma); + dpaa2_fl_set_addr(in_fle, key_dma); dpaa2_fl_set_len(in_fle, *keylen); dpaa2_fl_set_format(out_fle, dpaa2_fl_single); - dpaa2_fl_set_addr(out_fle, dst_dma); + dpaa2_fl_set_addr(out_fle, key_dma); dpaa2_fl_set_len(out_fle, digestsize);
print_hex_dump_debug("key_in@" __stringify(__LINE__)": ", - DUMP_PREFIX_ADDRESS, 16, 4, key_in, *keylen, 1); + DUMP_PREFIX_ADDRESS, 16, 4, key, *keylen, 1); print_hex_dump_debug("shdesc@" __stringify(__LINE__)": ", DUMP_PREFIX_ADDRESS, 16, 4, desc, desc_bytes(desc), 1); @@ -3105,17 +3098,15 @@ static int hash_digest_key(struct caam_h wait_for_completion(&result.completion); ret = result.err; print_hex_dump_debug("digested key@" __stringify(__LINE__)": ", - DUMP_PREFIX_ADDRESS, 16, 4, key_in, + DUMP_PREFIX_ADDRESS, 16, 4, key, digestsize, 1); }
dma_unmap_single(ctx->dev, flc_dma, sizeof(flc->flc) + desc_bytes(desc), DMA_TO_DEVICE); err_flc_dma: - dma_unmap_single(ctx->dev, dst_dma, digestsize, DMA_FROM_DEVICE); -err_dst_dma: - dma_unmap_single(ctx->dev, src_dma, *keylen, DMA_TO_DEVICE); -err_src_dma: + dma_unmap_single(ctx->dev, key_dma, *keylen, DMA_BIDIRECTIONAL); +err_key_dma: kfree(flc); err_flc: kfree(req_ctx); @@ -3137,12 +3128,10 @@ static int ahash_setkey(struct crypto_ah dev_dbg(ctx->dev, "keylen %d blocksize %d\n", keylen, blocksize);
if (keylen > blocksize) { - hashed_key = kmalloc_array(digestsize, sizeof(*hashed_key), - GFP_KERNEL | GFP_DMA); + hashed_key = kmemdup(key, keylen, GFP_KERNEL | GFP_DMA); if (!hashed_key) return -ENOMEM; - ret = hash_digest_key(ctx, key, &keylen, hashed_key, - digestsize); + ret = hash_digest_key(ctx, &keylen, hashed_key, digestsize); if (ret) goto bad_free_key; key = hashed_key;
From: Eric Biggers ebiggers@google.com
commit 767f015ea0b7ab9d60432ff6cd06b664fd71f50f upstream.
If the user-provided IV needs to be aligned to the algorithm's alignmask, then skcipher_walk_virt() copies the IV into a new aligned buffer walk.iv. But skcipher_walk_virt() can fail afterwards, and then if the caller unconditionally accesses walk.iv, it's a use-after-free.
arm32 xts-aes-neonbs doesn't set an alignmask, so currently it isn't affected by this despite unconditionally accessing walk.iv. However this is more subtle than desired, and it was actually broken prior to the alignmask being removed by commit cc477bf64573 ("crypto: arm/aes - replace bit-sliced OpenSSL NEON code"). Thus, update xts-aes-neonbs to start checking the return value of skcipher_walk_virt().
Fixes: e4e7f10bfc40 ("ARM: add support for bit sliced AES using NEON instructions") Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/crypto/aes-neonbs-glue.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/arch/arm/crypto/aes-neonbs-glue.c +++ b/arch/arm/crypto/aes-neonbs-glue.c @@ -278,6 +278,8 @@ static int __xts_crypt(struct skcipher_r int err;
err = skcipher_walk_virt(&walk, req, true); + if (err) + return err;
crypto_cipher_encrypt_one(ctx->tweak_tfm, walk.iv, walk.iv);
From: Eric Biggers ebiggers@google.com
commit 4a8108b70508df0b6c4ffa4a3974dab93dcbe851 upstream.
If the user-provided IV needs to be aligned to the algorithm's alignmask, then skcipher_walk_virt() copies the IV into a new aligned buffer walk.iv. But skcipher_walk_virt() can fail afterwards, and then if the caller unconditionally accesses walk.iv, it's a use-after-free.
xts-aes-neonbs doesn't set an alignmask, so currently it isn't affected by this despite unconditionally accessing walk.iv. However this is more subtle than desired, and unconditionally accessing walk.iv has caused a real problem in other algorithms. Thus, update xts-aes-neonbs to start checking the return value of skcipher_walk_virt().
Fixes: 1abee99eafab ("crypto: arm64/aes - reimplement bit-sliced ARM/NEON implementation for arm64") Cc: stable@vger.kernel.org # v4.11+ Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/crypto/aes-neonbs-glue.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/arch/arm64/crypto/aes-neonbs-glue.c +++ b/arch/arm64/crypto/aes-neonbs-glue.c @@ -304,6 +304,8 @@ static int __xts_crypt(struct skcipher_r int err;
err = skcipher_walk_virt(&walk, req, false); + if (err) + return err;
kernel_neon_begin(); neon_aes_ecb_encrypt(walk.iv, walk.iv, ctx->twkey, ctx->key.rounds, 1);
From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
commit 67476656febd7ec5f1fe1aeec3c441fcf53b1e45 upstream.
This move the dependency to DEV_DAX_PMEM_COMPAT such that only if DEV_DAX_PMEM is built as module we can allow the compat support.
This allows to test the new code easily in a emulation setup where we often build things without module support.
Cc: stable@vger.kernel.org Fixes: 730926c3b099 ("device-dax: Add /sys/class/dax backwards compatibility") Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Signed-off-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/dax/Kconfig | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/dax/Kconfig +++ b/drivers/dax/Kconfig @@ -23,7 +23,6 @@ config DEV_DAX config DEV_DAX_PMEM tristate "PMEM DAX: direct access to persistent memory" depends on LIBNVDIMM && NVDIMM_DAX && DEV_DAX - depends on m # until we can kill DEV_DAX_PMEM_COMPAT default DEV_DAX help Support raw access to persistent memory. Note that this @@ -50,7 +49,7 @@ config DEV_DAX_KMEM
config DEV_DAX_PMEM_COMPAT tristate "PMEM DAX: support the deprecated /sys/class/dax interface" - depends on DEV_DAX_PMEM + depends on m && DEV_DAX_PMEM=m default DEV_DAX_PMEM help Older versions of the libdaxctl library expect to find all
From: Christoph Muellner christoph.muellner@theobroma-systems.com
commit 28f22fb755ecf9f933f045bc0afdb8140641b01c upstream.
Add disable-cqe-dcmd as optional property for MMC hosts. This property allows to disable or not enable the direct command features of the command queue engine.
Signed-off-by: Christoph Muellner christoph.muellner@theobroma-systems.com Signed-off-by: Philipp Tomsich philipp.tomsich@theobroma-systems.com Fixes: 84362d79f436 ("mmc: sdhci-of-arasan: Add CQHCI support for arasan,sdhci-5.1") Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- Documentation/devicetree/bindings/mmc/mmc.txt | 2 ++ 1 file changed, 2 insertions(+)
--- a/Documentation/devicetree/bindings/mmc/mmc.txt +++ b/Documentation/devicetree/bindings/mmc/mmc.txt @@ -64,6 +64,8 @@ Optional properties: whether pwrseq-simple is used. Default to 10ms if no available. - supports-cqe : The presence of this property indicates that the corresponding MMC host controller supports HW command queue feature. +- disable-cqe-dcmd: This property indicates that the MMC controller's command + queue engine (CQE) does not support direct commands (DCMDs).
*NOTE* on CD and WP polarity. To use common for all SD/MMC host controllers line polarity properties, we have to fix the meaning of the "normal" and "inverted"
From: Sowjanya Komatineni skomatineni@nvidia.com
commit 92cd1667d579af5c3ef383680598a112da3695df upstream.
ddr_signaling is set to true for DDR50 and DDR52 modes but is not set back to false for other modes. This programs incorrect host clock when mode change happens from DDR52/DDR50 to other SDR or HS modes like incase of mmc_retune where it switches from HS400 to HS DDR and then from HS DDR to HS mode and then to HS200.
This patch fixes the ddr_signaling to set properly for non DDR modes.
Tested-by: Jon Hunter jonathanh@nvidia.com Acked-by: Adrian Hunter adrian.hunter@intel.com Signed-off-by: Sowjanya Komatineni skomatineni@nvidia.com Cc: stable@vger.kernel.org # v4.20 + Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mmc/host/sdhci-tegra.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/mmc/host/sdhci-tegra.c +++ b/drivers/mmc/host/sdhci-tegra.c @@ -779,6 +779,7 @@ static void tegra_sdhci_set_uhs_signalin bool set_dqs_trim = false; bool do_hs400_dll_cal = false;
+ tegra_host->ddr_signaling = false; switch (timing) { case MMC_TIMING_UHS_SDR50: case MMC_TIMING_UHS_SDR104:
From: Raul E Rangel rrangel@chromium.org
commit 43d8dabb4074cf7f3b1404bfbaeba5aa6f3e5cfc upstream.
The tag set is allocated in mmc_init_queue but never freed. This results in a memory leak. This change makes sure we free the tag set when the queue is also freed.
Signed-off-by: Raul E Rangel rrangel@chromium.org Reviewed-by: Jens Axboe axboe@kernel.dk Acked-by: Adrian Hunter adrian.hunter@intel.com Fixes: 81196976ed94 ("mmc: block: Add blk-mq support") Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mmc/core/queue.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/mmc/core/queue.c +++ b/drivers/mmc/core/queue.c @@ -472,6 +472,7 @@ void mmc_cleanup_queue(struct mmc_queue blk_mq_unquiesce_queue(q);
blk_cleanup_queue(q); + blk_mq_free_tag_set(&mq->tag_set);
/* * A request can be completed before the next request, potentially
From: Adrian Hunter adrian.hunter@intel.com
commit 0a49a619e7e1aeb3f5f5511ca59314423c83dae2 upstream.
Some time ago, a fix was done for the sdhci-acpi driver, refer commit 6e1c7d6103fe ("mmc: sdhci-acpi: Reduce Baytrail eMMC/SD/SDIO hangs"). The same issue was not expected to affect the sdhci-pci driver, but there have been reports to the contrary, so make the same hardware setting change.
This patch applies to v5.0+ but before that backports will be required.
Signed-off-by: Adrian Hunter adrian.hunter@intel.com Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mmc/host/Kconfig | 1 drivers/mmc/host/sdhci-pci-core.c | 96 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 97 insertions(+)
--- a/drivers/mmc/host/Kconfig +++ b/drivers/mmc/host/Kconfig @@ -92,6 +92,7 @@ config MMC_SDHCI_PCI tristate "SDHCI support on PCI bus" depends on MMC_SDHCI && PCI select MMC_CQHCI + select IOSF_MBI if X86 help This selects the PCI Secure Digital Host Controller Interface. Most controllers found today are PCI devices. --- a/drivers/mmc/host/sdhci-pci-core.c +++ b/drivers/mmc/host/sdhci-pci-core.c @@ -31,6 +31,10 @@ #include <linux/mmc/sdhci-pci-data.h> #include <linux/acpi.h>
+#ifdef CONFIG_X86 +#include <asm/iosf_mbi.h> +#endif + #include "cqhci.h"
#include "sdhci.h" @@ -451,6 +455,50 @@ static const struct sdhci_pci_fixes sdhc .probe_slot = pch_hc_probe_slot, };
+#ifdef CONFIG_X86 + +#define BYT_IOSF_SCCEP 0x63 +#define BYT_IOSF_OCP_NETCTRL0 0x1078 +#define BYT_IOSF_OCP_TIMEOUT_BASE GENMASK(10, 8) + +static void byt_ocp_setting(struct pci_dev *pdev) +{ + u32 val = 0; + + if (pdev->device != PCI_DEVICE_ID_INTEL_BYT_EMMC && + pdev->device != PCI_DEVICE_ID_INTEL_BYT_SDIO && + pdev->device != PCI_DEVICE_ID_INTEL_BYT_SD && + pdev->device != PCI_DEVICE_ID_INTEL_BYT_EMMC2) + return; + + if (iosf_mbi_read(BYT_IOSF_SCCEP, MBI_CR_READ, BYT_IOSF_OCP_NETCTRL0, + &val)) { + dev_err(&pdev->dev, "%s read error\n", __func__); + return; + } + + if (!(val & BYT_IOSF_OCP_TIMEOUT_BASE)) + return; + + val &= ~BYT_IOSF_OCP_TIMEOUT_BASE; + + if (iosf_mbi_write(BYT_IOSF_SCCEP, MBI_CR_WRITE, BYT_IOSF_OCP_NETCTRL0, + val)) { + dev_err(&pdev->dev, "%s write error\n", __func__); + return; + } + + dev_dbg(&pdev->dev, "%s completed\n", __func__); +} + +#else + +static inline void byt_ocp_setting(struct pci_dev *pdev) +{ +} + +#endif + enum { INTEL_DSM_FNS = 0, INTEL_DSM_V18_SWITCH = 3, @@ -715,6 +763,8 @@ static void byt_probe_slot(struct sdhci_
byt_read_dsm(slot);
+ byt_ocp_setting(slot->chip->pdev); + ops->execute_tuning = intel_execute_tuning; ops->start_signal_voltage_switch = intel_start_signal_voltage_switch;
@@ -938,7 +988,35 @@ static int byt_sd_probe_slot(struct sdhc return 0; }
+#ifdef CONFIG_PM_SLEEP + +static int byt_resume(struct sdhci_pci_chip *chip) +{ + byt_ocp_setting(chip->pdev); + + return sdhci_pci_resume_host(chip); +} + +#endif + +#ifdef CONFIG_PM + +static int byt_runtime_resume(struct sdhci_pci_chip *chip) +{ + byt_ocp_setting(chip->pdev); + + return sdhci_pci_runtime_resume_host(chip); +} + +#endif + static const struct sdhci_pci_fixes sdhci_intel_byt_emmc = { +#ifdef CONFIG_PM_SLEEP + .resume = byt_resume, +#endif +#ifdef CONFIG_PM + .runtime_resume = byt_runtime_resume, +#endif .allow_runtime_pm = true, .probe_slot = byt_emmc_probe_slot, .quirks = SDHCI_QUIRK_NO_ENDATTR_IN_NOPDESC | @@ -972,6 +1050,12 @@ static const struct sdhci_pci_fixes sdhc };
static const struct sdhci_pci_fixes sdhci_ni_byt_sdio = { +#ifdef CONFIG_PM_SLEEP + .resume = byt_resume, +#endif +#ifdef CONFIG_PM + .runtime_resume = byt_runtime_resume, +#endif .quirks = SDHCI_QUIRK_NO_ENDATTR_IN_NOPDESC | SDHCI_QUIRK_NO_LED, .quirks2 = SDHCI_QUIRK2_HOST_OFF_CARD_ON | @@ -983,6 +1067,12 @@ static const struct sdhci_pci_fixes sdhc };
static const struct sdhci_pci_fixes sdhci_intel_byt_sdio = { +#ifdef CONFIG_PM_SLEEP + .resume = byt_resume, +#endif +#ifdef CONFIG_PM + .runtime_resume = byt_runtime_resume, +#endif .quirks = SDHCI_QUIRK_NO_ENDATTR_IN_NOPDESC | SDHCI_QUIRK_NO_LED, .quirks2 = SDHCI_QUIRK2_HOST_OFF_CARD_ON | @@ -994,6 +1084,12 @@ static const struct sdhci_pci_fixes sdhc };
static const struct sdhci_pci_fixes sdhci_intel_byt_sd = { +#ifdef CONFIG_PM_SLEEP + .resume = byt_resume, +#endif +#ifdef CONFIG_PM + .runtime_resume = byt_runtime_resume, +#endif .quirks = SDHCI_QUIRK_NO_ENDATTR_IN_NOPDESC | SDHCI_QUIRK_NO_LED, .quirks2 = SDHCI_QUIRK2_CARD_ON_NEEDS_BUS_ON |
From: Takashi Iwai tiwai@suse.de
commit 7f84ff68be05ec7a5d2acf8fdc734fe5897af48f upstream.
The line6 toneport driver has code for some delayed initialization, and this hits the kernel Oops because mutex and other sleepable functions are used in the timer callback. Fix the abuse by a delayed work instead so that everything works gracefully.
Reported-by: syzbot+a07d0142e74fdd595cfb@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/usb/line6/toneport.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-)
--- a/sound/usb/line6/toneport.c +++ b/sound/usb/line6/toneport.c @@ -54,8 +54,8 @@ struct usb_line6_toneport { /* Firmware version (x 100) */ u8 firmware_version;
- /* Timer for delayed PCM startup */ - struct timer_list timer; + /* Work for delayed PCM startup */ + struct delayed_work pcm_work;
/* Device type */ enum line6_device_type type; @@ -241,9 +241,10 @@ static int snd_toneport_source_put(struc return 1; }
-static void toneport_start_pcm(struct timer_list *t) +static void toneport_start_pcm(struct work_struct *work) { - struct usb_line6_toneport *toneport = from_timer(toneport, t, timer); + struct usb_line6_toneport *toneport = + container_of(work, struct usb_line6_toneport, pcm_work.work); struct usb_line6 *line6 = &toneport->line6;
line6_pcm_acquire(line6->line6pcm, LINE6_STREAM_MONITOR, true); @@ -393,7 +394,8 @@ static int toneport_setup(struct usb_lin if (toneport_has_led(toneport)) toneport_update_led(toneport);
- mod_timer(&toneport->timer, jiffies + TONEPORT_PCM_DELAY * HZ); + schedule_delayed_work(&toneport->pcm_work, + msecs_to_jiffies(TONEPORT_PCM_DELAY * 1000)); return 0; }
@@ -405,7 +407,7 @@ static void line6_toneport_disconnect(st struct usb_line6_toneport *toneport = (struct usb_line6_toneport *)line6;
- del_timer_sync(&toneport->timer); + cancel_delayed_work_sync(&toneport->pcm_work);
if (toneport_has_led(toneport)) toneport_remove_leds(toneport); @@ -422,7 +424,7 @@ static int toneport_init(struct usb_line struct usb_line6_toneport *toneport = (struct usb_line6_toneport *) line6;
toneport->type = id->driver_info; - timer_setup(&toneport->timer, toneport_start_pcm, 0); + INIT_DELAYED_WORK(&toneport->pcm_work, toneport_start_pcm);
line6->disconnect = line6_toneport_disconnect;
From: Wenwen Wang wang6495@umn.edu
commit cb5173594d50c72b7bfa14113dfc5084b4d2f726 upstream.
In parse_audio_selector_unit(), the string array 'namelist' is allocated through kmalloc_array(), and each string pointer in this array, i.e., 'namelist[]', is allocated through kmalloc() in the following for loop. Then, a control instance 'kctl' is created by invoking snd_ctl_new1(). If an error occurs during the creation process, the string array 'namelist', including all string pointers in the array 'namelist[]', should be freed, before the error code ENOMEM is returned. However, the current code does not free 'namelist[]', resulting in memory leaks.
To fix the above issue, free all string pointers 'namelist[]' in a loop.
Signed-off-by: Wenwen Wang wang6495@umn.edu Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/usb/mixer.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/sound/usb/mixer.c +++ b/sound/usb/mixer.c @@ -2675,6 +2675,8 @@ static int parse_audio_selector_unit(str kctl = snd_ctl_new1(&mixer_selectunit_ctl, cval); if (! kctl) { usb_audio_err(state->chip, "cannot malloc kcontrol\n"); + for (i = 0; i < desc->bNrInPins; i++) + kfree(namelist[i]); kfree(namelist); kfree(cval); return -ENOMEM;
From: Hui Wang hui.wang@canonical.com
commit 8c2e6728c2bf95765b724e07d0278ae97cd1ee0d upstream.
The driver will check the monitor presence when resuming from suspend, starting poll or interrupt triggers. In these 3 situations, the jack_dirty will be set to 1 first, then the hda_jack.c reads the pin_sense from register, after reading the register, the jack_dirty will be set to 0. But hdmi_repoll_work() is enabled in these 3 situations, It will read the pin_sense a couple of times subsequently, since the jack_dirty is 0 now, It does not read the register anymore, instead it uses the shadow pin_sense which is read at the first time.
It is meaningless to check the shadow pin_sense a couple of times, we need to read the register to check the real plugging state, so we set the jack_dirty to 1 in the hdmi_repoll_work().
Signed-off-by: Hui Wang hui.wang@canonical.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/pci/hda/patch_hdmi.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/sound/pci/hda/patch_hdmi.c +++ b/sound/pci/hda/patch_hdmi.c @@ -1663,6 +1663,11 @@ static void hdmi_repoll_eld(struct work_ container_of(to_delayed_work(work), struct hdmi_spec_per_pin, work); struct hda_codec *codec = per_pin->codec; struct hdmi_spec *spec = codec->spec; + struct hda_jack_tbl *jack; + + jack = snd_hda_jack_tbl_get(codec, per_pin->pin_nid); + if (jack) + jack->jack_dirty = 1;
if (per_pin->repoll_count++ > 6) per_pin->repoll_count = 0;
From: Hui Wang hui.wang@canonical.com
commit 7f641e26a6df9269cb25dd7a4b0a91d6586ed441 upstream.
On the machines with AMD GPU or Nvidia GPU, we often meet this issue: after s3, there are 4 HDMI/DP audio devices in the gnome-sound-setting even there is no any monitors plugged.
When this problem happens, we check the /proc/asound/cardX/eld#N.M, we will find the monitor_present=1, eld_valid=0.
The root cause is BIOS or GPU driver makes the PRESENCE valid even no monitor plugged, and of course the driver will not get the valid eld_data subsequently.
In this situation, we should not report the jack_plugged event, to do so, let us change the function hdmi_present_sense_via_verbs(). In this function, it reads the pin_sense via snd_hda_pin_sense(), after calling this function, the jack_dirty is 0, and before exiting via_verbs(), we change the shadow pin_sense according to both monitor_present and eld_valid, then in the snd_hda_jack_report_sync(), since the jack_dirty is still 0, it will report jack event according to this modified shadow pin_sense.
After this change, the driver will not report Jack_is_plugged event through hdmi_present_sense_via_verbs() if monitor_present is 1 and eld_valid is 0.
Signed-off-by: Hui Wang hui.wang@canonical.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/pci/hda/patch_hdmi.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/sound/pci/hda/patch_hdmi.c +++ b/sound/pci/hda/patch_hdmi.c @@ -1551,9 +1551,11 @@ static bool hdmi_present_sense_via_verbs ret = !repoll || !eld->monitor_present || eld->eld_valid;
jack = snd_hda_jack_tbl_get(codec, pin_nid); - if (jack) + if (jack) { jack->block_report = !ret; - + jack->pin_sense = (eld->monitor_present && eld->eld_valid) ? + AC_PINSENSE_PRESENCE : 0; + } mutex_unlock(&per_pin->lock); return ret; }
From: Kailang Yang kailang@realtek.com
commit 607ca3bd220f4022e6f5356026b19dafc363863a upstream.
Let EAPD turn on after set pin output.
[ NOTE: This change is supposed to reduce the possible click noises at (runtime) PM resume. The functionality should be same (i.e. the verbs are executed correctly) no matter which order is, so this should be safe to apply for all codecs -- tiwai ]
Signed-off-by: Kailang Yang kailang@realtek.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/pci/hda/patch_realtek.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -803,11 +803,10 @@ static int alc_init(struct hda_codec *co if (spec->init_hook) spec->init_hook(codec);
+ snd_hda_gen_init(codec); alc_fix_pll(codec); alc_auto_init_amp(codec, spec->init_amp);
- snd_hda_gen_init(codec); - snd_hda_apply_fixup(codec, HDA_FIXUP_ACT_INIT);
return 0;
From: Jeremy Soller jeremy@system76.com
commit 80a5052db75131423b67f38b21958555d7d970e4 upstream.
On the System76 Gazelle (gaze14), there is a headset microphone input attached to 0x1a that does not have a jack detect. In order to get it working, the pin configuration needs to be set correctly, and the ALC269_FIXUP_HEADSET_MODE_NO_HP_MIC fixup needs to be applied. This is identical to the patch already applied for the System76 Darter Pro (darp5).
Signed-off-by: Jeremy Soller jeremy@system76.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/pci/hda/patch_realtek.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -6932,6 +6932,8 @@ static const struct snd_pci_quirk alc269 SND_PCI_QUIRK(0x1462, 0xb120, "MSI Cubi MS-B120", ALC283_FIXUP_HEADSET_MIC), SND_PCI_QUIRK(0x1462, 0xb171, "Cubi N 8GL (MS-B171)", ALC283_FIXUP_HEADSET_MIC), SND_PCI_QUIRK(0x1558, 0x1325, "System76 Darter Pro (darp5)", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), + SND_PCI_QUIRK(0x1558, 0x8550, "System76 Gazelle (gaze14)", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), + SND_PCI_QUIRK(0x1558, 0x8560, "System76 Gazelle (gaze14)", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x17aa, 0x1036, "Lenovo P520", ALC233_FIXUP_LENOVO_MULTI_CODECS), SND_PCI_QUIRK(0x17aa, 0x20f2, "Thinkpad SL410/510", ALC269_FIXUP_SKU_IGNORE), SND_PCI_QUIRK(0x17aa, 0x215e, "Thinkpad L512", ALC269_FIXUP_SKU_IGNORE),
From: Jon Hunter jonathanh@nvidia.com
commit ecb2795c08bc825ebd604997e5be440b060c5b18 upstream.
The max98090 driver defines 3 DAPM muxes; one for the right line output (LINMOD Mux), one for the left headphone mixer source (MIXHPLSEL Mux) and one for the right headphone mixer source (MIXHPRSEL Mux). The same bit is used for the mux as well as the DAPM enable, and although the mux can be correctly configured, after playback has completed, the mux will be reset during the disable phase. This is preventing the state of these muxes from being saved and restored correctly on system reboot. Fix this by marking these muxes as SND_SOC_NOPM.
Note this has been verified this on the Tegra124 Nyan Big which features the MAX98090 codec.
Signed-off-by: Jon Hunter jonathanh@nvidia.com Signed-off-by: Mark Brown broonie@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/soc/codecs/max98090.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
--- a/sound/soc/codecs/max98090.c +++ b/sound/soc/codecs/max98090.c @@ -1194,14 +1194,14 @@ static const struct snd_soc_dapm_widget &max98090_right_rcv_mixer_controls[0], ARRAY_SIZE(max98090_right_rcv_mixer_controls)),
- SND_SOC_DAPM_MUX("LINMOD Mux", M98090_REG_LOUTR_MIXER, - M98090_LINMOD_SHIFT, 0, &max98090_linmod_mux), + SND_SOC_DAPM_MUX("LINMOD Mux", SND_SOC_NOPM, 0, 0, + &max98090_linmod_mux),
- SND_SOC_DAPM_MUX("MIXHPLSEL Mux", M98090_REG_HP_CONTROL, - M98090_MIXHPLSEL_SHIFT, 0, &max98090_mixhplsel_mux), + SND_SOC_DAPM_MUX("MIXHPLSEL Mux", SND_SOC_NOPM, 0, 0, + &max98090_mixhplsel_mux),
- SND_SOC_DAPM_MUX("MIXHPRSEL Mux", M98090_REG_HP_CONTROL, - M98090_MIXHPRSEL_SHIFT, 0, &max98090_mixhprsel_mux), + SND_SOC_DAPM_MUX("MIXHPRSEL Mux", SND_SOC_NOPM, 0, 0, + &max98090_mixhprsel_mux),
SND_SOC_DAPM_PGA("HP Left Out", M98090_REG_OUTPUT_ENABLE, M98090_HPLEN_SHIFT, 0, NULL, 0),
From: Curtis Malainey cujomalainey@chromium.org
commit a46eb523220e242affb9a6bc9bb8efc05f4f7459 upstream.
The current algorithm allows 3 types of transfers, 16bit, 32bit and burst. According to Realtek, 16bit transfers have a special restriction in that it is restricted to the memory region of 0x18020000 ~ 0x18021000. This region is the memory location of the I2C registers. The current algorithm does not uphold this restriction and therefore fails to complete writes.
Since this has been broken for some time it likely no one is using it. Better to simply disable the 16 bit writes. This will allow users to properly load firmware over SPI without data corruption.
Signed-off-by: Curtis Malainey cujomalainey@chromium.org Reviewed-by: Ben Zhang benzh@chromium.org Signed-off-by: Mark Brown broonie@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/soc/codecs/rt5677-spi.c | 35 ++++++++++++++++------------------- 1 file changed, 16 insertions(+), 19 deletions(-)
--- a/sound/soc/codecs/rt5677-spi.c +++ b/sound/soc/codecs/rt5677-spi.c @@ -57,13 +57,15 @@ static DEFINE_MUTEX(spi_mutex); * RT5677_SPI_READ/WRITE_32: Transfer 4 bytes * RT5677_SPI_READ/WRITE_BURST: Transfer any multiples of 8 bytes * - * For example, reading 260 bytes at 0x60030002 uses the following commands: - * 0x60030002 RT5677_SPI_READ_16 2 bytes + * Note: + * 16 Bit writes and reads are restricted to the address range + * 0x18020000 ~ 0x18021000 + * + * For example, reading 256 bytes at 0x60030004 uses the following commands: * 0x60030004 RT5677_SPI_READ_32 4 bytes * 0x60030008 RT5677_SPI_READ_BURST 240 bytes * 0x600300F8 RT5677_SPI_READ_BURST 8 bytes * 0x60030100 RT5677_SPI_READ_32 4 bytes - * 0x60030104 RT5677_SPI_READ_16 2 bytes * * Input: * @read: true for read commands; false for write commands @@ -78,15 +80,13 @@ static u8 rt5677_spi_select_cmd(bool rea { u8 cmd;
- if (align == 2 || align == 6 || remain == 2) { - cmd = RT5677_SPI_READ_16; - *len = 2; - } else if (align == 4 || remain <= 6) { + if (align == 4 || remain <= 4) { cmd = RT5677_SPI_READ_32; *len = 4; } else { cmd = RT5677_SPI_READ_BURST; - *len = min_t(u32, remain & ~7, RT5677_SPI_BURST_LEN); + *len = (((remain - 1) >> 3) + 1) << 3; + *len = min_t(u32, *len, RT5677_SPI_BURST_LEN); } return read ? cmd : cmd + 1; } @@ -107,7 +107,7 @@ static void rt5677_spi_reverse(u8 *dst, } }
-/* Read DSP address space using SPI. addr and len have to be 2-byte aligned. */ +/* Read DSP address space using SPI. addr and len have to be 4-byte aligned. */ int rt5677_spi_read(u32 addr, void *rxbuf, size_t len) { u32 offset; @@ -123,7 +123,7 @@ int rt5677_spi_read(u32 addr, void *rxbu if (!g_spi) return -ENODEV;
- if ((addr & 1) || (len & 1)) { + if ((addr & 3) || (len & 3)) { dev_err(&g_spi->dev, "Bad read align 0x%x(%zu)\n", addr, len); return -EACCES; } @@ -158,13 +158,13 @@ int rt5677_spi_read(u32 addr, void *rxbu } EXPORT_SYMBOL_GPL(rt5677_spi_read);
-/* Write DSP address space using SPI. addr has to be 2-byte aligned. - * If len is not 2-byte aligned, an extra byte of zero is written at the end +/* Write DSP address space using SPI. addr has to be 4-byte aligned. + * If len is not 4-byte aligned, then extra zeros are written at the end * as padding. */ int rt5677_spi_write(u32 addr, const void *txbuf, size_t len) { - u32 offset, len_with_pad = len; + u32 offset; int status = 0; struct spi_transfer t; struct spi_message m; @@ -177,22 +177,19 @@ int rt5677_spi_write(u32 addr, const voi if (!g_spi) return -ENODEV;
- if (addr & 1) { + if (addr & 3) { dev_err(&g_spi->dev, "Bad write align 0x%x(%zu)\n", addr, len); return -EACCES; }
- if (len & 1) - len_with_pad = len + 1; - memset(&t, 0, sizeof(t)); t.tx_buf = buf; t.speed_hz = RT5677_SPI_FREQ; spi_message_init_with_transfers(&m, &t, 1);
- for (offset = 0; offset < len_with_pad;) { + for (offset = 0; offset < len;) { spi_cmd = rt5677_spi_select_cmd(false, (addr + offset) & 7, - len_with_pad - offset, &t.len); + len - offset, &t.len);
/* Construct SPI message header */ buf[0] = spi_cmd;
From: S.j. Wang shengjiu.wang@nxp.com
commit 903c220b1ece12f17c868e43f2243b8f81ff2d4c upstream.
case ESAI_HCKT_EXTAL and case ESAI_HCKR_EXTAL should be independent of each other, so replace fall-through with break.
Fixes: 43d24e76b698 ("ASoC: fsl_esai: Add ESAI CPU DAI driver") Signed-off-by: Shengjiu Wang shengjiu.wang@nxp.com Acked-by: Nicolin Chen nicoleotsuka@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/soc/fsl/fsl_esai.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/soc/fsl/fsl_esai.c +++ b/sound/soc/fsl/fsl_esai.c @@ -251,7 +251,7 @@ static int fsl_esai_set_dai_sysclk(struc break; case ESAI_HCKT_EXTAL: ecr |= ESAI_ECR_ETI; - /* fall through */ + break; case ESAI_HCKR_EXTAL: ecr |= ESAI_ECR_ERI; break;
From: Libin Yang libin.yang@intel.com
commit 01c8327667c249818d3712c3e25c7ad2aca7f389 upstream.
In resume from S3, HDAC HDMI codec driver dapm event callback may be operated before HDMI codec driver turns on the display audio power domain because of the contest between display driver and hdmi codec driver.
This patch adds the device_link between soc card device (consumer) and hdmi codec device (supplier) to make sure the sequence is always correct.
Signed-off-by: Libin Yang libin.yang@intel.com Reviewed-by: Takashi Iwai tiwai@suse.de Acked-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown broonie@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/soc/codecs/hdac_hdmi.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
--- a/sound/soc/codecs/hdac_hdmi.c +++ b/sound/soc/codecs/hdac_hdmi.c @@ -1855,6 +1855,17 @@ static int hdmi_codec_probe(struct snd_s hdmi->card = dapm->card->snd_card;
/* + * Setup a device_link between card device and HDMI codec device. + * The card device is the consumer and the HDMI codec device is + * the supplier. With this setting, we can make sure that the audio + * domain in display power will be always turned on before operating + * on the HDMI audio codec registers. + * Let's use the flag DL_FLAG_AUTOREMOVE_CONSUMER. This can make + * sure the device link is freed when the machine driver is removed. + */ + device_link_add(component->card->dev, &hdev->dev, DL_FLAG_RPM_ACTIVE | + DL_FLAG_AUTOREMOVE_CONSUMER); + /* * hdac_device core already sets the state to active and calls * get_noresume. So enable runtime and set the device to suspend. */
From: Daniel Borkmann daniel@iogearbox.net
commit 8968c67a82ab7501bc3b9439c3624a49b42fe54c upstream.
Prefetch-with-intent-to-write is currently part of the XADD mapping in the AArch64 JIT and follows the kernel's implementation of atomic_add. This may interfere with other threads executing the LDXR/STXR loop, leading to potential starvation and fairness issues. Drop the optional prefetch instruction.
Fixes: 85f68fe89832 ("bpf, arm64: implement jiting of BPF_XADD") Reported-by: Will Deacon will.deacon@arm.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Jean-Philippe Brucker jean-philippe.brucker@arm.com Acked-by: Will Deacon will.deacon@arm.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/net/bpf_jit.h | 6 ------ arch/arm64/net/bpf_jit_comp.c | 1 - 2 files changed, 7 deletions(-)
--- a/arch/arm64/net/bpf_jit.h +++ b/arch/arm64/net/bpf_jit.h @@ -100,12 +100,6 @@ #define A64_STXR(sf, Rt, Rn, Rs) \ A64_LSX(sf, Rt, Rn, Rs, STORE_EX)
-/* Prefetch */ -#define A64_PRFM(Rn, type, target, policy) \ - aarch64_insn_gen_prefetch(Rn, AARCH64_INSN_PRFM_TYPE_##type, \ - AARCH64_INSN_PRFM_TARGET_##target, \ - AARCH64_INSN_PRFM_POLICY_##policy) - /* Add/subtract (immediate) */ #define A64_ADDSUB_IMM(sf, Rd, Rn, imm12, type) \ aarch64_insn_gen_add_sub_imm(Rd, Rn, imm12, \ --- a/arch/arm64/net/bpf_jit_comp.c +++ b/arch/arm64/net/bpf_jit_comp.c @@ -762,7 +762,6 @@ emit_cond_jmp: case BPF_STX | BPF_XADD | BPF_DW: emit_a64_mov_i(1, tmp, off, ctx); emit(A64_ADD(1, tmp, tmp, dst), ctx); - emit(A64_PRFM(tmp, PST, L1, STRM), ctx); emit(A64_LDXR(isdw, tmp2, tmp), ctx); emit(A64_ADD(isdw, tmp2, tmp2, src), ctx); emit(A64_STXR(isdw, tmp2, tmp, tmp3), ctx);
From: Daniel Borkmann daniel@iogearbox.net
commit af959b18fd447170a10865283ba691af4353cc7f upstream.
systemtap folks reported the following splat recently:
[ 7790.862212] WARNING: CPU: 3 PID: 26759 at arch/x86/kernel/kprobes/core.c:1022 kprobe_fault_handler+0xec/0xf0 [...] [ 7790.864113] CPU: 3 PID: 26759 Comm: sshd Not tainted 5.1.0-0.rc7.git1.1.fc31.x86_64 #1 [ 7790.864198] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS[...] [ 7790.864314] RIP: 0010:kprobe_fault_handler+0xec/0xf0 [ 7790.864375] Code: 48 8b 50 [...] [ 7790.864714] RSP: 0018:ffffc06800bdbb48 EFLAGS: 00010082 [ 7790.864812] RAX: ffff9e2b75a16320 RBX: 0000000000000000 RCX: 0000000000000000 [ 7790.865306] RDX: ffffffffffffffff RSI: 000000000000000e RDI: ffffc06800bdbbf8 [ 7790.865514] RBP: ffffc06800bdbbf8 R08: 0000000000000000 R09: 0000000000000000 [ 7790.865960] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc06800bdbbf8 [ 7790.866037] R13: ffff9e2ab56a0418 R14: ffff9e2b6d0bb400 R15: ffff9e2b6d268000 [ 7790.866114] FS: 00007fde49937d80(0000) GS:ffff9e2b75a00000(0000) knlGS:0000000000000000 [ 7790.866193] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7790.866318] CR2: 0000000000000000 CR3: 000000012f312000 CR4: 00000000000006e0 [ 7790.866419] Call Trace: [ 7790.866677] do_user_addr_fault+0x64/0x480 [ 7790.867513] do_page_fault+0x33/0x210 [ 7790.868002] async_page_fault+0x1e/0x30 [ 7790.868071] RIP: 0010: (null) [ 7790.868144] Code: Bad RIP value. [ 7790.868229] RSP: 0018:ffffc06800bdbca8 EFLAGS: 00010282 [ 7790.868362] RAX: ffff9e2b598b60f8 RBX: ffffc06800bdbe48 RCX: 0000000000000004 [ 7790.868629] RDX: 0000000000000004 RSI: ffffc06800bdbc6c RDI: ffff9e2b598b60f0 [ 7790.868834] RBP: ffffc06800bdbcf8 R08: 0000000000000000 R09: 0000000000000004 [ 7790.870432] R10: 00000000ff6f7a03 R11: 0000000000000000 R12: 0000000000000001 [ 7790.871859] R13: ffffc06800bdbcb8 R14: 0000000000000000 R15: ffff9e2acd0a5310 [ 7790.873455] ? vfs_read+0x5/0x170 [ 7790.874639] ? vfs_read+0x1/0x170 [ 7790.875834] ? trace_call_bpf+0xf6/0x260 [ 7790.877044] ? vfs_read+0x1/0x170 [ 7790.878208] ? vfs_read+0x5/0x170 [ 7790.879345] ? kprobe_perf_func+0x233/0x260 [ 7790.880503] ? vfs_read+0x1/0x170 [ 7790.881632] ? vfs_read+0x5/0x170 [ 7790.882751] ? kprobe_ftrace_handler+0x92/0xf0 [ 7790.883926] ? __vfs_read+0x30/0x30 [ 7790.885050] ? ftrace_ops_assist_func+0x94/0x100 [ 7790.886183] ? vfs_read+0x1/0x170 [ 7790.887283] ? vfs_read+0x5/0x170 [ 7790.888348] ? ksys_read+0x5a/0xe0 [ 7790.889389] ? do_syscall_64+0x5c/0xa0 [ 7790.890401] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
After some debugging, turns out that the logic in 2cbd95a5c4fb ("bpf: change parameters of call/branch offset adjustment") has a bug that is exposed after 52875a04f4b2 ("bpf: verifier: remove dead code") in that we miss some of the jump offset adjustments after code patching when we remove dead code, more concretely, upon backward jump spanning over the area that is being removed.
BPF insns of a case that was hit pre 52875a04f4b2:
[...] 676: (85) call bpf_perf_event_output#-47616 677: (05) goto pc-636 678: (62) *(u32 *)(r10 -64) = 0 679: (bf) r7 = r10 680: (07) r7 += -64 681: (05) goto pc-44 682: (05) goto pc-1 683: (05) goto pc-1
BPF insns afterwards:
[...] 618: (85) call bpf_perf_event_output#-47616 619: (05) goto pc-638 620: (62) *(u32 *)(r10 -64) = 0 621: (bf) r7 = r10 622: (07) r7 += -64 623: (05) goto pc-44
To illustrate the bug, situation looks as follows: ____ 0 | | <-- foo: [...] 1 |____| 2 |____| <-- pos / end_new ^ 3 | | | 4 | | | len 5 |____| | (remove region) 6 | | <-- end_old v 7 | | 8 | | <-- curr (jmp foo) 9 |____|
The condition curr >= end_new && curr + off + 1 < end_new in the branch delta adjustments is never hit because curr + off + 1 < end_new is compared as unsigned and therefore curr + off + 1 > end_new in unsigned realm as curr + off + 1 becomes negative since the insns are memmove()'d before the offset adjustments.
Correct BPF insns after this fix:
[...] 618: (85) call bpf_perf_event_output#-47216 619: (05) goto pc-578 620: (62) *(u32 *)(r10 -64) = 0 621: (bf) r7 = r10 622: (07) r7 += -64 623: (05) goto pc-44
Note that unprivileged case is not affected from this.
Fixes: 52875a04f4b2 ("bpf: verifier: remove dead code") Fixes: 2cbd95a5c4fb ("bpf: change parameters of call/branch offset adjustment") Reported-by: Frank Ch. Eigler fche@redhat.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: Jakub Kicinski jakub.kicinski@netronome.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/bpf/core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -337,7 +337,7 @@ int bpf_prog_calc_tag(struct bpf_prog *f }
static int bpf_adj_delta_to_imm(struct bpf_insn *insn, u32 pos, s32 end_old, - s32 end_new, u32 curr, const bool probe_pass) + s32 end_new, s32 curr, const bool probe_pass) { const s64 imm_min = S32_MIN, imm_max = S32_MAX; s32 delta = end_new - end_old; @@ -355,7 +355,7 @@ static int bpf_adj_delta_to_imm(struct b }
static int bpf_adj_delta_to_off(struct bpf_insn *insn, u32 pos, s32 end_old, - s32 end_new, u32 curr, const bool probe_pass) + s32 end_new, s32 curr, const bool probe_pass) { const s32 off_min = S16_MIN, off_max = S16_MAX; s32 delta = end_new - end_old;
From: Gilad Ben-Yossef gilad@benyossef.com
commit c4b22bf51b815fb61a35a27fc847a88bc28ebb63 upstream.
We were handling chained scattergather lists with specialized code needlessly as the regular sg APIs handle them just fine. The code handling this also had an (unused) code path with a use-before-init error, flagged by Coverity.
Remove all special handling of chained sg and leave their handling to the regular sg APIs.
Signed-off-by: Gilad Ben-Yossef gilad@benyossef.com Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/ccree/cc_buffer_mgr.c | 98 +++++++---------------------------- 1 file changed, 22 insertions(+), 76 deletions(-)
--- a/drivers/crypto/ccree/cc_buffer_mgr.c +++ b/drivers/crypto/ccree/cc_buffer_mgr.c @@ -83,24 +83,17 @@ static void cc_copy_mac(struct device *d */ static unsigned int cc_get_sgl_nents(struct device *dev, struct scatterlist *sg_list, - unsigned int nbytes, u32 *lbytes, - bool *is_chained) + unsigned int nbytes, u32 *lbytes) { unsigned int nents = 0;
while (nbytes && sg_list) { - if (sg_list->length) { - nents++; - /* get the number of bytes in the last entry */ - *lbytes = nbytes; - nbytes -= (sg_list->length > nbytes) ? - nbytes : sg_list->length; - sg_list = sg_next(sg_list); - } else { - sg_list = (struct scatterlist *)sg_page(sg_list); - if (is_chained) - *is_chained = true; - } + nents++; + /* get the number of bytes in the last entry */ + *lbytes = nbytes; + nbytes -= (sg_list->length > nbytes) ? + nbytes : sg_list->length; + sg_list = sg_next(sg_list); } dev_dbg(dev, "nents %d last bytes %d\n", nents, *lbytes); return nents; @@ -142,7 +135,7 @@ void cc_copy_sg_portion(struct device *d { u32 nents, lbytes;
- nents = cc_get_sgl_nents(dev, sg, end, &lbytes, NULL); + nents = cc_get_sgl_nents(dev, sg, end, &lbytes); sg_copy_buffer(sg, nents, (void *)dest, (end - to_skip + 1), to_skip, (direct == CC_SG_TO_BUF)); } @@ -314,40 +307,10 @@ static void cc_add_sg_entry(struct devic sgl_data->num_of_buffers++; }
-static int cc_dma_map_sg(struct device *dev, struct scatterlist *sg, u32 nents, - enum dma_data_direction direction) -{ - u32 i, j; - struct scatterlist *l_sg = sg; - - for (i = 0; i < nents; i++) { - if (!l_sg) - break; - if (dma_map_sg(dev, l_sg, 1, direction) != 1) { - dev_err(dev, "dma_map_page() sg buffer failed\n"); - goto err; - } - l_sg = sg_next(l_sg); - } - return nents; - -err: - /* Restore mapped parts */ - for (j = 0; j < i; j++) { - if (!sg) - break; - dma_unmap_sg(dev, sg, 1, direction); - sg = sg_next(sg); - } - return 0; -} - static int cc_map_sg(struct device *dev, struct scatterlist *sg, unsigned int nbytes, int direction, u32 *nents, u32 max_sg_nents, u32 *lbytes, u32 *mapped_nents) { - bool is_chained = false; - if (sg_is_last(sg)) { /* One entry only case -set to DLLI */ if (dma_map_sg(dev, sg, 1, direction) != 1) { @@ -361,35 +324,21 @@ static int cc_map_sg(struct device *dev, *nents = 1; *mapped_nents = 1; } else { /*sg_is_last*/ - *nents = cc_get_sgl_nents(dev, sg, nbytes, lbytes, - &is_chained); + *nents = cc_get_sgl_nents(dev, sg, nbytes, lbytes); if (*nents > max_sg_nents) { *nents = 0; dev_err(dev, "Too many fragments. current %d max %d\n", *nents, max_sg_nents); return -ENOMEM; } - if (!is_chained) { - /* In case of mmu the number of mapped nents might - * be changed from the original sgl nents - */ - *mapped_nents = dma_map_sg(dev, sg, *nents, direction); - if (*mapped_nents == 0) { - *nents = 0; - dev_err(dev, "dma_map_sg() sg buffer failed\n"); - return -ENOMEM; - } - } else { - /*In this case the driver maps entry by entry so it - * must have the same nents before and after map - */ - *mapped_nents = cc_dma_map_sg(dev, sg, *nents, - direction); - if (*mapped_nents != *nents) { - *nents = *mapped_nents; - dev_err(dev, "dma_map_sg() sg buffer failed\n"); - return -ENOMEM; - } + /* In case of mmu the number of mapped nents might + * be changed from the original sgl nents + */ + *mapped_nents = dma_map_sg(dev, sg, *nents, direction); + if (*mapped_nents == 0) { + *nents = 0; + dev_err(dev, "dma_map_sg() sg buffer failed\n"); + return -ENOMEM; } }
@@ -571,7 +520,6 @@ void cc_unmap_aead_request(struct device struct crypto_aead *tfm = crypto_aead_reqtfm(req); struct cc_drvdata *drvdata = dev_get_drvdata(dev); u32 dummy; - bool chained; u32 size_to_unmap = 0;
if (areq_ctx->mac_buf_dma_addr) { @@ -636,15 +584,14 @@ void cc_unmap_aead_request(struct device size_to_unmap += crypto_aead_ivsize(tfm);
dma_unmap_sg(dev, req->src, - cc_get_sgl_nents(dev, req->src, size_to_unmap, - &dummy, &chained), + cc_get_sgl_nents(dev, req->src, size_to_unmap, &dummy), DMA_BIDIRECTIONAL); if (req->src != req->dst) { dev_dbg(dev, "Unmapping dst sgl: req->dst=%pK\n", sg_virt(req->dst)); dma_unmap_sg(dev, req->dst, cc_get_sgl_nents(dev, req->dst, size_to_unmap, - &dummy, &chained), + &dummy), DMA_BIDIRECTIONAL); } if (drvdata->coherent && @@ -1022,7 +969,6 @@ static int cc_aead_chain_data(struct cc_ unsigned int size_for_map = req->assoclen + req->cryptlen; struct crypto_aead *tfm = crypto_aead_reqtfm(req); u32 sg_index = 0; - bool chained = false; bool is_gcm4543 = areq_ctx->is_gcm4543; u32 size_to_skip = req->assoclen;
@@ -1043,7 +989,7 @@ static int cc_aead_chain_data(struct cc_ size_for_map += (direct == DRV_CRYPTO_DIRECTION_ENCRYPT) ? authsize : 0; src_mapped_nents = cc_get_sgl_nents(dev, req->src, size_for_map, - &src_last_bytes, &chained); + &src_last_bytes); sg_index = areq_ctx->src_sgl->length; //check where the data starts while (sg_index <= size_to_skip) { @@ -1083,7 +1029,7 @@ static int cc_aead_chain_data(struct cc_ }
dst_mapped_nents = cc_get_sgl_nents(dev, req->dst, size_for_map, - &dst_last_bytes, &chained); + &dst_last_bytes); sg_index = areq_ctx->dst_sgl->length; offset = size_to_skip;
@@ -1484,7 +1430,7 @@ int cc_map_hash_request_update(struct cc dev_dbg(dev, " less than one block: curr_buff=%pK *curr_buff_cnt=0x%X copy_to=%pK\n", curr_buff, *curr_buff_cnt, &curr_buff[*curr_buff_cnt]); areq_ctx->in_nents = - cc_get_sgl_nents(dev, src, nbytes, &dummy, NULL); + cc_get_sgl_nents(dev, src, nbytes, &dummy); sg_copy_to_buffer(src, areq_ctx->in_nents, &curr_buff[*curr_buff_cnt], nbytes); *curr_buff_cnt += nbytes;
From: Gilad Ben-Yossef gilad@benyossef.com
commit d574b707c873d6ef1a2a155f8cfcfecd821e9a2e upstream.
Fix a memory leak on the error path of IV generation code.
Signed-off-by: Gilad Ben-Yossef gilad@benyossef.com Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/ccree/cc_ivgen.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-)
--- a/drivers/crypto/ccree/cc_ivgen.c +++ b/drivers/crypto/ccree/cc_ivgen.c @@ -154,9 +154,6 @@ void cc_ivgen_fini(struct cc_drvdata *dr }
ivgen_ctx->pool = NULL_SRAM_ADDR; - - /* release "this" context */ - kfree(ivgen_ctx); }
/*! @@ -174,10 +171,12 @@ int cc_ivgen_init(struct cc_drvdata *drv int rc;
/* Allocate "this" context */ - ivgen_ctx = kzalloc(sizeof(*ivgen_ctx), GFP_KERNEL); + ivgen_ctx = devm_kzalloc(device, sizeof(*ivgen_ctx), GFP_KERNEL); if (!ivgen_ctx) return -ENOMEM;
+ drvdata->ivgen_handle = ivgen_ctx; + /* Allocate pool's header for initial enc. key/IV */ ivgen_ctx->pool_meta = dma_alloc_coherent(device, CC_IVPOOL_META_SIZE, &ivgen_ctx->pool_meta_dma, @@ -196,8 +195,6 @@ int cc_ivgen_init(struct cc_drvdata *drv goto out; }
- drvdata->ivgen_handle = ivgen_ctx; - return cc_init_iv_sram(drvdata);
out:
From: Gilad Ben-Yossef gilad@benyossef.com
commit 874e163759f27e0a9988c5d1f4605e3f25564fd2 upstream.
The MAC hash key might be passed to us on stack. Copy it to a slab buffer before mapping to gurantee proper DMA mapping.
Signed-off-by: Gilad Ben-Yossef gilad@benyossef.com Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/ccree/cc_hash.c | 24 +++++++++++++++++++++--- 1 file changed, 21 insertions(+), 3 deletions(-)
--- a/drivers/crypto/ccree/cc_hash.c +++ b/drivers/crypto/ccree/cc_hash.c @@ -69,6 +69,7 @@ struct cc_hash_alg { struct hash_key_req_ctx { u32 keylen; dma_addr_t key_dma_addr; + u8 *key; };
/* hash per-session context */ @@ -730,13 +731,20 @@ static int cc_hash_setkey(struct crypto_ ctx->key_params.keylen = keylen; ctx->key_params.key_dma_addr = 0; ctx->is_hmac = true; + ctx->key_params.key = NULL;
if (keylen) { + ctx->key_params.key = kmemdup(key, keylen, GFP_KERNEL); + if (!ctx->key_params.key) + return -ENOMEM; + ctx->key_params.key_dma_addr = - dma_map_single(dev, (void *)key, keylen, DMA_TO_DEVICE); + dma_map_single(dev, (void *)ctx->key_params.key, keylen, + DMA_TO_DEVICE); if (dma_mapping_error(dev, ctx->key_params.key_dma_addr)) { dev_err(dev, "Mapping key va=0x%p len=%u for DMA failed\n", - key, keylen); + ctx->key_params.key, keylen); + kzfree(ctx->key_params.key); return -ENOMEM; } dev_dbg(dev, "mapping key-buffer: key_dma_addr=%pad keylen=%u\n", @@ -887,6 +895,9 @@ out: dev_dbg(dev, "Unmapped key-buffer: key_dma_addr=%pad keylen=%u\n", &ctx->key_params.key_dma_addr, ctx->key_params.keylen); } + + kzfree(ctx->key_params.key); + return rc; }
@@ -913,11 +924,16 @@ static int cc_xcbc_setkey(struct crypto_
ctx->key_params.keylen = keylen;
+ ctx->key_params.key = kmemdup(key, keylen, GFP_KERNEL); + if (!ctx->key_params.key) + return -ENOMEM; + ctx->key_params.key_dma_addr = - dma_map_single(dev, (void *)key, keylen, DMA_TO_DEVICE); + dma_map_single(dev, ctx->key_params.key, keylen, DMA_TO_DEVICE); if (dma_mapping_error(dev, ctx->key_params.key_dma_addr)) { dev_err(dev, "Mapping key va=0x%p len=%u for DMA failed\n", key, keylen); + kzfree(ctx->key_params.key); return -ENOMEM; } dev_dbg(dev, "mapping key-buffer: key_dma_addr=%pad keylen=%u\n", @@ -969,6 +985,8 @@ static int cc_xcbc_setkey(struct crypto_ dev_dbg(dev, "Unmapped key-buffer: key_dma_addr=%pad keylen=%u\n", &ctx->key_params.key_dma_addr, ctx->key_params.keylen);
+ kzfree(ctx->key_params.key); + return rc; }
From: Gilad Ben-Yossef gilad@benyossef.com
commit f3df82b468f00cca241d96ee3697c9a5e7fb6bd0 upstream.
We were computing the size of the import buffer based on the digest size but the 318 and 224 byte variants use 512 and 256 bytes internal state sizes respectfully, thus causing the import buffer to overrun.
Fix it by using the right sizes.
Signed-off-by: Gilad Ben-Yossef gilad@benyossef.com Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/ccree/cc_hash.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/crypto/ccree/cc_hash.c +++ b/drivers/crypto/ccree/cc_hash.c @@ -1639,7 +1639,7 @@ static struct cc_hash_template driver_ha .setkey = cc_hash_setkey, .halg = { .digestsize = SHA224_DIGEST_SIZE, - .statesize = CC_STATE_SIZE(SHA224_DIGEST_SIZE), + .statesize = CC_STATE_SIZE(SHA256_DIGEST_SIZE), }, }, .hash_mode = DRV_HASH_SHA224, @@ -1666,7 +1666,7 @@ static struct cc_hash_template driver_ha .setkey = cc_hash_setkey, .halg = { .digestsize = SHA384_DIGEST_SIZE, - .statesize = CC_STATE_SIZE(SHA384_DIGEST_SIZE), + .statesize = CC_STATE_SIZE(SHA512_DIGEST_SIZE), }, }, .hash_mode = DRV_HASH_SHA384,
From: Gilad Ben-Yossef gilad@benyossef.com
commit e8662a6a5f8f7f2cadc0edb934aef622d96ac3ee upstream.
The AEAD authenc key and IVs might be passed to us on stack. Copy it to a slab buffer before mapping to gurantee proper DMA mapping.
Signed-off-by: Gilad Ben-Yossef gilad@benyossef.com Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/ccree/cc_aead.c | 11 ++++++++++- drivers/crypto/ccree/cc_buffer_mgr.c | 15 ++++++++++++--- drivers/crypto/ccree/cc_driver.h | 1 + 3 files changed, 23 insertions(+), 4 deletions(-)
--- a/drivers/crypto/ccree/cc_aead.c +++ b/drivers/crypto/ccree/cc_aead.c @@ -424,7 +424,7 @@ static int validate_keys_sizes(struct cc /* This function prepers the user key so it can pass to the hmac processing * (copy to intenral buffer or hash in case of key longer than block */ -static int cc_get_plain_hmac_key(struct crypto_aead *tfm, const u8 *key, +static int cc_get_plain_hmac_key(struct crypto_aead *tfm, const u8 *authkey, unsigned int keylen) { dma_addr_t key_dma_addr = 0; @@ -437,6 +437,7 @@ static int cc_get_plain_hmac_key(struct unsigned int hashmode; unsigned int idx = 0; int rc = 0; + u8 *key = NULL; struct cc_hw_desc desc[MAX_AEAD_SETKEY_SEQ]; dma_addr_t padded_authkey_dma_addr = ctx->auth_state.hmac.padded_authkey_dma_addr; @@ -455,11 +456,17 @@ static int cc_get_plain_hmac_key(struct }
if (keylen != 0) { + + key = kmemdup(authkey, keylen, GFP_KERNEL); + if (!key) + return -ENOMEM; + key_dma_addr = dma_map_single(dev, (void *)key, keylen, DMA_TO_DEVICE); if (dma_mapping_error(dev, key_dma_addr)) { dev_err(dev, "Mapping key va=0x%p len=%u for DMA failed\n", key, keylen); + kzfree(key); return -ENOMEM; } if (keylen > blocksize) { @@ -542,6 +549,8 @@ static int cc_get_plain_hmac_key(struct if (key_dma_addr) dma_unmap_single(dev, key_dma_addr, keylen, DMA_TO_DEVICE);
+ kzfree(key); + return rc; }
--- a/drivers/crypto/ccree/cc_buffer_mgr.c +++ b/drivers/crypto/ccree/cc_buffer_mgr.c @@ -560,6 +560,7 @@ void cc_unmap_aead_request(struct device if (areq_ctx->gen_ctx.iv_dma_addr) { dma_unmap_single(dev, areq_ctx->gen_ctx.iv_dma_addr, hw_iv_size, DMA_BIDIRECTIONAL); + kzfree(areq_ctx->gen_ctx.iv); }
/* Release pool */ @@ -664,19 +665,27 @@ static int cc_aead_chain_iv(struct cc_dr struct aead_req_ctx *areq_ctx = aead_request_ctx(req); unsigned int hw_iv_size = areq_ctx->hw_iv_size; struct device *dev = drvdata_to_dev(drvdata); + gfp_t flags = cc_gfp_flags(&req->base); int rc = 0;
if (!req->iv) { areq_ctx->gen_ctx.iv_dma_addr = 0; + areq_ctx->gen_ctx.iv = NULL; goto chain_iv_exit; }
- areq_ctx->gen_ctx.iv_dma_addr = dma_map_single(dev, req->iv, - hw_iv_size, - DMA_BIDIRECTIONAL); + areq_ctx->gen_ctx.iv = kmemdup(req->iv, hw_iv_size, flags); + if (!areq_ctx->gen_ctx.iv) + return -ENOMEM; + + areq_ctx->gen_ctx.iv_dma_addr = + dma_map_single(dev, areq_ctx->gen_ctx.iv, hw_iv_size, + DMA_BIDIRECTIONAL); if (dma_mapping_error(dev, areq_ctx->gen_ctx.iv_dma_addr)) { dev_err(dev, "Mapping iv %u B at va=%pK for DMA failed\n", hw_iv_size, req->iv); + kzfree(areq_ctx->gen_ctx.iv); + areq_ctx->gen_ctx.iv = NULL; rc = -ENOMEM; goto chain_iv_exit; } --- a/drivers/crypto/ccree/cc_driver.h +++ b/drivers/crypto/ccree/cc_driver.h @@ -168,6 +168,7 @@ struct cc_alg_template {
struct async_gen_req_ctx { dma_addr_t iv_dma_addr; + u8 *iv; enum drv_crypto_direction op_type; };
From: Ofir Drang ofir.drang@arm.com
commit 7766dd774d80463cec7b81d90c8672af91de2da1 upstream.
On power management resume function first enable the device clk source to allow access to the device registers.
Signed-off-by: Ofir Drang ofir.drang@arm.com Signed-off-by: Gilad Ben-Yossef gilad@benyossef.com Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/ccree/cc_pm.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/crypto/ccree/cc_pm.c +++ b/drivers/crypto/ccree/cc_pm.c @@ -42,14 +42,15 @@ int cc_pm_resume(struct device *dev) struct cc_drvdata *drvdata = dev_get_drvdata(dev);
dev_dbg(dev, "unset HOST_POWER_DOWN_EN\n"); - cc_iowrite(drvdata, CC_REG(HOST_POWER_DOWN_EN), POWER_DOWN_DISABLE); - + /* Enables the device source clk */ rc = cc_clk_on(drvdata); if (rc) { dev_err(dev, "failed getting clock back on. We're toast.\n"); return rc; }
+ cc_iowrite(drvdata, CC_REG(HOST_POWER_DOWN_EN), POWER_DOWN_DISABLE); + rc = init_cc_regs(drvdata, false); if (rc) { dev_err(dev, "init_cc_regs (%x)\n", rc);
From: Ofir Drang ofir.drang@arm.com
commit 3499efbeed39d114873267683b9e776bcb34b058 upstream.
During power management suspend the driver need to prepare the device for the power down operation and as a last indication write to the HOST_POWER_DOWN_EN register which signals to the hardware that The ccree is ready for power down.
Signed-off-by: Ofir Drang ofir.drang@arm.com Signed-off-by: Gilad Ben-Yossef gilad@benyossef.com Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/ccree/cc_pm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/crypto/ccree/cc_pm.c +++ b/drivers/crypto/ccree/cc_pm.c @@ -25,13 +25,13 @@ int cc_pm_suspend(struct device *dev) int rc;
dev_dbg(dev, "set HOST_POWER_DOWN_EN\n"); - cc_iowrite(drvdata, CC_REG(HOST_POWER_DOWN_EN), POWER_DOWN_ENABLE); rc = cc_suspend_req_queue(drvdata); if (rc) { dev_err(dev, "cc_suspend_req_queue (%x)\n", rc); return rc; } fini_cc_regs(drvdata); + cc_iowrite(drvdata, CC_REG(HOST_POWER_DOWN_EN), POWER_DOWN_ENABLE); cc_clk_off(drvdata); return 0; }
From: Ofir Drang ofir.drang@arm.com
commit 897ab2316910a66bb048f1c9cefa25e6a592dcd7 upstream.
Adds function that checks if cryptocell tee fips error occurred and in such case triggers system error through kernel panic. Change fips function to use this new routine.
Signed-off-by: Ofir Drang ofir.drang@arm.com Signed-off-by: Gilad Ben-Yossef gilad@benyossef.com Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/ccree/cc_fips.c | 23 +++++++++++++++-------- drivers/crypto/ccree/cc_fips.h | 2 ++ 2 files changed, 17 insertions(+), 8 deletions(-)
--- a/drivers/crypto/ccree/cc_fips.c +++ b/drivers/crypto/ccree/cc_fips.c @@ -72,20 +72,28 @@ static inline void tee_fips_error(struct dev_err(dev, "TEE reported error!\n"); }
+/* + * This function check if cryptocell tee fips error occurred + * and in such case triggers system error + */ +void cc_tee_handle_fips_error(struct cc_drvdata *p_drvdata) +{ + struct device *dev = drvdata_to_dev(p_drvdata); + + if (!cc_get_tee_fips_status(p_drvdata)) + tee_fips_error(dev); +} + /* Deferred service handler, run as interrupt-fired tasklet */ static void fips_dsr(unsigned long devarg) { struct cc_drvdata *drvdata = (struct cc_drvdata *)devarg; - struct device *dev = drvdata_to_dev(drvdata); - u32 irq, state, val; + u32 irq, val;
irq = (drvdata->irq & (CC_GPR0_IRQ_MASK));
if (irq) { - state = cc_ioread(drvdata, CC_REG(GPR_HOST)); - - if (state != (CC_FIPS_SYNC_TEE_STATUS | CC_FIPS_SYNC_MODULE_OK)) - tee_fips_error(dev); + cc_tee_handle_fips_error(drvdata); }
/* after verifing that there is nothing to do, @@ -113,8 +121,7 @@ int cc_fips_init(struct cc_drvdata *p_dr dev_dbg(dev, "Initializing fips tasklet\n"); tasklet_init(&fips_h->tasklet, fips_dsr, (unsigned long)p_drvdata);
- if (!cc_get_tee_fips_status(p_drvdata)) - tee_fips_error(dev); + cc_tee_handle_fips_error(p_drvdata);
return 0; } --- a/drivers/crypto/ccree/cc_fips.h +++ b/drivers/crypto/ccree/cc_fips.h @@ -18,6 +18,7 @@ int cc_fips_init(struct cc_drvdata *p_dr void cc_fips_fini(struct cc_drvdata *drvdata); void fips_handler(struct cc_drvdata *drvdata); void cc_set_ree_fips_status(struct cc_drvdata *drvdata, bool ok); +void cc_tee_handle_fips_error(struct cc_drvdata *p_drvdata);
#else /* CONFIG_CRYPTO_FIPS */
@@ -30,6 +31,7 @@ static inline void cc_fips_fini(struct c static inline void cc_set_ree_fips_status(struct cc_drvdata *drvdata, bool ok) {} static inline void fips_handler(struct cc_drvdata *drvdata) {} +static inline void cc_tee_handle_fips_error(struct cc_drvdata *p_drvdata) {}
#endif /* CONFIG_CRYPTO_FIPS */
From: Ofir Drang ofir.drang@arm.com
commit 7138377ce10455b7183c6dde4b2c51b33f464c45 upstream.
in order to support cryptocell tee fips error that may occurs while cryptocell ree is suspended, an cc_tee_handle_fips_error call added to the cc_pm_resume function.
Signed-off-by: Ofir Drang ofir.drang@arm.com Signed-off-by: Gilad Ben-Yossef gilad@benyossef.com Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/ccree/cc_pm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/crypto/ccree/cc_pm.c +++ b/drivers/crypto/ccree/cc_pm.c @@ -11,6 +11,7 @@ #include "cc_ivgen.h" #include "cc_hash.h" #include "cc_pm.h" +#include "cc_fips.h"
#define POWER_DOWN_ENABLE 0x01 #define POWER_DOWN_DISABLE 0x00 @@ -50,12 +51,13 @@ int cc_pm_resume(struct device *dev) }
cc_iowrite(drvdata, CC_REG(HOST_POWER_DOWN_EN), POWER_DOWN_DISABLE); - rc = init_cc_regs(drvdata, false); if (rc) { dev_err(dev, "init_cc_regs (%x)\n", rc); return rc; } + /* check if tee fips error occurred during power down */ + cc_tee_handle_fips_error(drvdata);
rc = cc_resume_req_queue(drvdata); if (rc) {
From: Jiri Kosina jkosina@suse.cz
commit 134fca9063ad4851de767d1768180e5dede9a881 upstream.
The semantics of what mincore() considers to be resident is not completely clear, but Linux has always (since 2.3.52, which is when mincore() was initially done) treated it as "page is available in page cache".
That's potentially a problem, as that [in]directly exposes meta-information about pagecache / memory mapping state even about memory not strictly belonging to the process executing the syscall, opening possibilities for sidechannel attacks.
Change the semantics of mincore() so that it only reveals pagecache information for non-anonymous mappings that belog to files that the calling process could (if it tried to) successfully open for writing; otherwise we'd be including shared non-exclusive mappings, which
- is the sidechannel
- is not the usecase for mincore(), as that's primarily used for data, not (shared) text
[jkosina@suse.cz: v2] Link: http://lkml.kernel.org/r/20190312141708.6652-2-vbabka@suse.cz [mhocko@suse.com: restructure can_do_mincore() conditions] Link: http://lkml.kernel.org/r/nycvar.YFH.7.76.1903062342020.19912@cbobk.fhfr.pm Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Vlastimil Babka vbabka@suse.cz Acked-by: Josh Snyder joshs@netflix.com Acked-by: Michal Hocko mhocko@suse.com Originally-by: Linus Torvalds torvalds@linux-foundation.org Originally-by: Dominique Martinet asmadeus@codewreck.org Cc: Andy Lutomirski luto@amacapital.net Cc: Dave Chinner david@fromorbit.com Cc: Kevin Easton kevin@guarana.org Cc: Matthew Wilcox willy@infradead.org Cc: Cyril Hrubis chrubis@suse.cz Cc: Tejun Heo tj@kernel.org Cc: Kirill A. Shutemov kirill@shutemov.name Cc: Daniel Gruss daniel@gruss.cc Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/mincore.c | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-)
--- a/mm/mincore.c +++ b/mm/mincore.c @@ -169,6 +169,22 @@ out: return 0; }
+static inline bool can_do_mincore(struct vm_area_struct *vma) +{ + if (vma_is_anonymous(vma)) + return true; + if (!vma->vm_file) + return false; + /* + * Reveal pagecache information only for non-anonymous mappings that + * correspond to the files the calling process could (if tried) open + * for writing; otherwise we'd be including shared non-exclusive + * mappings, which opens a side channel. + */ + return inode_owner_or_capable(file_inode(vma->vm_file)) || + inode_permission(file_inode(vma->vm_file), MAY_WRITE) == 0; +} + /* * Do a chunk of "sys_mincore()". We've already checked * all the arguments, we hold the mmap semaphore: we should @@ -189,8 +205,13 @@ static long do_mincore(unsigned long add vma = find_vma(current->mm, addr); if (!vma || addr < vma->vm_start) return -ENOMEM; - mincore_walk.mm = vma->vm_mm; end = min(vma->vm_end, addr + (pages << PAGE_SHIFT)); + if (!can_do_mincore(vma)) { + unsigned long pages = DIV_ROUND_UP(end - addr, PAGE_SIZE); + memset(vec, 1, pages); + return pages; + } + mincore_walk.mm = vma->vm_mm; err = walk_page_range(addr, end, &mincore_walk); if (err < 0) return err;
From: Dan Williams dan.j.williams@intel.com
commit fce86ff5802bac3a7b19db171aa1949ef9caac31 upstream.
Starting with c6f3c5ee40c1 ("mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()") vmf_insert_pfn_pmd() internally calls pmdp_set_access_flags(). That helper enforces a pmd aligned @address argument via VM_BUG_ON() assertion.
Update the implementation to take a 'struct vm_fault' argument directly and apply the address alignment fixup internally to fix crash signatures like:
kernel BUG at arch/x86/mm/pgtable.c:515! invalid opcode: 0000 [#1] SMP NOPTI CPU: 51 PID: 43713 Comm: java Tainted: G OE 4.19.35 #1 [..] RIP: 0010:pmdp_set_access_flags+0x48/0x50 [..] Call Trace: vmf_insert_pfn_pmd+0x198/0x350 dax_iomap_fault+0xe82/0x1190 ext4_dax_huge_fault+0x103/0x1f0 ? __switch_to_asm+0x40/0x70 __handle_mm_fault+0x3f6/0x1370 ? __switch_to_asm+0x34/0x70 ? __switch_to_asm+0x40/0x70 handle_mm_fault+0xda/0x200 __do_page_fault+0x249/0x4f0 do_page_fault+0x32/0x110 ? page_fault+0x8/0x30 page_fault+0x1e/0x30
Link: http://lkml.kernel.org/r/155741946350.372037.11148198430068238140.stgit@dwil... Fixes: c6f3c5ee40c1 ("mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()") Signed-off-by: Dan Williams dan.j.williams@intel.com Reported-by: Piotr Balcer piotr.balcer@intel.com Tested-by: Yan Ma yan.ma@intel.com Tested-by: Pankaj Gupta pagupta@redhat.com Reviewed-by: Matthew Wilcox willy@infradead.org Reviewed-by: Jan Kara jack@suse.cz Reviewed-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Cc: Chandan Rajendra chandan@linux.ibm.com Cc: Souptick Joarder jrdr.linux@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/dax/device.c | 6 ++---- fs/dax.c | 6 ++---- include/linux/huge_mm.h | 6 ++---- mm/huge_memory.c | 16 ++++++++++------ 4 files changed, 16 insertions(+), 18 deletions(-)
--- a/drivers/dax/device.c +++ b/drivers/dax/device.c @@ -184,8 +184,7 @@ static vm_fault_t __dev_dax_pmd_fault(st
*pfn = phys_to_pfn_t(phys, dax_region->pfn_flags);
- return vmf_insert_pfn_pmd(vmf->vma, vmf->address, vmf->pmd, *pfn, - vmf->flags & FAULT_FLAG_WRITE); + return vmf_insert_pfn_pmd(vmf, *pfn, vmf->flags & FAULT_FLAG_WRITE); }
#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD @@ -235,8 +234,7 @@ static vm_fault_t __dev_dax_pud_fault(st
*pfn = phys_to_pfn_t(phys, dax_region->pfn_flags);
- return vmf_insert_pfn_pud(vmf->vma, vmf->address, vmf->pud, *pfn, - vmf->flags & FAULT_FLAG_WRITE); + return vmf_insert_pfn_pud(vmf, *pfn, vmf->flags & FAULT_FLAG_WRITE); } #else static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax, --- a/fs/dax.c +++ b/fs/dax.c @@ -1575,8 +1575,7 @@ static vm_fault_t dax_iomap_pmd_fault(st }
trace_dax_pmd_insert_mapping(inode, vmf, PMD_SIZE, pfn, entry); - result = vmf_insert_pfn_pmd(vma, vmf->address, vmf->pmd, pfn, - write); + result = vmf_insert_pfn_pmd(vmf, pfn, write); break; case IOMAP_UNWRITTEN: case IOMAP_HOLE: @@ -1686,8 +1685,7 @@ dax_insert_pfn_mkwrite(struct vm_fault * ret = vmf_insert_mixed_mkwrite(vmf->vma, vmf->address, pfn); #ifdef CONFIG_FS_DAX_PMD else if (order == PMD_ORDER) - ret = vmf_insert_pfn_pmd(vmf->vma, vmf->address, vmf->pmd, - pfn, true); + ret = vmf_insert_pfn_pmd(vmf, pfn, FAULT_FLAG_WRITE); #endif else ret = VM_FAULT_FALLBACK; --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -47,10 +47,8 @@ extern bool move_huge_pmd(struct vm_area extern int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, pgprot_t newprot, int prot_numa); -vm_fault_t vmf_insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr, - pmd_t *pmd, pfn_t pfn, bool write); -vm_fault_t vmf_insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr, - pud_t *pud, pfn_t pfn, bool write); +vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write); +vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write); enum transparent_hugepage_flag { TRANSPARENT_HUGEPAGE_FLAG, TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -793,11 +793,13 @@ out_unlock: pte_free(mm, pgtable); }
-vm_fault_t vmf_insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr, - pmd_t *pmd, pfn_t pfn, bool write) +vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write) { + unsigned long addr = vmf->address & PMD_MASK; + struct vm_area_struct *vma = vmf->vma; pgprot_t pgprot = vma->vm_page_prot; pgtable_t pgtable = NULL; + /* * If we had pmd_special, we could avoid all these restrictions, * but we need to be consistent with PTEs and architectures that @@ -820,7 +822,7 @@ vm_fault_t vmf_insert_pfn_pmd(struct vm_
track_pfn_insert(vma, &pgprot, pfn);
- insert_pfn_pmd(vma, addr, pmd, pfn, pgprot, write, pgtable); + insert_pfn_pmd(vma, addr, vmf->pmd, pfn, pgprot, write, pgtable); return VM_FAULT_NOPAGE; } EXPORT_SYMBOL_GPL(vmf_insert_pfn_pmd); @@ -869,10 +871,12 @@ out_unlock: spin_unlock(ptl); }
-vm_fault_t vmf_insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr, - pud_t *pud, pfn_t pfn, bool write) +vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write) { + unsigned long addr = vmf->address & PUD_MASK; + struct vm_area_struct *vma = vmf->vma; pgprot_t pgprot = vma->vm_page_prot; + /* * If we had pud_special, we could avoid all these restrictions, * but we need to be consistent with PTEs and architectures that @@ -889,7 +893,7 @@ vm_fault_t vmf_insert_pfn_pud(struct vm_
track_pfn_insert(vma, &pgprot, pfn);
- insert_pfn_pud(vma, addr, pud, pfn, pgprot, write); + insert_pfn_pud(vma, addr, vmf->pud, pfn, pgprot, write); return VM_FAULT_NOPAGE; } EXPORT_SYMBOL_GPL(vmf_insert_pfn_pud);
From: Kai Shen shenkai8@huawei.com
commit 2bf753e64b4a702e27ce26ff520c59563c62f96b upstream.
spinlock recursion happened when do LTP test: #!/bin/bash ./runltp -p -f hugetlb & ./runltp -p -f hugetlb & ./runltp -p -f hugetlb & ./runltp -p -f hugetlb & ./runltp -p -f hugetlb &
The dtor returned by get_compound_page_dtor in __put_compound_page may be the function of free_huge_page which will lock the hugetlb_lock, so don't put_page in lock of hugetlb_lock.
BUG: spinlock recursion on CPU#0, hugemmap05/1079 lock: hugetlb_lock+0x0/0x18, .magic: dead4ead, .owner: hugemmap05/1079, .owner_cpu: 0 Call trace: dump_backtrace+0x0/0x198 show_stack+0x24/0x30 dump_stack+0xa4/0xcc spin_dump+0x84/0xa8 do_raw_spin_lock+0xd0/0x108 _raw_spin_lock+0x20/0x30 free_huge_page+0x9c/0x260 __put_compound_page+0x44/0x50 __put_page+0x2c/0x60 alloc_surplus_huge_page.constprop.19+0xf0/0x140 hugetlb_acct_memory+0x104/0x378 hugetlb_reserve_pages+0xe0/0x250 hugetlbfs_file_mmap+0xc0/0x140 mmap_region+0x3e8/0x5b0 do_mmap+0x280/0x460 vm_mmap_pgoff+0xf4/0x128 ksys_mmap_pgoff+0xb4/0x258 __arm64_sys_mmap+0x34/0x48 el0_svc_common+0x78/0x130 el0_svc_handler+0x38/0x78 el0_svc+0x8/0xc
Link: http://lkml.kernel.org/r/b8ade452-2d6b-0372-32c2-703644032b47@huawei.com Fixes: 9980d744a0 ("mm, hugetlb: get rid of surplus page accounting tricks") Signed-off-by: Kai Shen shenkai8@huawei.com Signed-off-by: Feilong Lin linfeilong@huawei.com Reported-by: Wang Wang wangwang2@huawei.com Reviewed-by: Oscar Salvador osalvador@suse.de Reviewed-by: Mike Kravetz mike.kravetz@oracle.com Reviewed-by: Andrew Morton akpm@linux-foundation.org Acked-by: Michal Hocko mhocko@suse.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/hugetlb.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1574,8 +1574,9 @@ static struct page *alloc_surplus_huge_p */ if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) { SetPageHugeTemporary(page); + spin_unlock(&hugetlb_lock); put_page(page); - page = NULL; + return NULL; } else { h->surplus_huge_pages++; h->surplus_huge_pages_node[page_to_nid(page)]++;
From: Mike Kravetz mike.kravetz@oracle.com
commit 1b426bac66e6cc83c9f2d92b96e4e72acf43419a upstream.
hugetlb uses a fault mutex hash table to prevent page faults of the same pages concurrently. The key for shared and private mappings is different. Shared keys off address_space and file index. Private keys off mm and virtual address. Consider a private mappings of a populated hugetlbfs file. A fault will map the page from the file and if needed do a COW to map a writable page.
Hugetlbfs hole punch uses the fault mutex to prevent mappings of file pages. It uses the address_space file index key. However, private mappings will use a different key and could race with this code to map the file page. This causes problems (BUG) for the page cache remove code as it expects the page to be unmapped. A sample stack is:
page dumped because: VM_BUG_ON_PAGE(page_mapped(page)) kernel BUG at mm/filemap.c:169! ... RIP: 0010:unaccount_page_cache_page+0x1b8/0x200 ... Call Trace: __delete_from_page_cache+0x39/0x220 delete_from_page_cache+0x45/0x70 remove_inode_hugepages+0x13c/0x380 ? __add_to_page_cache_locked+0x162/0x380 hugetlbfs_fallocate+0x403/0x540 ? _cond_resched+0x15/0x30 ? __inode_security_revalidate+0x5d/0x70 ? selinux_file_permission+0x100/0x130 vfs_fallocate+0x13f/0x270 ksys_fallocate+0x3c/0x80 __x64_sys_fallocate+0x1a/0x20 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9
There seems to be another potential COW issue/race with this approach of different private and shared keys as noted in commit 8382d914ebf7 ("mm, hugetlb: improve page-fault scalability").
Since every hugetlb mapping (even anon and private) is actually a file mapping, just use the address_space index key for all mappings. This results in potentially more hash collisions. However, this should not be the common case.
Link: http://lkml.kernel.org/r/20190328234704.27083-3-mike.kravetz@oracle.com Link: http://lkml.kernel.org/r/20190412165235.t4sscoujczfhuiyt@linux-r8p5 Fixes: b5cec28d36f5 ("hugetlbfs: truncate_hugepages() takes a range of pages") Signed-off-by: Mike Kravetz mike.kravetz@oracle.com Reviewed-by: Naoya Horiguchi n-horiguchi@ah.jp.nec.com Reviewed-by: Davidlohr Bueso dbueso@suse.de Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: "Kirill A . Shutemov" kirill.shutemov@linux.intel.com Cc: Michal Hocko mhocko@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/hugetlbfs/inode.c | 7 ++----- include/linux/hugetlb.h | 4 +--- mm/hugetlb.c | 22 ++++++---------------- mm/userfaultfd.c | 3 +-- 4 files changed, 10 insertions(+), 26 deletions(-)
--- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -440,9 +440,7 @@ static void remove_inode_hugepages(struc u32 hash;
index = page->index; - hash = hugetlb_fault_mutex_hash(h, current->mm, - &pseudo_vma, - mapping, index, 0); + hash = hugetlb_fault_mutex_hash(h, mapping, index, 0); mutex_lock(&hugetlb_fault_mutex_table[hash]);
/* @@ -639,8 +637,7 @@ static long hugetlbfs_fallocate(struct f addr = index * hpage_size;
/* mutex taken here, fault path and hole punch */ - hash = hugetlb_fault_mutex_hash(h, mm, &pseudo_vma, mapping, - index, addr); + hash = hugetlb_fault_mutex_hash(h, mapping, index, addr); mutex_lock(&hugetlb_fault_mutex_table[hash]);
/* See if already present in mapping to avoid alloc/free */ --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -123,9 +123,7 @@ void move_hugetlb_state(struct page *old void free_huge_page(struct page *page); void hugetlb_fix_reserve_counts(struct inode *inode); extern struct mutex *hugetlb_fault_mutex_table; -u32 hugetlb_fault_mutex_hash(struct hstate *h, struct mm_struct *mm, - struct vm_area_struct *vma, - struct address_space *mapping, +u32 hugetlb_fault_mutex_hash(struct hstate *h, struct address_space *mapping, pgoff_t idx, unsigned long address);
pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud); --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3778,8 +3778,7 @@ retry: * handling userfault. Reacquire after handling * fault to make calling code simpler. */ - hash = hugetlb_fault_mutex_hash(h, mm, vma, mapping, - idx, haddr); + hash = hugetlb_fault_mutex_hash(h, mapping, idx, haddr); mutex_unlock(&hugetlb_fault_mutex_table[hash]); ret = handle_userfault(&vmf, VM_UFFD_MISSING); mutex_lock(&hugetlb_fault_mutex_table[hash]); @@ -3887,21 +3886,14 @@ backout_unlocked: }
#ifdef CONFIG_SMP -u32 hugetlb_fault_mutex_hash(struct hstate *h, struct mm_struct *mm, - struct vm_area_struct *vma, - struct address_space *mapping, +u32 hugetlb_fault_mutex_hash(struct hstate *h, struct address_space *mapping, pgoff_t idx, unsigned long address) { unsigned long key[2]; u32 hash;
- if (vma->vm_flags & VM_SHARED) { - key[0] = (unsigned long) mapping; - key[1] = idx; - } else { - key[0] = (unsigned long) mm; - key[1] = address >> huge_page_shift(h); - } + key[0] = (unsigned long) mapping; + key[1] = idx;
hash = jhash2((u32 *)&key, sizeof(key)/sizeof(u32), 0);
@@ -3912,9 +3904,7 @@ u32 hugetlb_fault_mutex_hash(struct hsta * For uniprocesor systems we always use a single mutex, so just * return 0 and avoid the hashing overhead. */ -u32 hugetlb_fault_mutex_hash(struct hstate *h, struct mm_struct *mm, - struct vm_area_struct *vma, - struct address_space *mapping, +u32 hugetlb_fault_mutex_hash(struct hstate *h, struct address_space *mapping, pgoff_t idx, unsigned long address) { return 0; @@ -3959,7 +3949,7 @@ vm_fault_t hugetlb_fault(struct mm_struc * get spurious allocation failures if two CPUs race to instantiate * the same page in the page cache. */ - hash = hugetlb_fault_mutex_hash(h, mm, vma, mapping, idx, haddr); + hash = hugetlb_fault_mutex_hash(h, mapping, idx, haddr); mutex_lock(&hugetlb_fault_mutex_table[hash]);
entry = huge_ptep_get(ptep); --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -271,8 +271,7 @@ retry: */ idx = linear_page_index(dst_vma, dst_addr); mapping = dst_vma->vm_file->f_mapping; - hash = hugetlb_fault_mutex_hash(h, dst_mm, dst_vma, mapping, - idx, dst_addr); + hash = hugetlb_fault_mutex_hash(h, mapping, idx, dst_addr); mutex_lock(&hugetlb_fault_mutex_table[hash]);
err = -ENOMEM;
From: Shuning Zhang sunny.s.zhang@oracle.com
commit e091eab028f9253eac5c04f9141bbc9d170acab3 upstream.
In some cases, ocfs2_iget() reads the data of inode, which has been deleted for some reason. That will make the system panic. So We should judge whether this inode has been deleted, and tell the caller that the inode is a bad inode.
For example, the ocfs2 is used as the backed of nfs, and the client is nfsv3. This issue can be reproduced by the following steps.
on the nfs server side, ..../patha/pathb
Step 1: The process A was scheduled before calling the function fh_verify.
Step 2: The process B is removing the 'pathb', and just completed the call to function dput. Then the dentry of 'pathb' has been deleted from the dcache, and all ancestors have been deleted also. The relationship of dentry and inode was deleted through the function hlist_del_init. The following is the call stack. dentry_iput->hlist_del_init(&dentry->d_u.d_alias)
At this time, the inode is still in the dcache.
Step 3: The process A call the function ocfs2_get_dentry, which get the inode from dcache. Then the refcount of inode is 1. The following is the call stack. nfsd3_proc_getacl->fh_verify->exportfs_decode_fh->fh_to_dentry(ocfs2_get_dentry)
Step 4: Dirty pages are flushed by bdi threads. So the inode of 'patha' is evicted, and this directory was deleted. But the inode of 'pathb' can't be evicted, because the refcount of the inode was 1.
Step 5: The process A keep running, and call the function reconnect_path(in exportfs_decode_fh), which call function ocfs2_get_parent of ocfs2. Get the block number of parent directory(patha) by the name of ... Then read the data from disk by the block number. But this inode has been deleted, so the system panic.
Process A Process B 1. in nfsd3_proc_getacl | 2. | dput 3. fh_to_dentry(ocfs2_get_dentry) | 4. bdi flush dirty cache | 5. ocfs2_iget |
[283465.542049] OCFS2: ERROR (device sdp): ocfs2_validate_inode_block: Invalid dinode #580640: OCFS2_VALID_FL not set
[283465.545490] Kernel panic - not syncing: OCFS2: (device sdp): panic forced after error
[283465.546889] CPU: 5 PID: 12416 Comm: nfsd Tainted: G W 4.1.12-124.18.6.el6uek.bug28762940v3.x86_64 #2 [283465.548382] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015 [283465.549657] 0000000000000000 ffff8800a56fb7b8 ffffffff816e839c ffffffffa0514758 [283465.550392] 000000000008dc20 ffff8800a56fb838 ffffffff816e62d3 0000000000000008 [283465.551056] ffff880000000010 ffff8800a56fb848 ffff8800a56fb7e8 ffff88005df9f000 [283465.551710] Call Trace: [283465.552516] [<ffffffff816e839c>] dump_stack+0x63/0x81 [283465.553291] [<ffffffff816e62d3>] panic+0xcb/0x21b [283465.554037] [<ffffffffa04e66b0>] ocfs2_handle_error+0xf0/0xf0 [ocfs2] [283465.554882] [<ffffffffa04e7737>] __ocfs2_error+0x67/0x70 [ocfs2] [283465.555768] [<ffffffffa049c0f9>] ocfs2_validate_inode_block+0x229/0x230 [ocfs2] [283465.556683] [<ffffffffa047bcbc>] ocfs2_read_blocks+0x46c/0x7b0 [ocfs2] [283465.557408] [<ffffffffa049bed0>] ? ocfs2_inode_cache_io_unlock+0x20/0x20 [ocfs2] [283465.557973] [<ffffffffa049f0eb>] ocfs2_read_inode_block_full+0x3b/0x60 [ocfs2] [283465.558525] [<ffffffffa049f5ba>] ocfs2_iget+0x4aa/0x880 [ocfs2] [283465.559082] [<ffffffffa049146e>] ocfs2_get_parent+0x9e/0x220 [ocfs2] [283465.559622] [<ffffffff81297c05>] reconnect_path+0xb5/0x300 [283465.560156] [<ffffffff81297f46>] exportfs_decode_fh+0xf6/0x2b0 [283465.560708] [<ffffffffa062faf0>] ? nfsd_proc_getattr+0xa0/0xa0 [nfsd] [283465.561262] [<ffffffff810a8196>] ? prepare_creds+0x26/0x110 [283465.561932] [<ffffffffa0630860>] fh_verify+0x350/0x660 [nfsd] [283465.562862] [<ffffffffa0637804>] ? nfsd_cache_lookup+0x44/0x630 [nfsd] [283465.563697] [<ffffffffa063a8b9>] nfsd3_proc_getattr+0x69/0xf0 [nfsd] [283465.564510] [<ffffffffa062cf60>] nfsd_dispatch+0xe0/0x290 [nfsd] [283465.565358] [<ffffffffa05eb892>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc] [283465.566272] [<ffffffffa05ea652>] svc_process_common+0x412/0x6a0 [sunrpc] [283465.567155] [<ffffffffa05eaa03>] svc_process+0x123/0x210 [sunrpc] [283465.568020] [<ffffffffa062c90f>] nfsd+0xff/0x170 [nfsd] [283465.568962] [<ffffffffa062c810>] ? nfsd_destroy+0x80/0x80 [nfsd] [283465.570112] [<ffffffff810a622b>] kthread+0xcb/0xf0 [283465.571099] [<ffffffff810a6160>] ? kthread_create_on_node+0x180/0x180 [283465.572114] [<ffffffff816f11b8>] ret_from_fork+0x58/0x90 [283465.573156] [<ffffffff810a6160>] ? kthread_create_on_node+0x180/0x180
Link: http://lkml.kernel.org/r/1554185919-3010-1-git-send-email-sunny.s.zhang@orac... Signed-off-by: Shuning Zhang sunny.s.zhang@oracle.com Reviewed-by: Joseph Qi jiangqi903@gmail.com Cc: Mark Fasheh mark@fasheh.com Cc: Joel Becker jlbec@evilplan.org Cc: Junxiao Bi junxiao.bi@oracle.com Cc: Changwei Ge gechangwei@live.cn Cc: piaojun piaojun@huawei.com Cc: "Gang He" ghe@suse.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ocfs2/export.c | 30 +++++++++++++++++++++++++++++- 1 file changed, 29 insertions(+), 1 deletion(-)
--- a/fs/ocfs2/export.c +++ b/fs/ocfs2/export.c @@ -148,16 +148,24 @@ static struct dentry *ocfs2_get_parent(s u64 blkno; struct dentry *parent; struct inode *dir = d_inode(child); + int set;
trace_ocfs2_get_parent(child, child->d_name.len, child->d_name.name, (unsigned long long)OCFS2_I(dir)->ip_blkno);
+ status = ocfs2_nfs_sync_lock(OCFS2_SB(dir->i_sb), 1); + if (status < 0) { + mlog(ML_ERROR, "getting nfs sync lock(EX) failed %d\n", status); + parent = ERR_PTR(status); + goto bail; + } + status = ocfs2_inode_lock(dir, NULL, 0); if (status < 0) { if (status != -ENOENT) mlog_errno(status); parent = ERR_PTR(status); - goto bail; + goto unlock_nfs_sync; }
status = ocfs2_lookup_ino_from_name(dir, "..", 2, &blkno); @@ -166,11 +174,31 @@ static struct dentry *ocfs2_get_parent(s goto bail_unlock; }
+ status = ocfs2_test_inode_bit(OCFS2_SB(dir->i_sb), blkno, &set); + if (status < 0) { + if (status == -EINVAL) { + status = -ESTALE; + } else + mlog(ML_ERROR, "test inode bit failed %d\n", status); + parent = ERR_PTR(status); + goto bail_unlock; + } + + trace_ocfs2_get_dentry_test_bit(status, set); + if (!set) { + status = -ESTALE; + parent = ERR_PTR(status); + goto bail_unlock; + } + parent = d_obtain_alias(ocfs2_iget(OCFS2_SB(dir->i_sb), blkno, 0, 0));
bail_unlock: ocfs2_inode_unlock(dir, 0);
+unlock_nfs_sync: + ocfs2_nfs_sync_unlock(OCFS2_SB(dir->i_sb), 1); + bail: trace_ocfs2_get_parent_end(parent);
From: Andrea Arcangeli aarcange@redhat.com
commit c3f3ce049f7d97cc7ec9c01cb51d9ec74e0f37c2 upstream.
The task structure is freed while get_mem_cgroup_from_mm() holds rcu_read_lock() and dereferences mm->owner.
get_mem_cgroup_from_mm() failing fork() ---- --- task = mm->owner mm->owner = NULL; free(task) if (task) *task; /* use after free */
The fix consists in freeing the task with RCU also in the fork failure case, exactly like it always happens for the regular exit(2) path. That is enough to make the rcu_read_lock hold in get_mem_cgroup_from_mm() (left side above) effective to avoid a use after free when dereferencing the task structure.
An alternate possible fix would be to defer the delivery of the userfaultfd contexts to the monitor until after fork() is guaranteed to succeed. Such a change would require more changes because it would create a strict ordering dependency where the uffd methods would need to be called beyond the last potentially failing branch in order to be safe. This solution as opposed only adds the dependency to common code to set mm->owner to NULL and to free the task struct that was pointed by mm->owner with RCU, if fork ends up failing. The userfaultfd methods can still be called anywhere during the fork runtime and the monitor will keep discarding orphaned "mm" coming from failed forks in userland.
This race condition couldn't trigger if CONFIG_MEMCG was set =n at build time.
[aarcange@redhat.com: improve changelog, reduce #ifdefs per Michal] Link: http://lkml.kernel.org/r/20190429035752.4508-1-aarcange@redhat.com Link: http://lkml.kernel.org/r/20190325225636.11635-2-aarcange@redhat.com Fixes: 893e26e61d04 ("userfaultfd: non-cooperative: Add fork() event") Signed-off-by: Andrea Arcangeli aarcange@redhat.com Tested-by: zhong jiang zhongjiang@huawei.com Reported-by: syzbot+cbb52e396df3e565ab02@syzkaller.appspotmail.com Cc: Oleg Nesterov oleg@redhat.com Cc: Jann Horn jannh@google.com Cc: Hugh Dickins hughd@google.com Cc: Mike Rapoport rppt@linux.vnet.ibm.com Cc: Mike Kravetz mike.kravetz@oracle.com Cc: Peter Xu peterx@redhat.com Cc: Jason Gunthorpe jgg@mellanox.com Cc: "Kirill A . Shutemov" kirill.shutemov@linux.intel.com Cc: Michal Hocko mhocko@suse.com Cc: zhong jiang zhongjiang@huawei.com Cc: syzbot+cbb52e396df3e565ab02@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/fork.c | 31 +++++++++++++++++++++++++++++-- 1 file changed, 29 insertions(+), 2 deletions(-)
--- a/kernel/fork.c +++ b/kernel/fork.c @@ -952,6 +952,15 @@ static void mm_init_aio(struct mm_struct #endif }
+static __always_inline void mm_clear_owner(struct mm_struct *mm, + struct task_struct *p) +{ +#ifdef CONFIG_MEMCG + if (mm->owner == p) + WRITE_ONCE(mm->owner, NULL); +#endif +} + static void mm_init_owner(struct mm_struct *mm, struct task_struct *p) { #ifdef CONFIG_MEMCG @@ -1331,6 +1340,7 @@ static struct mm_struct *dup_mm(struct t free_pt: /* don't put binfmt in mmput, we haven't got module yet */ mm->binfmt = NULL; + mm_init_owner(mm, NULL); mmput(mm);
fail_nomem: @@ -1662,6 +1672,21 @@ static inline void rcu_copy_process(stru #endif /* #ifdef CONFIG_TASKS_RCU */ }
+static void __delayed_free_task(struct rcu_head *rhp) +{ + struct task_struct *tsk = container_of(rhp, struct task_struct, rcu); + + free_task(tsk); +} + +static __always_inline void delayed_free_task(struct task_struct *tsk) +{ + if (IS_ENABLED(CONFIG_MEMCG)) + call_rcu(&tsk->rcu, __delayed_free_task); + else + free_task(tsk); +} + /* * This creates a new process as a copy of the old one, * but does not actually start it yet. @@ -2123,8 +2148,10 @@ bad_fork_cleanup_io: bad_fork_cleanup_namespaces: exit_task_namespaces(p); bad_fork_cleanup_mm: - if (p->mm) + if (p->mm) { + mm_clear_owner(p->mm, p); mmput(p->mm); + } bad_fork_cleanup_signal: if (!(clone_flags & CLONE_THREAD)) free_signal_struct(p->signal); @@ -2155,7 +2182,7 @@ bad_fork_cleanup_count: bad_fork_free: p->state = TASK_DEAD; put_task_stack(p); - free_task(p); + delayed_free_task(p); fork_out: spin_lock_irq(¤t->sighand->siglock); hlist_del_init(&delayed.node);
From: Rajat Jain rajatja@google.com
commit 2f844b61db8297a1f7a06adf2eb5c43381f2c183 upstream.
I noticed that recently multiple systems (chromebooks) couldn't wake from S0ix using LID or Keyboard after updating to a newer kernel. I bisected and it turned up commit f941d3e41da7 ("ACPI: EC / PM: Disable non-wakeup GPEs for suspend-to-idle"). I checked that the issue got fixed if that commit was reverted.
I debugged and found that although PNP0C0D:00 (representing the LID) is wake capable and should wakeup the system per the code in acpi_wakeup_gpe_init() and in drivers/acpi/button.c:
localhost /sys # cat /proc/acpi/wakeup Device S-state Status Sysfs node LID0 S4 *enabled platform:PNP0C0D:00 CREC S5 *disabled platform:GOOG0004:00 *disabled platform:cros-ec-dev.1.auto *disabled platform:cros-ec-accel.0 *disabled platform:cros-ec-accel.1 *disabled platform:cros-ec-gyro.0 *disabled platform:cros-ec-ring.0 *disabled platform:cros-usbpd-charger.2.auto *disabled platform:cros-usbpd-logger.3.auto D015 S3 *enabled i2c:i2c-ELAN0000:00 PENH S3 *enabled platform:PRP0001:00 XHCI S3 *enabled pci:0000:00:14.0 GLAN S4 *disabled WIFI S3 *disabled pci:0000:00:14.3 localhost /sys #
On debugging, I found that its corresponding GPE is not being enabled. The particular GPE's "gpe_register_info->enable_for_wake" does not have any bits set when acpi_enable_all_wakeup_gpes() comes around to use it. I looked at code and could not find any other code path that should set the bits in "enable_for_wake" bitmask for the wake enabled devices for s2idle. [I do see that it happens for S3 in acpi_sleep_prepare()].
Thus I used the same call to enable the GPEs for wake enabled devices, and verified that this fixes the regression I was seeing on multiple of my devices.
[ rjw: The problem is that commit f941d3e41da7 ("ACPI: EC / PM: Disable non-wakeup GPEs for suspend-to-idle") forgot to add the acpi_enable_wakeup_devices() call for s2idle along with acpi_enable_all_wakeup_gpes(). ]
Fixes: f941d3e41da7 ("ACPI: EC / PM: Disable non-wakeup GPEs for suspend-to-idle") Link: https://bugzilla.kernel.org/show_bug.cgi?id=203579 Signed-off-by: Rajat Jain rajatja@google.com [ rjw: Subject & changelog ] Cc: 5.0+ stable@vger.kernel.org # 5.0+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/acpi/sleep.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/acpi/sleep.c +++ b/drivers/acpi/sleep.c @@ -977,6 +977,8 @@ static int acpi_s2idle_prepare(void) if (acpi_sci_irq_valid()) enable_irq_wake(acpi_sci_irq);
+ acpi_enable_wakeup_devices(ACPI_STATE_S0); + /* Change the configuration of GPEs to avoid spurious wakeup. */ acpi_enable_all_wakeup_gpes(); acpi_os_wait_events_complete(); @@ -1027,6 +1029,8 @@ static void acpi_s2idle_restore(void) { acpi_enable_all_runtime_gpes();
+ acpi_disable_wakeup_devices(ACPI_STATE_S0); + if (acpi_sci_irq_valid()) disable_irq_wake(acpi_sci_irq);
From: Erik Schmauss erik.schmauss@intel.com
commit 11207b4dc2737f1f01695979ad5eac6c8ecc8031 upstream.
ACPICA commit c14f17fa0acf8c93497ce04b9a7f4ada51b69383
This flag should not be included in #ifndef CONFIG_ACPI. It should be used unconditionally.
Link: https://github.com/acpica/acpica/commit/c14f17fa Fixes: aa9aaa4d61c0 ("ACPI: use different default debug value than ACPICA") Reported-by: Gabriel C nix.or.die@gmail.com Tested-by: Gabriel C nix.or.die@gmail.com Signed-off-by: Erik Schmauss erik.schmauss@intel.com Signed-off-by: Bob Moore robert.moore@intel.com Cc: 5.1+ stable@vger.kernel.org 5.1+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/acpi/platform/aclinux.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/include/acpi/platform/aclinux.h +++ b/include/acpi/platform/aclinux.h @@ -66,6 +66,11 @@
#define ACPI_INIT_FUNCTION __init
+/* Use a specific bugging default separate from ACPICA */ + +#undef ACPI_DEBUG_DEFAULT +#define ACPI_DEBUG_DEFAULT (ACPI_LV_INFO | ACPI_LV_REPAIR) + #ifndef CONFIG_ACPI
/* External globals for __KERNEL__, stubs is needed */ @@ -82,11 +87,6 @@ #define ACPI_NO_ERROR_MESSAGES #undef ACPI_DEBUG_OUTPUT
-/* Use a specific bugging default separate from ACPICA */ - -#undef ACPI_DEBUG_DEFAULT -#define ACPI_DEBUG_DEFAULT (ACPI_LV_INFO | ACPI_LV_REPAIR) - /* External interface for __KERNEL__, stub is needed */
#define ACPI_EXTERNAL_RETURN_STATUS(prototype) \
From: Steve Twiss stwiss.opensource@diasemi.com
commit 6b4814a9451add06d457e198be418bf6a3e6a990 upstream.
Mismatch between what is found in the Datasheets for DA9063 and DA9063L provided by Dialog Semiconductor, and the register names provided in the MFD registers file. The changes are for the OTP (one-time-programming) control registers. The two naming errors are OPT instead of OTP, and COUNT instead of CONT (i.e. control).
Cc: Stable stable@vger.kernel.org Signed-off-by: Steve Twiss stwiss.opensource@diasemi.com Signed-off-by: Lee Jones lee.jones@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/mfd/da9063/registers.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/include/linux/mfd/da9063/registers.h +++ b/include/linux/mfd/da9063/registers.h @@ -215,9 +215,9 @@
/* DA9063 Configuration registers */ /* OTP */ -#define DA9063_REG_OPT_COUNT 0x101 -#define DA9063_REG_OPT_ADDR 0x102 -#define DA9063_REG_OPT_DATA 0x103 +#define DA9063_REG_OTP_CONT 0x101 +#define DA9063_REG_OTP_ADDR 0x102 +#define DA9063_REG_OTP_DATA 0x103
/* Customer Trim and Configuration */ #define DA9063_REG_T_OFFSET 0x104
From: Dmitry Osipenko digetx@gmail.com
commit ea611d1cc180fbb56982c83cd5142a2b34881f5c upstream.
The FPS_PERIOD_MAX_US definitions are swapped for MAX20024 and MAX77620, fix it.
Cc: stable stable@vger.kernel.org Signed-off-by: Dmitry Osipenko digetx@gmail.com Signed-off-by: Lee Jones lee.jones@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/mfd/max77620.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/include/linux/mfd/max77620.h +++ b/include/linux/mfd/max77620.h @@ -136,8 +136,8 @@ #define MAX77620_FPS_PERIOD_MIN_US 40 #define MAX20024_FPS_PERIOD_MIN_US 20
-#define MAX77620_FPS_PERIOD_MAX_US 2560 -#define MAX20024_FPS_PERIOD_MAX_US 5120 +#define MAX20024_FPS_PERIOD_MAX_US 2560 +#define MAX77620_FPS_PERIOD_MAX_US 5120
#define MAX77620_REG_FPS_GPIO1 0x54 #define MAX77620_REG_FPS_GPIO2 0x55
From: Alexander Sverdlin alexander.sverdlin@nokia.com
commit 2b75ebeea6f4937d4d05ec4982c471cef9a29b7f upstream.
It was observed that reads crossing 4K address boundary are failing.
This limitation is mentioned in Intel documents:
Intel(R) 9 Series Chipset Family Platform Controller Hub (PCH) Datasheet:
"5.26.3 Flash Access Program Register Access: * Program Register Accesses are not allowed to cross a 4 KB boundary..."
Enhanced Serial Peripheral Interface (eSPI) Interface Base Specification (for Client and Server Platforms):
"5.1.4 Address For other memory transactions, the address may start or end at any byte boundary. However, the address and payload length combination must not cross the naturally aligned address boundary of the corresponding Maximum Payload Size. It must not cross a 4 KB address boundary."
Avoid this by splitting an operation crossing the boundary into two operations.
Fixes: 8afda8b26d01 ("spi-nor: Add support for Intel SPI serial flash controller") Cc: stable@vger.kernel.org Reported-by: Romain Porte romain.porte@nokia.com Tested-by: Pascal Fabreges pascal.fabreges@nokia.com Signed-off-by: Alexander Sverdlin alexander.sverdlin@nokia.com Reviewed-by: Tudor Ambarus tudor.ambarus@microchip.com Acked-by: Mika Westerberg mika.westerberg@linux.intel.com Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mtd/spi-nor/intel-spi.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/drivers/mtd/spi-nor/intel-spi.c +++ b/drivers/mtd/spi-nor/intel-spi.c @@ -632,6 +632,10 @@ static ssize_t intel_spi_read(struct spi while (len > 0) { block_size = min_t(size_t, len, INTEL_SPI_FIFO_SZ);
+ /* Read cannot cross 4K boundary */ + block_size = min_t(loff_t, from + block_size, + round_up(from + 1, SZ_4K)) - from; + writel(from, ispi->base + FADDR);
val = readl(ispi->base + HSFSTS_CTL); @@ -685,6 +689,10 @@ static ssize_t intel_spi_write(struct sp while (len > 0) { block_size = min_t(size_t, len, INTEL_SPI_FIFO_SZ);
+ /* Write cannot cross 4K boundary */ + block_size = min_t(loff_t, to + block_size, + round_up(to + 1, SZ_4K)) - to; + writel(to, ispi->base + FADDR);
val = readl(ispi->base + HSFSTS_CTL);
From: Chris Packham chris.packham@alliedtelesis.co.nz
commit 64d14c6fe040361ff6aecb825e392cf97837cd9e upstream.
When the gpio-addr-flash.c driver was merged with physmap-core.c the code to store the current gpio_values was lost. This meant that once a gpio was asserted it was never de-asserted. Fix this by storing the current offset in gpio_values like the old driver used to.
Fixes: commit ba32ce95cbd9 ("mtd: maps: Merge gpio-addr-flash.c into physmap-core.c") Cc: stable@vger.kernel.org Signed-off-by: Chris Packham chris.packham@alliedtelesis.co.nz Reviewed-by: Boris Brezillon boris.brezillon@collabora.com Signed-off-by: Richard Weinberger richard@nod.at Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mtd/maps/physmap-core.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/mtd/maps/physmap-core.c +++ b/drivers/mtd/maps/physmap-core.c @@ -132,6 +132,8 @@ static void physmap_set_addr_gpios(struc
gpiod_set_value(info->gpios->desc[i], !!(BIT(i) & ofs)); } + + info->gpio_values = ofs; }
#define win_mask(order) (BIT(order) - 1)
From: Chris Packham chris.packham@alliedtelesis.co.nz
commit d41970097f10d898cef0eb04bf53d786efd6bbbc upstream.
When the physmap_of_core.c code was merged into physmap-core.c the ability to use MTD_PHYSMAP_OF with only MTD_RAM selected was lost. Restore this by adding MTD_RAM to the dependencies of MTD_PHYSMAP.
Fixes: commit 642b1e8dbed7 ("mtd: maps: Merge physmap_of.c into physmap-core.c") Cc: stable@vger.kernel.org Signed-off-by: Chris Packham chris.packham@alliedtelesis.co.nz Reviewed-by: Hamish Martin hamish.martin@alliedtelesis.co.nz Signed-off-by: Richard Weinberger richard@nod.at Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mtd/maps/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/mtd/maps/Kconfig +++ b/drivers/mtd/maps/Kconfig @@ -10,7 +10,7 @@ config MTD_COMPLEX_MAPPINGS
config MTD_PHYSMAP tristate "Flash device in physical memory map" - depends on MTD_CFI || MTD_JEDECPROBE || MTD_ROM || MTD_LPDDR + depends on MTD_CFI || MTD_JEDECPROBE || MTD_ROM || MTD_RAM || MTD_LPDDR help This provides a 'mapping' driver which allows the NOR Flash and ROM driver code to communicate with chips which are mapped
From: Yifeng Li tomli@tomli.me
commit 75ddbc1fb11efac87b611d48e9802f6fe2bb2163 upstream.
Previously, in the userspace, it was possible to use the "setterm" command from util-linux to blank the VT console by default, using the following command.
According to the man page,
The force option keeps the screen blank even if a key is pressed.
It was implemented by calling TIOCL_BLANKSCREEN.
case BLANKSCREEN: ioctlarg = TIOCL_BLANKSCREEN; if (ioctl(STDIN_FILENO, TIOCLINUX, &ioctlarg)) warn(_("cannot force blank")); break;
However, after Linux 4.12, this command ceased to work anymore, which is unexpected. By inspecting the kernel source, it shows that the issue was triggered by the side-effect from commit a4199f5eb809 ("tty: Disable default console blanking interval").
The console blanking is implemented by function do_blank_screen() in vt.c: "blank_state" will be initialized to "blank_normal_wait" in con_init() if AND ONLY IF ("blankinterval" > 0). If "blankinterval" is 0, "blank_state" will be "blank_off" (== 0), and a call to do_blank_screen() will always abort, even if a forced blanking is required from the user by calling TIOCL_BLANKSCREEN, the console won't be blanked.
This behavior is unexpected from a user's point-of-view, since it's not mentioned in any documentation. The setterm man page suggests it will always work, and the kernel comments in uapi/linux/tiocl.h says
/* keep screen blank even if a key is pressed */ #define TIOCL_BLANKSCREEN 14
To fix it, we simply remove the "blank_state != blank_off" check, as pointed out by Nicolas Pitre, this check doesn't logically make sense and it's safe to remove.
Suggested-by: Nicolas Pitre nicolas.pitre@linaro.org Fixes: a4199f5eb809 ("tty: Disable default console blanking interval") Signed-off-by: Yifeng Li tomli@tomli.me Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/tty/vt/vt.c | 2 -- 1 file changed, 2 deletions(-)
--- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -4179,8 +4179,6 @@ void do_blank_screen(int entering_gfx) return; }
- if (blank_state != blank_normal_wait) - return; blank_state = blank_off;
/* don't blank graphics */
From: Sergei Trofimovich slyfox@gentoo.org
commit 46ca3f735f345c9d87383dd3a09fa5d43870770e upstream.
The bug manifests as an attempt to access deallocated memory:
BUG: unable to handle kernel paging request at ffff9c8735448000 #PF error: [PROT] [WRITE] PGD 288a05067 P4D 288a05067 PUD 288a07067 PMD 7f60c2063 PTE 80000007f5448161 Oops: 0003 [#1] PREEMPT SMP CPU: 6 PID: 388 Comm: loadkeys Tainted: G C 5.0.0-rc6-00153-g5ded5871030e #91 Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77M-D3H, BIOS F12 11/14/2013 RIP: 0010:__memmove+0x81/0x1a0 Code: 4c 89 4f 10 4c 89 47 18 48 8d 7f 20 73 d4 48 83 c2 20 e9 a2 00 00 00 66 90 48 89 d1 4c 8b 5c 16 f8 4c 8d 54 17 f8 48 c1 e9 03 <f3> 48 a5 4d 89 1a e9 0c 01 00 00 0f 1f 40 00 48 89 d1 4c 8b 1e 49 RSP: 0018:ffffa1b9002d7d08 EFLAGS: 00010203 RAX: ffff9c873541af43 RBX: ffff9c873541af43 RCX: 00000c6f105cd6bf RDX: 0000637882e986b6 RSI: ffff9c8735447ffb RDI: ffff9c8735447ffb RBP: ffff9c8739cd3800 R08: ffff9c873b802f00 R09: 00000000fffff73b R10: ffffffffb82b35f1 R11: 00505b1b004d5b1b R12: 0000000000000000 R13: ffff9c873541af3d R14: 000000000000000b R15: 000000000000000c FS: 00007f450c390580(0000) GS:ffff9c873f180000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff9c8735448000 CR3: 00000007e213c002 CR4: 00000000000606e0 Call Trace: vt_do_kdgkb_ioctl+0x34d/0x440 vt_ioctl+0xba3/0x1190 ? __bpf_prog_run32+0x39/0x60 ? mem_cgroup_commit_charge+0x7b/0x4e0 tty_ioctl+0x23f/0x920 ? preempt_count_sub+0x98/0xe0 ? __seccomp_filter+0x67/0x600 do_vfs_ioctl+0xa2/0x6a0 ? syscall_trace_enter+0x192/0x2d0 ksys_ioctl+0x3a/0x70 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x54/0xe0 entry_SYSCALL_64_after_hwframe+0x49/0xbe
The bug manifests on systemd systems with multiple vtcon devices: # cat /sys/devices/virtual/vtconsole/vtcon0/name (S) dummy device # cat /sys/devices/virtual/vtconsole/vtcon1/name (M) frame buffer device
There systemd runs 'loadkeys' tool in tapallel for each vtcon instance. This causes two parallel ioctl(KDSKBSENT) calls to race into adding the same entry into 'func_table' array at:
drivers/tty/vt/keyboard.c:vt_do_kdgkb_ioctl()
The function has no locking around writes to 'func_table'.
The simplest reproducer is to have initrams with the following init on a 8-CPU machine x86_64:
#!/bin/sh
loadkeys -q windowkeys ru4 & loadkeys -q windowkeys ru4 & loadkeys -q windowkeys ru4 & loadkeys -q windowkeys ru4 &
loadkeys -q windowkeys ru4 & loadkeys -q windowkeys ru4 & loadkeys -q windowkeys ru4 & loadkeys -q windowkeys ru4 & wait
The change adds lock on write path only. Reads are still racy.
CC: Greg Kroah-Hartman gregkh@linuxfoundation.org CC: Jiri Slaby jslaby@suse.com Link: https://lkml.org/lkml/2019/2/17/256 Signed-off-by: Sergei Trofimovich slyfox@gentoo.org Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/tty/vt/keyboard.c | 33 +++++++++++++++++++++++++++------ 1 file changed, 27 insertions(+), 6 deletions(-)
--- a/drivers/tty/vt/keyboard.c +++ b/drivers/tty/vt/keyboard.c @@ -123,6 +123,7 @@ static const int NR_TYPES = ARRAY_SIZE(m static struct input_handler kbd_handler; static DEFINE_SPINLOCK(kbd_event_lock); static DEFINE_SPINLOCK(led_lock); +static DEFINE_SPINLOCK(func_buf_lock); /* guard 'func_buf' and friends */ static unsigned long key_down[BITS_TO_LONGS(KEY_CNT)]; /* keyboard key bitmap */ static unsigned char shift_down[NR_SHIFT]; /* shift state counters.. */ static bool dead_key_next; @@ -1990,11 +1991,12 @@ int vt_do_kdgkb_ioctl(int cmd, struct kb char *p; u_char *q; u_char __user *up; - int sz; + int sz, fnw_sz; int delta; char *first_free, *fj, *fnw; int i, j, k; int ret; + unsigned long flags;
if (!capable(CAP_SYS_TTY_CONFIG)) perm = 0; @@ -2037,7 +2039,14 @@ int vt_do_kdgkb_ioctl(int cmd, struct kb goto reterr; }
+ fnw = NULL; + fnw_sz = 0; + /* race aginst other writers */ + again: + spin_lock_irqsave(&func_buf_lock, flags); q = func_table[i]; + + /* fj pointer to next entry after 'q' */ first_free = funcbufptr + (funcbufsize - funcbufleft); for (j = i+1; j < MAX_NR_FUNC && !func_table[j]; j++) ; @@ -2045,10 +2054,12 @@ int vt_do_kdgkb_ioctl(int cmd, struct kb fj = func_table[j]; else fj = first_free; - + /* buffer usage increase by new entry */ delta = (q ? -strlen(q) : 1) + strlen(kbs->kb_string); + if (delta <= funcbufleft) { /* it fits in current buf */ if (j < MAX_NR_FUNC) { + /* make enough space for new entry at 'fj' */ memmove(fj + delta, fj, first_free - fj); for (k = j; k < MAX_NR_FUNC; k++) if (func_table[k]) @@ -2061,20 +2072,28 @@ int vt_do_kdgkb_ioctl(int cmd, struct kb sz = 256; while (sz < funcbufsize - funcbufleft + delta) sz <<= 1; - fnw = kmalloc(sz, GFP_KERNEL); - if(!fnw) { - ret = -ENOMEM; - goto reterr; + if (fnw_sz != sz) { + spin_unlock_irqrestore(&func_buf_lock, flags); + kfree(fnw); + fnw = kmalloc(sz, GFP_KERNEL); + fnw_sz = sz; + if (!fnw) { + ret = -ENOMEM; + goto reterr; + } + goto again; }
if (!q) func_table[i] = fj; + /* copy data before insertion point to new location */ if (fj > funcbufptr) memmove(fnw, funcbufptr, fj - funcbufptr); for (k = 0; k < j; k++) if (func_table[k]) func_table[k] = fnw + (func_table[k] - funcbufptr);
+ /* copy data after insertion point to new location */ if (first_free > fj) { memmove(fnw + (fj - funcbufptr) + delta, fj, first_free - fj); for (k = j; k < MAX_NR_FUNC; k++) @@ -2087,7 +2106,9 @@ int vt_do_kdgkb_ioctl(int cmd, struct kb funcbufleft = funcbufleft - delta + sz - funcbufsize; funcbufsize = sz; } + /* finally insert item itself */ strcpy(func_table[i], kbs->kb_string); + spin_unlock_irqrestore(&func_buf_lock, flags); break; } ret = 0;
From: Jiufei Xue jiufei.xue@linux.alibaba.com
commit 742b06b5628f2cd23cb51a034cb54dc33c6162c5 upstream.
We hit a BUG at fs/buffer.c:3057 if we detached the nbd device before unmounting ext4 filesystem.
The typical chain of events leading to the BUG: jbd2_write_superblock submit_bh submit_bh_wbc BUG_ON(!buffer_mapped(bh));
The block device is removed and all the pages are invalidated. JBD2 was trying to write journal superblock to the block device which is no longer present.
Fix this by checking the journal superblock's buffer head prior to submitting.
Reported-by: Eric Ren renzhen@linux.alibaba.com Signed-off-by: Jiufei Xue jiufei.xue@linux.alibaba.com Signed-off-by: Theodore Ts'o tytso@mit.edu Reviewed-by: Jan Kara jack@suse.cz Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/jbd2/journal.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -1350,6 +1350,10 @@ static int jbd2_write_superblock(journal journal_superblock_t *sb = journal->j_superblock; int ret;
+ /* Buffer got discarded which means block device got invalidated */ + if (!buffer_mapped(bh)) + return -EIO; + trace_jbd2_write_superblock(journal, write_flags); if (!(journal->j_flags & JBD2_BARRIER)) write_flags &= ~(REQ_FUA | REQ_PREFLUSH);
From: Jan Kara jack@suse.cz
commit 31562b954b60f02acb91b7349dc6432d3f8c3c5f upstream.
The sanity check in mb_find_extent() only checked that returned extent does not extend past blocksize * 8, however it should not extend past EXT4_CLUSTERS_PER_GROUP(sb). This can happen when clusters_per_group < blocksize * 8 and the tail of the bitmap is not properly filled by 1s which happened e.g. when ancient kernels have grown the filesystem.
Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/mballoc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -1539,7 +1539,7 @@ static int mb_find_extent(struct ext4_bu ex->fe_len += 1 << order; }
- if (ex->fe_start + ex->fe_len > (1 << (e4b->bd_blkbits + 3))) { + if (ex->fe_start + ex->fe_len > EXT4_CLUSTERS_PER_GROUP(e4b->bd_sb)) { /* Should never happen! (but apparently sometimes does?!?) */ WARN_ON(1); ext4_error(e4b->bd_sb, "corruption or bug in mb_find_extent "
From: Theodore Ts'o tytso@mit.edu
commit 345c0dbf3a30872d9b204db96b5857cd00808cae upstream.
Add the blocks which belong to the journal inode to block_validity's system zone so attempts to deallocate or overwrite the journal due a corrupted file system where the journal blocks are also claimed by another inode.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202879 Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/block_validity.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++ fs/ext4/inode.c | 4 +++ 2 files changed, 52 insertions(+)
--- a/fs/ext4/block_validity.c +++ b/fs/ext4/block_validity.c @@ -137,6 +137,48 @@ static void debug_print_tree(struct ext4 printk(KERN_CONT "\n"); }
+static int ext4_protect_reserved_inode(struct super_block *sb, u32 ino) +{ + struct inode *inode; + struct ext4_sb_info *sbi = EXT4_SB(sb); + struct ext4_map_blocks map; + u32 i = 0, err = 0, num, n; + + if ((ino < EXT4_ROOT_INO) || + (ino > le32_to_cpu(sbi->s_es->s_inodes_count))) + return -EINVAL; + inode = ext4_iget(sb, ino, EXT4_IGET_SPECIAL); + if (IS_ERR(inode)) + return PTR_ERR(inode); + num = (inode->i_size + sb->s_blocksize - 1) >> sb->s_blocksize_bits; + while (i < num) { + map.m_lblk = i; + map.m_len = num - i; + n = ext4_map_blocks(NULL, inode, &map, 0); + if (n < 0) { + err = n; + break; + } + if (n == 0) { + i++; + } else { + if (!ext4_data_block_valid(sbi, map.m_pblk, n)) { + ext4_error(sb, "blocks %llu-%llu from inode %u " + "overlap system zone", map.m_pblk, + map.m_pblk + map.m_len - 1, ino); + err = -EFSCORRUPTED; + break; + } + err = add_system_zone(sbi, map.m_pblk, n); + if (err < 0) + break; + i += n; + } + } + iput(inode); + return err; +} + int ext4_setup_system_zone(struct super_block *sb) { ext4_group_t ngroups = ext4_get_groups_count(sb); @@ -171,6 +213,12 @@ int ext4_setup_system_zone(struct super_ if (ret) return ret; } + if (ext4_has_feature_journal(sb) && sbi->s_es->s_journal_inum) { + ret = ext4_protect_reserved_inode(sb, + le32_to_cpu(sbi->s_es->s_journal_inum)); + if (ret) + return ret; + }
if (test_opt(sb, DEBUG)) debug_print_tree(sbi); --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -399,6 +399,10 @@ static int __check_block_validity(struct unsigned int line, struct ext4_map_blocks *map) { + if (ext4_has_feature_journal(inode->i_sb) && + (inode->i_ino == + le32_to_cpu(EXT4_SB(inode->i_sb)->s_es->s_journal_inum))) + return 0; if (!ext4_data_block_valid(EXT4_SB(inode->i_sb), map->m_pblk, map->m_len)) { ext4_error_inode(inode, func, line, map->m_pblk,
From: Theodore Ts'o tytso@mit.edu
commit e5d01196c0428a206f307e9ee5f6842964098ff0 upstream.
In other places in fs/ext4/xattr.c, if e_value_inum is non-zero, the code ignores the value in e_value_offs. The e_value_offs *should* be zero, but we shouldn't depend upon it, since it might not be true in a corrupted/fuzzed file system.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202897 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202877 Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/xattr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -1696,7 +1696,7 @@ static int ext4_xattr_set_entry(struct e
/* No failures allowed past this point. */
- if (!s->not_found && here->e_value_size && here->e_value_offs) { + if (!s->not_found && here->e_value_size && !here->e_value_inum) { /* Remove the old value. */ void *first_val = s->base + min_offs; size_t offs = le16_to_cpu(here->e_value_offs);
From: Pan Bian bianpan2016@163.com
commit 8c380ab4b7b59c0c602743810be1b712514eaebc upstream.
The reference to iloc.bh has been dropped in ext4_mark_iloc_dirty. However, the reference is dropped again if error occurs during ext4_handle_dirty_metadata, which may result in use-after-free bugs.
Fixes: fb265c9cb49e("ext4: add ext4_sb_bread() to disambiguate ENOMEM cases") Signed-off-by: Pan Bian bianpan2016@163.com Signed-off-by: Theodore Ts'o tytso@mit.edu Reviewed-by: Jan Kara jack@suse.cz Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/resize.c | 1 + 1 file changed, 1 insertion(+)
--- a/fs/ext4/resize.c +++ b/fs/ext4/resize.c @@ -874,6 +874,7 @@ static int add_new_gdb(handle_t *handle, err = ext4_handle_dirty_metadata(handle, NULL, gdb_bh); if (unlikely(err)) { ext4_std_error(sb, err); + iloc.bh = NULL; goto errout; } brelse(dind);
From: Barret Rhoden brho@google.com
commit 7bc04c5c2cc467c5b40f2b03ba08da174a0d5fa7 upstream.
When remounting with debug_want_extra_isize, we were not performing the same checks that we do during a normal mount. That allowed us to set a value for s_want_extra_isize that reached outside the s_inode_size.
Fixes: e2b911c53584 ("ext4: clean up feature test macros with predicate functions") Reported-by: syzbot+f584efa0ac7213c226b7@syzkaller.appspotmail.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Barret Rhoden brho@google.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/super.c | 58 ++++++++++++++++++++++++++++++++------------------------ 1 file changed, 34 insertions(+), 24 deletions(-)
--- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3513,6 +3513,37 @@ int ext4_calculate_overhead(struct super return 0; }
+static void ext4_clamp_want_extra_isize(struct super_block *sb) +{ + struct ext4_sb_info *sbi = EXT4_SB(sb); + struct ext4_super_block *es = sbi->s_es; + + /* determine the minimum size of new large inodes, if present */ + if (sbi->s_inode_size > EXT4_GOOD_OLD_INODE_SIZE && + sbi->s_want_extra_isize == 0) { + sbi->s_want_extra_isize = sizeof(struct ext4_inode) - + EXT4_GOOD_OLD_INODE_SIZE; + if (ext4_has_feature_extra_isize(sb)) { + if (sbi->s_want_extra_isize < + le16_to_cpu(es->s_want_extra_isize)) + sbi->s_want_extra_isize = + le16_to_cpu(es->s_want_extra_isize); + if (sbi->s_want_extra_isize < + le16_to_cpu(es->s_min_extra_isize)) + sbi->s_want_extra_isize = + le16_to_cpu(es->s_min_extra_isize); + } + } + /* Check if enough inode space is available */ + if (EXT4_GOOD_OLD_INODE_SIZE + sbi->s_want_extra_isize > + sbi->s_inode_size) { + sbi->s_want_extra_isize = sizeof(struct ext4_inode) - + EXT4_GOOD_OLD_INODE_SIZE; + ext4_msg(sb, KERN_INFO, + "required extra inode space not available"); + } +} + static void ext4_set_resv_clusters(struct super_block *sb) { ext4_fsblk_t resv_clusters; @@ -4387,30 +4418,7 @@ no_journal: } else if (ret) goto failed_mount4a;
- /* determine the minimum size of new large inodes, if present */ - if (sbi->s_inode_size > EXT4_GOOD_OLD_INODE_SIZE && - sbi->s_want_extra_isize == 0) { - sbi->s_want_extra_isize = sizeof(struct ext4_inode) - - EXT4_GOOD_OLD_INODE_SIZE; - if (ext4_has_feature_extra_isize(sb)) { - if (sbi->s_want_extra_isize < - le16_to_cpu(es->s_want_extra_isize)) - sbi->s_want_extra_isize = - le16_to_cpu(es->s_want_extra_isize); - if (sbi->s_want_extra_isize < - le16_to_cpu(es->s_min_extra_isize)) - sbi->s_want_extra_isize = - le16_to_cpu(es->s_min_extra_isize); - } - } - /* Check if enough inode space is available */ - if (EXT4_GOOD_OLD_INODE_SIZE + sbi->s_want_extra_isize > - sbi->s_inode_size) { - sbi->s_want_extra_isize = sizeof(struct ext4_inode) - - EXT4_GOOD_OLD_INODE_SIZE; - ext4_msg(sb, KERN_INFO, "required extra inode space not" - "available"); - } + ext4_clamp_want_extra_isize(sb);
ext4_set_resv_clusters(sb);
@@ -5194,6 +5202,8 @@ static int ext4_remount(struct super_blo goto restore_opts; }
+ ext4_clamp_want_extra_isize(sb); + if ((old_opts.s_mount_opt & EXT4_MOUNT_JOURNAL_CHECKSUM) ^ test_opt(sb, JOURNAL_CHECKSUM)) { ext4_msg(sb, KERN_ERR, "changing journal_checksum "
From: Kirill Tkhai ktkhai@virtuozzo.com
commit 310a997fd74de778b9a4848a64be9cda9f18764a upstream.
It is never possible, that number of block groups decreases, since only online grow is supported.
But after a growing occured, we have to zero inode tables for just created new block groups.
Fixes: 19c5246d2516 ("ext4: add new online resize interface") Signed-off-by: Kirill Tkhai ktkhai@virtuozzo.com Signed-off-by: Theodore Ts'o tytso@mit.edu Reviewed-by: Jan Kara jack@suse.cz Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/ioctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c @@ -978,7 +978,7 @@ mext_out: if (err == 0) err = err2; mnt_drop_write_file(filp); - if (!err && (o_group > EXT4_SB(sb)->s_groups_count) && + if (!err && (o_group < EXT4_SB(sb)->s_groups_count) && ext4_has_group_desc_csum(sb) && test_opt(sb, INIT_INODE_TABLE)) err = ext4_register_li_request(sb, o_group);
From: Debabrata Banerjee dbanerje@akamai.com
commit 50b29d8f033a7c88c5bc011abc2068b1691ab755 upstream.
Instead of removing EXT4_MOUNT_JOURNAL_CHECKSUM from s_def_mount_opt as I assume was intended, all other options were blown away leading to _ext4_show_options() output being incorrect.
Fixes: 1e381f60dad9 ("ext4: do not allow journal_opts for fs w/o journal") Signed-off-by: Debabrata Banerjee dbanerje@akamai.com Signed-off-by: Theodore Ts'o tytso@mit.edu Reviewed-by: Jan Kara jack@suse.cz Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -4269,7 +4269,7 @@ static int ext4_fill_super(struct super_ "data=, fs mounted w/o journal"); goto failed_mount_wq; } - sbi->s_def_mount_opt &= EXT4_MOUNT_JOURNAL_CHECKSUM; + sbi->s_def_mount_opt &= ~EXT4_MOUNT_JOURNAL_CHECKSUM; clear_opt(sb, JOURNAL_CHECKSUM); clear_opt(sb, DATA_FLAGS); sbi->s_journal = NULL;
From: Qu Wenruo wqu@suse.com
commit 448de471cd4cab0cedd15770082567a69a784a11 upstream.
[BUG] When reading a file from a fuzzed image, kernel can panic like:
BTRFS warning (device loop0): csum failed root 5 ino 270 off 0 csum 0x98f94189 expected csum 0x00000000 mirror 1 assertion failed: !memcmp_extent_buffer(b, &disk_key, offsetof(struct btrfs_leaf, items[0].key), sizeof(disk_key)), file: fs/btrfs/ctree.c, line: 2544 ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.h:3500! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:btrfs_search_slot.cold.24+0x61/0x63 [btrfs] Call Trace: btrfs_lookup_csum+0x52/0x150 [btrfs] __btrfs_lookup_bio_sums+0x209/0x640 [btrfs] btrfs_submit_bio_hook+0x103/0x170 [btrfs] submit_one_bio+0x59/0x80 [btrfs] extent_read_full_page+0x58/0x80 [btrfs] generic_file_read_iter+0x2f6/0x9d0 __vfs_read+0x14d/0x1a0 vfs_read+0x8d/0x140 ksys_read+0x52/0xc0 do_syscall_64+0x60/0x210 entry_SYSCALL_64_after_hwframe+0x49/0xbe
[CAUSE] The fuzzed image has a corrupted leaf whose first key doesn't match its parent:
checksum tree key (CSUM_TREE ROOT_ITEM 0) node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6 chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c ... key (EXTENT_CSUM EXTENT_CSUM 79691776) block 29761536 gen 19
leaf 29761536 items 1 free space 1726 generation 19 owner CSUM_TREE leaf 29761536 flags 0x1(WRITTEN) backref revision 1 fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6 chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c item 0 key (EXTENT_CSUM EXTENT_CSUM 8798638964736) itemoff 1751 itemsize 2244 range start 8798638964736 end 8798641262592 length 2297856
When reading the above tree block, we have extent_buffer->refs = 2 in the context:
- initial one from __alloc_extent_buffer() alloc_extent_buffer() |- __alloc_extent_buffer() |- atomic_set(&eb->refs, 1)
- one being added to fs_info->buffer_radix alloc_extent_buffer() |- check_buffer_tree_ref() |- atomic_inc(&eb->refs)
So if even we call free_extent_buffer() in read_tree_block or other similar situation, we only decrease the refs by 1, it doesn't reach 0 and won't be freed right now.
The staled eb and its corrupted content will still be kept cached.
Furthermore, we have several extra cases where we either don't do first key check or the check is not proper for all callers:
- scrub We just don't have first key in this context.
- shared tree block One tree block can be shared by several snapshot/subvolume trees. In that case, the first key check for one subvolume doesn't apply to another.
So for the above reasons, a corrupted extent buffer can sneak into the buffer cache.
[FIX] Call verify_level_key in read_block_for_search to do another verification. For that purpose the function is exported.
Due to above reasons, although we can free corrupted extent buffer from cache, we still need the check in read_block_for_search(), for scrub and shared tree blocks.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202755 Link: https://bugzilla.kernel.org/show_bug.cgi?id=202757 Link: https://bugzilla.kernel.org/show_bug.cgi?id=202759 Link: https://bugzilla.kernel.org/show_bug.cgi?id=202761 Link: https://bugzilla.kernel.org/show_bug.cgi?id=202767 Link: https://bugzilla.kernel.org/show_bug.cgi?id=202769 Reported-by: Yoon Jungyeon jungyeon@gatech.edu CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/ctree.c | 10 ++++++++++ fs/btrfs/disk-io.c | 10 +++++----- fs/btrfs/disk-io.h | 3 +++ 3 files changed, 18 insertions(+), 5 deletions(-)
--- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -2416,6 +2416,16 @@ read_block_for_search(struct btrfs_root if (tmp) { /* first we do an atomic uptodate check */ if (btrfs_buffer_uptodate(tmp, gen, 1) > 0) { + /* + * Do extra check for first_key, eb can be stale due to + * being cached, read from scrub, or have multiple + * parents (shared tree blocks). + */ + if (btrfs_verify_level_key(fs_info, tmp, + parent_level - 1, &first_key, gen)) { + free_extent_buffer(tmp); + return -EUCLEAN; + } *eb_ret = tmp; return 0; } --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -414,9 +414,9 @@ static int btrfs_check_super_csum(struct return ret; }
-static int verify_level_key(struct btrfs_fs_info *fs_info, - struct extent_buffer *eb, int level, - struct btrfs_key *first_key, u64 parent_transid) +int btrfs_verify_level_key(struct btrfs_fs_info *fs_info, + struct extent_buffer *eb, int level, + struct btrfs_key *first_key, u64 parent_transid) { int found_level; struct btrfs_key found_key; @@ -493,8 +493,8 @@ static int btree_read_extent_buffer_page if (verify_parent_transid(io_tree, eb, parent_transid, 0)) ret = -EIO; - else if (verify_level_key(fs_info, eb, level, - first_key, parent_transid)) + else if (btrfs_verify_level_key(fs_info, eb, level, + first_key, parent_transid)) ret = -EUCLEAN; else break; --- a/fs/btrfs/disk-io.h +++ b/fs/btrfs/disk-io.h @@ -39,6 +39,9 @@ static inline u64 btrfs_sb_offset(int mi struct btrfs_device; struct btrfs_fs_devices;
+int btrfs_verify_level_key(struct btrfs_fs_info *fs_info, + struct extent_buffer *eb, int level, + struct btrfs_key *first_key, u64 parent_transid); struct extent_buffer *read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr, u64 parent_transid, int level, struct btrfs_key *first_key);
From: Nikolay Borisov nborisov@suse.com
commit 537f38f019fa0b762dbb4c0fc95d7fcce9db8e2d upstream.
If a an eb fails to be read for whatever reason - it's corrupted on disk and parent transid/key validations fail or IO for eb pages fail then this buffer must be removed from the buffer cache. Currently the code calls free_extent_buffer if an error occurs. Unfortunately this doesn't achieve the desired behavior since btrfs_find_create_tree_block returns with eb->refs == 2.
On the other hand free_extent_buffer will only decrement the refs once leaving it added to the buffer cache radix tree. This enables later code to look up the buffer from the cache and utilize it potentially leading to a crash.
The correct way to free the buffer is call free_extent_buffer_stale. This function will correctly call atomic_dec explicitly for the buffer and subsequently call release_extent_buffer which will decrement the final reference thus correctly remove the invalid buffer from buffer cache. This change affects only newly allocated buffers since they have eb->refs == 2.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202755 Reported-by: Jungyeon jungyeon@gatech.edu CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Nikolay Borisov nborisov@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/disk-io.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
--- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1018,13 +1018,18 @@ void readahead_tree_block(struct btrfs_f { struct extent_buffer *buf = NULL; struct inode *btree_inode = fs_info->btree_inode; + int ret;
buf = btrfs_find_create_tree_block(fs_info, bytenr); if (IS_ERR(buf)) return; - read_extent_buffer_pages(&BTRFS_I(btree_inode)->io_tree, - buf, WAIT_NONE, 0); - free_extent_buffer(buf); + + ret = read_extent_buffer_pages(&BTRFS_I(btree_inode)->io_tree, buf, + WAIT_NONE, 0); + if (ret < 0) + free_extent_buffer_stale(buf); + else + free_extent_buffer(buf); }
int reada_tree_block_flagged(struct btrfs_fs_info *fs_info, u64 bytenr, @@ -1044,12 +1049,12 @@ int reada_tree_block_flagged(struct btrf ret = read_extent_buffer_pages(io_tree, buf, WAIT_PAGE_LOCK, mirror_num); if (ret) { - free_extent_buffer(buf); + free_extent_buffer_stale(buf); return ret; }
if (test_bit(EXTENT_BUFFER_CORRUPT, &buf->bflags)) { - free_extent_buffer(buf); + free_extent_buffer_stale(buf); return -EIO; } else if (extent_buffer_uptodate(buf)) { *eb = buf; @@ -1103,7 +1108,7 @@ struct extent_buffer *read_tree_block(st ret = btree_read_extent_buffer_pages(fs_info, buf, parent_transid, level, first_key); if (ret) { - free_extent_buffer(buf); + free_extent_buffer_stale(buf); return ERR_PTR(ret); } return buf;
From: Nikolay Borisov nborisov@suse.com
commit c2d1b3aae33605a61cbab445d8ae1c708ccd2698 upstream.
Up until now trimming the freespace was done irrespective of what the arguments of the FITRIM ioctl were. For example fstrim's -o/-l arguments will be entirely ignored. Fix it by correctly handling those paramter. This requires breaking if the found freespace extent is after the end of the passed range as well as completing trim after trimming fstrim_range::len bytes.
Fixes: 499f377f49f0 ("btrfs: iterate over unused chunk space in FITRIM") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Nikolay Borisov nborisov@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/extent-tree.c | 25 +++++++++++++++++++------ 1 file changed, 19 insertions(+), 6 deletions(-)
--- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -11315,9 +11315,9 @@ int btrfs_error_unpin_extent_range(struc * held back allocations. */ static int btrfs_trim_free_extents(struct btrfs_device *device, - u64 minlen, u64 *trimmed) + struct fstrim_range *range, u64 *trimmed) { - u64 start = 0, len = 0; + u64 start = range->start, len = 0; int ret;
*trimmed = 0; @@ -11360,8 +11360,8 @@ static int btrfs_trim_free_extents(struc if (!trans) up_read(&fs_info->commit_root_sem);
- ret = find_free_dev_extent_start(trans, device, minlen, start, - &start, &len); + ret = find_free_dev_extent_start(trans, device, range->minlen, + start, &start, &len); if (trans) { up_read(&fs_info->commit_root_sem); btrfs_put_transaction(trans); @@ -11374,6 +11374,16 @@ static int btrfs_trim_free_extents(struc break; }
+ /* If we are out of the passed range break */ + if (start > range->start + range->len - 1) { + mutex_unlock(&fs_info->chunk_mutex); + ret = 0; + break; + } + + start = max(range->start, start); + len = min(range->len, len); + ret = btrfs_issue_discard(device->bdev, start, len, &bytes); mutex_unlock(&fs_info->chunk_mutex);
@@ -11383,6 +11393,10 @@ static int btrfs_trim_free_extents(struc start += len; *trimmed += bytes;
+ /* We've trimmed enough */ + if (*trimmed >= range->len) + break; + if (fatal_signal_pending(current)) { ret = -ERESTARTSYS; break; @@ -11466,8 +11480,7 @@ int btrfs_trim_fs(struct btrfs_fs_info * mutex_lock(&fs_info->fs_devices->device_list_mutex); devices = &fs_info->fs_devices->devices; list_for_each_entry(device, devices, dev_list) { - ret = btrfs_trim_free_extents(device, range->minlen, - &group_trimmed); + ret = btrfs_trim_free_extents(device, range, &group_trimmed); if (ret) { dev_failed++; dev_ret = ret;
From: Filipe Manana fdmanana@suse.com
commit 9f89d5de8631c7930898a601b6612e271aa2261c upstream.
When we set a subvolume to read-only mode we do not flush dellaloc for any of its inodes (except if the filesystem is mounted with -o flushoncommit), since it does not affect correctness for any subsequent operations - except for a future send operation. The send operation will not be able to see the delalloc data since the respective file extent items, inode item updates, backreferences, etc, have not hit yet the subvolume and extent trees.
Effectively this means data loss, since the send stream will not contain any data from existing delalloc. Another problem from this is that if the writeback starts and finishes while the send operation is in progress, we have the subvolume tree being being modified concurrently which can result in send failing unexpectedly with EIO or hitting runtime errors, assertion failures or hitting BUG_ONs, etc.
Simple reproducer:
$ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt
$ btrfs subvolume create /mnt/sv $ xfs_io -f -c "pwrite -S 0xea 0 108K" /mnt/sv/foo
$ btrfs property set /mnt/sv ro true $ btrfs send -f /tmp/send.stream /mnt/sv
$ od -t x1 -A d /mnt/sv/foo 0000000 ea ea ea ea ea ea ea ea ea ea ea ea ea ea ea ea * 0110592
$ umount /mnt $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt
$ btrfs receive -f /tmp/send.stream /mnt $ echo $? 0 $ od -t x1 -A d /mnt/sv/foo 0000000 # ---> empty file
Since this a problem that affects send only, fix it in send by flushing dellaloc for all the roots used by the send operation before send starts to process the commit roots.
This is a problem that affects send since it was introduced (commit 31db9f7c23fbf7 ("Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive")) but backporting it to older kernels has some dependencies:
- For kernels between 3.19 and 4.20, it depends on commit 3cd24c698004d2 ("btrfs: use tagged writepage to mitigate livelock of snapshot") because the function btrfs_start_delalloc_snapshot() does not exist before that commit. So one has to either pick that commit or replace the calls to btrfs_start_delalloc_snapshot() in this patch with calls to btrfs_start_delalloc_inodes().
- For kernels older than 3.19 it also requires commit e5fa8f865b3324 ("Btrfs: ensure send always works on roots without orphans") because it depends on the function ensure_commit_roots_uptodate() which that commits introduced.
- No dependencies for 5.0+ kernels.
A test case for fstests follows soon.
CC: stable@vger.kernel.org # 3.19+ Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/send.c | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+)
--- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -6579,6 +6579,38 @@ commit_trans: return btrfs_commit_transaction(trans); }
+/* + * Make sure any existing dellaloc is flushed for any root used by a send + * operation so that we do not miss any data and we do not race with writeback + * finishing and changing a tree while send is using the tree. This could + * happen if a subvolume is in RW mode, has delalloc, is turned to RO mode and + * a send operation then uses the subvolume. + * After flushing delalloc ensure_commit_roots_uptodate() must be called. + */ +static int flush_delalloc_roots(struct send_ctx *sctx) +{ + struct btrfs_root *root = sctx->parent_root; + int ret; + int i; + + if (root) { + ret = btrfs_start_delalloc_snapshot(root); + if (ret) + return ret; + btrfs_wait_ordered_extents(root, U64_MAX, 0, U64_MAX); + } + + for (i = 0; i < sctx->clone_roots_cnt; i++) { + root = sctx->clone_roots[i].root; + ret = btrfs_start_delalloc_snapshot(root); + if (ret) + return ret; + btrfs_wait_ordered_extents(root, U64_MAX, 0, U64_MAX); + } + + return 0; +} + static void btrfs_root_dec_send_in_progress(struct btrfs_root* root) { spin_lock(&root->root_item_lock); @@ -6803,6 +6835,10 @@ long btrfs_ioctl_send(struct file *mnt_f NULL); sort_clone_roots = 1;
+ ret = flush_delalloc_roots(sctx); + if (ret) + goto out; + ret = ensure_commit_roots_uptodate(sctx); if (ret) goto out;
From: Filipe Manana fdmanana@suse.com
commit 03628cdbc64db6262e50d0357960a4e9562676a1 upstream.
During fiemap, for regular extents (non inline) we need to check if they are shared and if they are, set the shared bit. Checking if an extent is shared requires checking the delayed references of the currently running transaction, since some reference might have not yet hit the extent tree and be only in the in-memory delayed references.
However we were using a transaction join for this, which creates a new transaction when there is no transaction currently running. That means that two more potential failures can happen: creating the transaction and committing it. Further, if no write activity is currently happening in the system, and fiemap calls keep being done, we end up creating and committing transactions that do nothing.
In some extreme cases this can result in the commit of the transaction created by fiemap to fail with ENOSPC when updating the root item of a subvolume tree because a join does not reserve any space, leading to a trace like the following:
heisenberg kernel: ------------[ cut here ]------------ heisenberg kernel: BTRFS: Transaction aborted (error -28) heisenberg kernel: WARNING: CPU: 0 PID: 7137 at fs/btrfs/root-tree.c:136 btrfs_update_root+0x22b/0x320 [btrfs] (...) heisenberg kernel: CPU: 0 PID: 7137 Comm: btrfs-transacti Not tainted 4.19.0-4-amd64 #1 Debian 4.19.28-2 heisenberg kernel: Hardware name: FUJITSU LIFEBOOK U757/FJNB2A5, BIOS Version 1.21 03/19/2018 heisenberg kernel: RIP: 0010:btrfs_update_root+0x22b/0x320 [btrfs] (...) heisenberg kernel: RSP: 0018:ffffb5448828bd40 EFLAGS: 00010286 heisenberg kernel: RAX: 0000000000000000 RBX: ffff8ed56bccef50 RCX: 0000000000000006 heisenberg kernel: RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8ed6bda166a0 heisenberg kernel: RBP: 00000000ffffffe4 R08: 00000000000003df R09: 0000000000000007 heisenberg kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff8ed63396a078 heisenberg kernel: R13: ffff8ed092d7c800 R14: ffff8ed64f5db028 R15: ffff8ed6bd03d068 heisenberg kernel: FS: 0000000000000000(0000) GS:ffff8ed6bda00000(0000) knlGS:0000000000000000 heisenberg kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 heisenberg kernel: CR2: 00007f46f75f8000 CR3: 0000000310a0a002 CR4: 00000000003606f0 heisenberg kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 heisenberg kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 heisenberg kernel: Call Trace: heisenberg kernel: commit_fs_roots+0x166/0x1d0 [btrfs] heisenberg kernel: ? _cond_resched+0x15/0x30 heisenberg kernel: ? btrfs_run_delayed_refs+0xac/0x180 [btrfs] heisenberg kernel: btrfs_commit_transaction+0x2bd/0x870 [btrfs] heisenberg kernel: ? start_transaction+0x9d/0x3f0 [btrfs] heisenberg kernel: transaction_kthread+0x147/0x180 [btrfs] heisenberg kernel: ? btrfs_cleanup_transaction+0x530/0x530 [btrfs] heisenberg kernel: kthread+0x112/0x130 heisenberg kernel: ? kthread_bind+0x30/0x30 heisenberg kernel: ret_from_fork+0x35/0x40 heisenberg kernel: ---[ end trace 05de912e30e012d9 ]---
Since fiemap (and btrfs_check_shared()) is a read-only operation, do not do a transaction join to avoid the overhead of creating a new transaction (if there is currently no running transaction) and introducing a potential point of failure when the new transaction gets committed, instead use a transaction attach to grab a handle for the currently running transaction if any.
Reported-by: Christoph Anton Mitterer calestyo@scientia.net Link: https://lore.kernel.org/linux-btrfs/b2a668d7124f1d3e410367f587926f622b3f03a4... Fixes: afce772e87c36c ("btrfs: fix check_shared for fiemap ioctl") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/backref.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-)
--- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -1460,8 +1460,8 @@ int btrfs_find_all_roots(struct btrfs_tr * callers (such as fiemap) which want to know whether the extent is * shared but do not need a ref count. * - * This attempts to allocate a transaction in order to account for - * delayed refs, but continues on even when the alloc fails. + * This attempts to attach to the running transaction in order to account for + * delayed refs, but continues on even when no running transaction exists. * * Return: 0 if extent is not shared, 1 if it is shared, < 0 on error. */ @@ -1484,13 +1484,16 @@ int btrfs_check_shared(struct btrfs_root tmp = ulist_alloc(GFP_NOFS); roots = ulist_alloc(GFP_NOFS); if (!tmp || !roots) { - ulist_free(tmp); - ulist_free(roots); - return -ENOMEM; + ret = -ENOMEM; + goto out; }
- trans = btrfs_join_transaction(root); + trans = btrfs_attach_transaction(root); if (IS_ERR(trans)) { + if (PTR_ERR(trans) != -ENOENT && PTR_ERR(trans) != -EROFS) { + ret = PTR_ERR(trans); + goto out; + } trans = NULL; down_read(&fs_info->commit_root_sem); } else { @@ -1523,6 +1526,7 @@ int btrfs_check_shared(struct btrfs_root } else { up_read(&fs_info->commit_root_sem); } +out: ulist_free(tmp); ulist_free(roots); return ret;
From: Filipe Manana fdmanana@suse.com
commit bfc61c36260ca990937539cd648ede3cd749bc10 upstream.
When finding out which inodes have references on a particular extent, done by backref.c:iterate_extent_inodes(), from the BTRFS_IOC_LOGICAL_INO (both v1 and v2) ioctl and from scrub we use the transaction join API to grab a reference on the currently running transaction, since in order to give accurate results we need to inspect the delayed references of the currently running transaction.
However, if there is currently no running transaction, the join operation will create a new transaction. This is inefficient as the transaction will eventually be committed, doing unnecessary IO and introducing a potential point of failure that will lead to a transaction abort due to -ENOSPC, as recently reported [1].
That's because the join, creates the transaction but does not reserve any space, so when attempting to update the root item of the root passed to btrfs_join_transaction(), during the transaction commit, we can end up failling with -ENOSPC. Users of a join operation are supposed to actually do some filesystem changes and reserve space by some means, which is not the case of iterate_extent_inodes(), it is a read-only operation for all contextes from which it is called.
The reported [1] -ENOSPC failure stack trace is the following:
heisenberg kernel: ------------[ cut here ]------------ heisenberg kernel: BTRFS: Transaction aborted (error -28) heisenberg kernel: WARNING: CPU: 0 PID: 7137 at fs/btrfs/root-tree.c:136 btrfs_update_root+0x22b/0x320 [btrfs] (...) heisenberg kernel: CPU: 0 PID: 7137 Comm: btrfs-transacti Not tainted 4.19.0-4-amd64 #1 Debian 4.19.28-2 heisenberg kernel: Hardware name: FUJITSU LIFEBOOK U757/FJNB2A5, BIOS Version 1.21 03/19/2018 heisenberg kernel: RIP: 0010:btrfs_update_root+0x22b/0x320 [btrfs] (...) heisenberg kernel: RSP: 0018:ffffb5448828bd40 EFLAGS: 00010286 heisenberg kernel: RAX: 0000000000000000 RBX: ffff8ed56bccef50 RCX: 0000000000000006 heisenberg kernel: RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8ed6bda166a0 heisenberg kernel: RBP: 00000000ffffffe4 R08: 00000000000003df R09: 0000000000000007 heisenberg kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff8ed63396a078 heisenberg kernel: R13: ffff8ed092d7c800 R14: ffff8ed64f5db028 R15: ffff8ed6bd03d068 heisenberg kernel: FS: 0000000000000000(0000) GS:ffff8ed6bda00000(0000) knlGS:0000000000000000 heisenberg kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 heisenberg kernel: CR2: 00007f46f75f8000 CR3: 0000000310a0a002 CR4: 00000000003606f0 heisenberg kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 heisenberg kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 heisenberg kernel: Call Trace: heisenberg kernel: commit_fs_roots+0x166/0x1d0 [btrfs] heisenberg kernel: ? _cond_resched+0x15/0x30 heisenberg kernel: ? btrfs_run_delayed_refs+0xac/0x180 [btrfs] heisenberg kernel: btrfs_commit_transaction+0x2bd/0x870 [btrfs] heisenberg kernel: ? start_transaction+0x9d/0x3f0 [btrfs] heisenberg kernel: transaction_kthread+0x147/0x180 [btrfs] heisenberg kernel: ? btrfs_cleanup_transaction+0x530/0x530 [btrfs] heisenberg kernel: kthread+0x112/0x130 heisenberg kernel: ? kthread_bind+0x30/0x30 heisenberg kernel: ret_from_fork+0x35/0x40 heisenberg kernel: ---[ end trace 05de912e30e012d9 ]---
So fix that by using the attach API, which does not create a transaction when there is currently no running transaction.
[1] https://lore.kernel.org/linux-btrfs/b2a668d7124f1d3e410367f587926f622b3f03a4...
Reported-by: Zygo Blaxell ce3g8jdj@umail.furryterror.org CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/backref.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
--- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -1916,13 +1916,19 @@ int iterate_extent_inodes(struct btrfs_f extent_item_objectid);
if (!search_commit_root) { - trans = btrfs_join_transaction(fs_info->extent_root); - if (IS_ERR(trans)) - return PTR_ERR(trans); + trans = btrfs_attach_transaction(fs_info->extent_root); + if (IS_ERR(trans)) { + if (PTR_ERR(trans) != -ENOENT && + PTR_ERR(trans) != -EROFS) + return PTR_ERR(trans); + trans = NULL; + } + } + + if (trans) btrfs_get_tree_mod_seq(fs_info, &tree_mod_seq_elem); - } else { + else down_read(&fs_info->commit_root_sem); - }
ret = btrfs_find_all_leafs(trans, fs_info, extent_item_objectid, tree_mod_seq_elem.seq, &refs, @@ -1955,7 +1961,7 @@ int iterate_extent_inodes(struct btrfs_f
free_leaf_list(refs); out: - if (!search_commit_root) { + if (trans) { btrfs_put_tree_mod_seq(fs_info, &tree_mod_seq_elem); btrfs_end_transaction(trans); } else {
From: Filipe Manana fdmanana@suse.com
commit 62d54f3a7fa27ef6a74d6cdf643ce04beba3afa7 upstream.
Send operates on read only trees and expects them to never change while it is using them. This is part of its initial design, and this expection is due to two different reasons:
1) When it was introduced, no operations were allowed to modifiy read-only subvolumes/snapshots (including defrag for example).
2) It keeps send from having an impact on other filesystem operations. Namely send does not need to keep locks on the trees nor needs to hold on to transaction handles and delay transaction commits. This ends up being a consequence of the former reason.
However the deduplication feature was introduced later (on September 2013, while send was introduced in July 2012) and it allowed for deduplication with destination files that belong to read-only trees (subvolumes and snapshots).
That means that having a send operation (either full or incremental) running in parallel with a deduplication that has the destination inode in one of the trees used by the send operation, can result in tree nodes and leaves getting freed and reused while send is using them. This problem is similar to the problem solved for the root nodes getting freed and reused when a snapshot is made against one tree that is currenly being used by a send operation, fixed in commits [1] and [2]. These commits explain in detail how the problem happens and the explanation is valid for any node or leaf that is not the root of a tree as well. This problem was also discussed and explained recently in a thread [3].
The problem is very easy to reproduce when using send with large trees (snapshots) and just a few concurrent deduplication operations that target files in the trees used by send. A stress test case is being sent for fstests that triggers the issue easily. The most common error to hit is the send ioctl return -EIO with the following messages in dmesg/syslog:
[1631617.204075] BTRFS error (device sdc): did not find backref in send_root. inode=63292, offset=0, disk_byte=5228134400 found extent=5228134400 [1631633.251754] BTRFS error (device sdc): parent transid verify failed on 32243712 wanted 24 found 27
The first one is very easy to hit while the second one happens much less frequently, except for very large trees (in that test case, snapshots with 100000 files having large xattrs to get deep and wide trees). Less frequently, at least one BUG_ON can be hit:
[1631742.130080] ------------[ cut here ]------------ [1631742.130625] kernel BUG at fs/btrfs/ctree.c:1806! [1631742.131188] invalid opcode: 0000 [#6] SMP DEBUG_PAGEALLOC PTI [1631742.131726] CPU: 1 PID: 13394 Comm: btrfs Tainted: G B D W 5.0.0-rc8-btrfs-next-45 #1 [1631742.132265] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 [1631742.133399] RIP: 0010:read_node_slot+0x122/0x130 [btrfs] (...) [1631742.135061] RSP: 0018:ffffb530021ebaa0 EFLAGS: 00010246 [1631742.135615] RAX: ffff93ac8912e000 RBX: 000000000000009d RCX: 0000000000000002 [1631742.136173] RDX: 000000000000009d RSI: ffff93ac564b0d08 RDI: ffff93ad5b48c000 [1631742.136759] RBP: ffffb530021ebb7d R08: 0000000000000001 R09: ffffb530021ebb7d [1631742.137324] R10: ffffb530021eba70 R11: 0000000000000000 R12: ffff93ac87d0a708 [1631742.137900] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001 [1631742.138455] FS: 00007f4cdb1528c0(0000) GS:ffff93ad76a80000(0000) knlGS:0000000000000000 [1631742.139010] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1631742.139568] CR2: 00007f5acb3d0420 CR3: 000000012be3e006 CR4: 00000000003606e0 [1631742.140131] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [1631742.140719] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [1631742.141272] Call Trace: [1631742.141826] ? do_raw_spin_unlock+0x49/0xc0 [1631742.142390] tree_advance+0x173/0x1d0 [btrfs] [1631742.142948] btrfs_compare_trees+0x268/0x690 [btrfs] [1631742.143533] ? process_extent+0x1070/0x1070 [btrfs] [1631742.144088] btrfs_ioctl_send+0x1037/0x1270 [btrfs] [1631742.144645] _btrfs_ioctl_send+0x80/0x110 [btrfs] [1631742.145161] ? trace_sched_stick_numa+0xe0/0xe0 [1631742.145685] btrfs_ioctl+0x13fe/0x3120 [btrfs] [1631742.146179] ? account_entity_enqueue+0xd3/0x100 [1631742.146662] ? reweight_entity+0x154/0x1a0 [1631742.147135] ? update_curr+0x20/0x2a0 [1631742.147593] ? check_preempt_wakeup+0x103/0x250 [1631742.148053] ? do_vfs_ioctl+0xa2/0x6f0 [1631742.148510] ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs] [1631742.148942] do_vfs_ioctl+0xa2/0x6f0 [1631742.149361] ? __fget+0x113/0x200 [1631742.149767] ksys_ioctl+0x70/0x80 [1631742.150159] __x64_sys_ioctl+0x16/0x20 [1631742.150543] do_syscall_64+0x60/0x1b0 [1631742.150931] entry_SYSCALL_64_after_hwframe+0x49/0xbe [1631742.151326] RIP: 0033:0x7f4cd9f5add7 (...) [1631742.152509] RSP: 002b:00007ffe91017708 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [1631742.152892] RAX: ffffffffffffffda RBX: 0000000000000105 RCX: 00007f4cd9f5add7 [1631742.153268] RDX: 00007ffe91017790 RSI: 0000000040489426 RDI: 0000000000000007 [1631742.153633] RBP: 0000000000000007 R08: 00007f4cd9e79700 R09: 00007f4cd9e79700 [1631742.153999] R10: 00007f4cd9e799d0 R11: 0000000000000202 R12: 0000000000000003 [1631742.154365] R13: 0000555dfae53020 R14: 0000000000000000 R15: 0000000000000001 (...) [1631742.156696] ---[ end trace 5dac9f96dcc3fd6b ]---
That BUG_ON happens because while send is using a node, that node is COWed by a concurrent deduplication, gets freed and gets reused as a leaf (because a transaction commit happened in between), so when it attempts to read a slot from the extent buffer, at ctree.c:read_node_slot(), the extent buffer contents were wiped out and it now matches a leaf (which can even belong to some other tree now), hitting the BUG_ON(level == 0).
Fix this concurrency issue by not allowing send and deduplication to run in parallel if both operate on the same readonly trees, returning EAGAIN to user space and logging an exlicit warning in dmesg/syslog.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... [3] https://lore.kernel.org/linux-btrfs/CAL3q7H7iqSEEyFaEtpRZw3cp613y+4k2Q8b4W7m...
CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/ctree.h | 6 ++++++ fs/btrfs/ioctl.c | 19 ++++++++++++++++++- fs/btrfs/send.c | 26 ++++++++++++++++++++++++++ 3 files changed, 50 insertions(+), 1 deletion(-)
--- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1348,6 +1348,12 @@ struct btrfs_root { * manipulation with the read-only status via SUBVOL_SETFLAGS */ int send_in_progress; + /* + * Number of currently running deduplication operations that have a + * destination inode belonging to this root. Protected by the lock + * root_item_lock. + */ + int dedupe_in_progress; struct btrfs_subvolume_writers *subv_writers; atomic_t will_be_snapshotted; atomic_t snapshot_force_cow; --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3260,6 +3260,19 @@ static int btrfs_extent_same(struct inod { int ret; u64 i, tail_len, chunk_count; + struct btrfs_root *root_dst = BTRFS_I(dst)->root; + + spin_lock(&root_dst->root_item_lock); + if (root_dst->send_in_progress) { + btrfs_warn_rl(root_dst->fs_info, +"cannot deduplicate to root %llu while send operations are using it (%d in progress)", + root_dst->root_key.objectid, + root_dst->send_in_progress); + spin_unlock(&root_dst->root_item_lock); + return -EAGAIN; + } + root_dst->dedupe_in_progress++; + spin_unlock(&root_dst->root_item_lock);
tail_len = olen % BTRFS_MAX_DEDUPE_LEN; chunk_count = div_u64(olen, BTRFS_MAX_DEDUPE_LEN); @@ -3268,7 +3281,7 @@ static int btrfs_extent_same(struct inod ret = btrfs_extent_same_range(src, loff, BTRFS_MAX_DEDUPE_LEN, dst, dst_loff); if (ret) - return ret; + goto out;
loff += BTRFS_MAX_DEDUPE_LEN; dst_loff += BTRFS_MAX_DEDUPE_LEN; @@ -3277,6 +3290,10 @@ static int btrfs_extent_same(struct inod if (tail_len > 0) ret = btrfs_extent_same_range(src, loff, tail_len, dst, dst_loff); +out: + spin_lock(&root_dst->root_item_lock); + root_dst->dedupe_in_progress--; + spin_unlock(&root_dst->root_item_lock);
return ret; } --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -6626,6 +6626,13 @@ static void btrfs_root_dec_send_in_progr spin_unlock(&root->root_item_lock); }
+static void dedupe_in_progress_warn(const struct btrfs_root *root) +{ + btrfs_warn_rl(root->fs_info, +"cannot use root %llu for send while deduplications on it are in progress (%d in progress)", + root->root_key.objectid, root->dedupe_in_progress); +} + long btrfs_ioctl_send(struct file *mnt_file, struct btrfs_ioctl_send_args *arg) { int ret = 0; @@ -6649,6 +6656,11 @@ long btrfs_ioctl_send(struct file *mnt_f * making it RW. This also protects against deletion. */ spin_lock(&send_root->root_item_lock); + if (btrfs_root_readonly(send_root) && send_root->dedupe_in_progress) { + dedupe_in_progress_warn(send_root); + spin_unlock(&send_root->root_item_lock); + return -EAGAIN; + } send_root->send_in_progress++; spin_unlock(&send_root->root_item_lock);
@@ -6783,6 +6795,13 @@ long btrfs_ioctl_send(struct file *mnt_f ret = -EPERM; goto out; } + if (clone_root->dedupe_in_progress) { + dedupe_in_progress_warn(clone_root); + spin_unlock(&clone_root->root_item_lock); + srcu_read_unlock(&fs_info->subvol_srcu, index); + ret = -EAGAIN; + goto out; + } clone_root->send_in_progress++; spin_unlock(&clone_root->root_item_lock); srcu_read_unlock(&fs_info->subvol_srcu, index); @@ -6817,6 +6836,13 @@ long btrfs_ioctl_send(struct file *mnt_f ret = -EPERM; goto out; } + if (sctx->parent_root->dedupe_in_progress) { + dedupe_in_progress_warn(sctx->parent_root); + spin_unlock(&sctx->parent_root->root_item_lock); + srcu_read_unlock(&fs_info->subvol_srcu, index); + ret = -EAGAIN; + goto out; + } spin_unlock(&sctx->parent_root->root_item_lock);
srcu_read_unlock(&fs_info->subvol_srcu, index);
From: Liang Chen liangchen.linux@gmail.com
commit a4b732a248d12cbdb46999daf0bf288c011335eb upstream.
There is a race between cache device register and cache set unregister. For an already registered cache device, register_bcache will call bch_is_open to iterate through all cachesets and check every cache there. The race occurs if cache_set_free executes at the same time and clears the caches right before ca is dereferenced in bch_is_open_cache. To close the race, let's make sure the clean up work is protected by the bch_register_lock as well.
This issue can be reproduced as follows, while true; do echo /dev/XXX> /sys/fs/bcache/register ; done& while true; do echo 1> /sys/block/XXX/bcache/set/unregister ; done &
and results in the following oops,
[ +0.000053] BUG: unable to handle kernel NULL pointer dereference at 0000000000000998 [ +0.000457] #PF error: [normal kernel read fault] [ +0.000464] PGD 800000003ca9d067 P4D 800000003ca9d067 PUD 3ca9c067 PMD 0 [ +0.000388] Oops: 0000 [#1] SMP PTI [ +0.000269] CPU: 1 PID: 3266 Comm: bash Not tainted 5.0.0+ #6 [ +0.000346] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014 [ +0.000472] RIP: 0010:register_bcache+0x1829/0x1990 [bcache] [ +0.000344] Code: b0 48 83 e8 50 48 81 fa e0 e1 10 c0 0f 84 a9 00 00 00 48 89 c6 48 89 ca 0f b7 ba 54 04 00 00 4c 8b 82 60 0c 00 00 85 ff 74 2f <49> 3b a8 98 09 00 00 74 4e 44 8d 47 ff 31 ff 49 c1 e0 03 eb 0d [ +0.000839] RSP: 0018:ffff92ee804cbd88 EFLAGS: 00010202 [ +0.000328] RAX: ffffffffc010e190 RBX: ffff918b5c6b5000 RCX: ffff918b7d8e0000 [ +0.000399] RDX: ffff918b7d8e0000 RSI: ffffffffc010e190 RDI: 0000000000000001 [ +0.000398] RBP: ffff918b7d318340 R08: 0000000000000000 R09: ffffffffb9bd2d7a [ +0.000385] R10: ffff918b7eb253c0 R11: ffffb95980f51200 R12: ffffffffc010e1a0 [ +0.000411] R13: fffffffffffffff2 R14: 000000000000000b R15: ffff918b7e232620 [ +0.000384] FS: 00007f955bec2740(0000) GS:ffff918b7eb00000(0000) knlGS:0000000000000000 [ +0.000420] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000801] CR2: 0000000000000998 CR3: 000000003cad6000 CR4: 00000000001406e0 [ +0.000837] Call Trace: [ +0.000682] ? _cond_resched+0x10/0x20 [ +0.000691] ? __kmalloc+0x131/0x1b0 [ +0.000710] kernfs_fop_write+0xfa/0x170 [ +0.000733] __vfs_write+0x2e/0x190 [ +0.000688] ? inode_security+0x10/0x30 [ +0.000698] ? selinux_file_permission+0xd2/0x120 [ +0.000752] ? security_file_permission+0x2b/0x100 [ +0.000753] vfs_write+0xa8/0x1a0 [ +0.000676] ksys_write+0x4d/0xb0 [ +0.000699] do_syscall_64+0x3a/0xf0 [ +0.000692] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Signed-off-by: Liang Chen liangchen.linux@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Coly Li colyli@suse.de Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/bcache/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -1516,6 +1516,7 @@ static void cache_set_free(struct closur bch_btree_cache_free(c); bch_journal_free(c);
+ mutex_lock(&bch_register_lock); for_each_cache(ca, c, i) if (ca) { ca->set = NULL; @@ -1534,7 +1535,6 @@ static void cache_set_free(struct closur mempool_exit(&c->search); kfree(c->devices);
- mutex_lock(&bch_register_lock); list_del(&c->list); mutex_unlock(&bch_register_lock);
From: Coly Li colyli@suse.de
commit 1bee2addc0c8470c8aaa65ef0599eeae96dd88bc upstream.
In journal_reclaim() ja->cur_idx of each cache will be update to reclaim available journal buckets. Variable 'int n' is used to count how many cache is successfully reclaimed, then n is set to c->journal.key by SET_KEY_PTRS(). Later in journal_write_unlocked(), a for_each_cache() loop will write the jset data onto each cache.
The problem is, if all jouranl buckets on each cache is full, the following code in journal_reclaim(),
529 for_each_cache(ca, c, iter) { 530 struct journal_device *ja = &ca->journal; 531 unsigned int next = (ja->cur_idx + 1) % ca->sb.njournal_buckets; 532 533 /* No space available on this device */ 534 if (next == ja->discard_idx) 535 continue; 536 537 ja->cur_idx = next; 538 k->ptr[n++] = MAKE_PTR(0, 539 bucket_to_sector(c, ca->sb.d[ja->cur_idx]), 540 ca->sb.nr_this_dev); 541 } 542 543 bkey_init(k); 544 SET_KEY_PTRS(k, n);
If there is no available bucket to reclaim, the if() condition at line 534 will always true, and n remains 0. Then at line 544, SET_KEY_PTRS() will set KEY_PTRS field of c->journal.key to 0.
Setting KEY_PTRS field of c->journal.key to 0 is wrong. Because in journal_write_unlocked() the journal data is written in following loop,
649 for (i = 0; i < KEY_PTRS(k); i++) { 650-671 submit journal data to cache device 672 }
If KEY_PTRS field is set to 0 in jouranl_reclaim(), the journal data won't be written to cache device here. If system crahed or rebooted before bkeys of the lost journal entries written into btree nodes, data corruption will be reported during bcache reload after rebooting the system.
Indeed there is only one cache in a cache set, there is no need to set KEY_PTRS field in journal_reclaim() at all. But in order to keep the for_each_cache() logic consistent for now, this patch fixes the above problem by not setting 0 KEY_PTRS of journal key, if there is no bucket available to reclaim.
Signed-off-by: Coly Li colyli@suse.de Reviewed-by: Hannes Reinecke hare@suse.com Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/bcache/journal.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -540,11 +540,11 @@ static void journal_reclaim(struct cache ca->sb.nr_this_dev); }
- bkey_init(k); - SET_KEY_PTRS(k, n); - - if (n) + if (n) { + bkey_init(k); + SET_KEY_PTRS(k, n); c->journal.blocks_free = c->sb.bucket_size >> c->block_bits; + } out: if (!journal_full(&c->journal)) __closure_wake_up(&c->journal.wait); @@ -671,6 +671,9 @@ static void journal_write_unlocked(struc ca->journal.seq[ca->journal.cur_idx] = w->data->seq; }
+ /* If KEY_PTRS(k) == 0, this jset gets lost in air */ + BUG_ON(i == 0); + atomic_dec_bug(&fifo_back(&c->journal.pin)); bch_journal_next(&c->journal); journal_reclaim(c);
From: Corey Minyard cminyard@mvista.com
commit d73236383eb1cd4b7b65c33a09f0ed45f6781f40 upstream.
This is required for SSIF to work.
There was no way to know if the interface being added was SI or SSIF from the platform data, but that was required so the i2c-addr is only added for SSIF interfaces. So add a field for that.
Also rework the logic a bit so that ipmi-type is not set for SSIF interfaces, as it is not necessary for that.
Fixes: 3cd83bac481d ("ipmi: Consolidate the adding of platform devices") Reported-by: Kamlakant Patel kamlakantp@marvell.com Signed-off-by: Corey Minyard cminyard@mvista.com Cc: stable@vger.kernel.org # 5.1 Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/char/ipmi/ipmi_dmi.c | 2 ++ drivers/char/ipmi/ipmi_plat_data.c | 27 +++++++++++++++------------ drivers/char/ipmi/ipmi_plat_data.h | 3 +++ drivers/char/ipmi/ipmi_si_hardcode.c | 1 + drivers/char/ipmi/ipmi_si_hotmod.c | 1 + 5 files changed, 22 insertions(+), 12 deletions(-)
--- a/drivers/char/ipmi/ipmi_dmi.c +++ b/drivers/char/ipmi/ipmi_dmi.c @@ -47,9 +47,11 @@ static void __init dmi_add_platform_ipmi memset(&p, 0, sizeof(p));
name = "dmi-ipmi-si"; + p.iftype = IPMI_PLAT_IF_SI; switch (type) { case IPMI_DMI_TYPE_SSIF: name = "dmi-ipmi-ssif"; + p.iftype = IPMI_PLAT_IF_SSIF; p.type = SI_TYPE_INVALID; break; case IPMI_DMI_TYPE_BT: --- a/drivers/char/ipmi/ipmi_plat_data.c +++ b/drivers/char/ipmi/ipmi_plat_data.c @@ -12,7 +12,7 @@ struct platform_device *ipmi_platform_ad struct ipmi_plat_data *p) { struct platform_device *pdev; - unsigned int num_r = 1, size, pidx = 0; + unsigned int num_r = 1, size = 0, pidx = 0; struct resource r[4]; struct property_entry pr[6]; u32 flags; @@ -21,19 +21,22 @@ struct platform_device *ipmi_platform_ad memset(pr, 0, sizeof(pr)); memset(r, 0, sizeof(r));
- if (p->type == SI_BT) - size = 3; - else if (p->type == SI_TYPE_INVALID) - size = 0; - else - size = 2; + if (p->iftype == IPMI_PLAT_IF_SI) { + if (p->type == SI_BT) + size = 3; + else if (p->type != SI_TYPE_INVALID) + size = 2;
- if (p->regsize == 0) - p->regsize = DEFAULT_REGSIZE; - if (p->regspacing == 0) - p->regspacing = p->regsize; + if (p->regsize == 0) + p->regsize = DEFAULT_REGSIZE; + if (p->regspacing == 0) + p->regspacing = p->regsize; + + pr[pidx++] = PROPERTY_ENTRY_U8("ipmi-type", p->type); + } else if (p->iftype == IPMI_PLAT_IF_SSIF) { + pr[pidx++] = PROPERTY_ENTRY_U16("i2c-addr", p->addr); + }
- pr[pidx++] = PROPERTY_ENTRY_U8("ipmi-type", p->type); if (p->slave_addr) pr[pidx++] = PROPERTY_ENTRY_U8("slave-addr", p->slave_addr); pr[pidx++] = PROPERTY_ENTRY_U8("addr-source", p->addr_source); --- a/drivers/char/ipmi/ipmi_plat_data.h +++ b/drivers/char/ipmi/ipmi_plat_data.h @@ -6,7 +6,10 @@
#include <linux/ipmi.h>
+enum ipmi_plat_interface_type { IPMI_PLAT_IF_SI, IPMI_PLAT_IF_SSIF }; + struct ipmi_plat_data { + enum ipmi_plat_interface_type iftype; unsigned int type; /* si_type for si, SI_INVALID for others */ unsigned int space; /* addr_space for si, intf# for ssif. */ unsigned long addr; --- a/drivers/char/ipmi/ipmi_si_hardcode.c +++ b/drivers/char/ipmi/ipmi_si_hardcode.c @@ -83,6 +83,7 @@ static void __init ipmi_hardcode_init_on
memset(&p, 0, sizeof(p));
+ p.iftype = IPMI_PLAT_IF_SI; if (!si_type_str || !*si_type_str || strcmp(si_type_str, "kcs") == 0) { p.type = SI_KCS; } else if (strcmp(si_type_str, "smic") == 0) { --- a/drivers/char/ipmi/ipmi_si_hotmod.c +++ b/drivers/char/ipmi/ipmi_si_hotmod.c @@ -108,6 +108,7 @@ static int parse_hotmod_str(const char * int rv; unsigned int ival;
+ h->iftype = IPMI_PLAT_IF_SI; rv = parse_str(hotmod_ops, &ival, "operation", &curr); if (rv) return rv;
From: Kamlakant Patel kamlakantp@marvell.com
commit 55be8658c7e2feb11a5b5b33ee031791dbd23a69 upstream.
According to ipmi spec, block number is a number that is incremented, starting with 0, for each new block of message data returned using the middle transaction.
Here, the 'blocknum' is data[0] which always starts from zero(0) and 'ssif_info->multi_pos' starts from 1. So, we need to add +1 to blocknum while comparing with multi_pos.
Fixes: 7d6380cd40f79 ("ipmi:ssif: Fix handling of multi-part return messages"). Reported-by: Kiran Kolukuluru kirank@ami.com Signed-off-by: Kamlakant Patel kamlakantp@marvell.com Message-Id: 1556106615-18722-1-git-send-email-kamlakantp@marvell.com [Also added a debug log if the block numbers don't match.] Signed-off-by: Corey Minyard cminyard@mvista.com Cc: stable@vger.kernel.org # 4.4 Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/char/ipmi/ipmi_ssif.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/char/ipmi/ipmi_ssif.c +++ b/drivers/char/ipmi/ipmi_ssif.c @@ -727,12 +727,16 @@ static void msg_done_handler(struct ssif /* End of read */ len = ssif_info->multi_len; data = ssif_info->data; - } else if (blocknum != ssif_info->multi_pos) { + } else if (blocknum + 1 != ssif_info->multi_pos) { /* * Out of sequence block, just abort. Block * numbers start at zero for the second block, * but multi_pos starts at one, so the +1. */ + if (ssif_info->ssif_debug & SSIF_DEBUG_MSG) + dev_dbg(&ssif_info->client->dev, + "Received message out of sequence, expected %u, got %u\n", + ssif_info->multi_pos - 1, blocknum); result = -EIO; } else { ssif_inc_stat(ssif_info, received_message_parts);
From: Fabio Estevam festevam@gmail.com
commit 0672d22a19244cdb0e5c753125c1a55a120db5d0 upstream.
Commit 6d4cd041f0af ("net: phy: at803x: disable delay only for RGMII mode") exposed an issue on imx DTS files using AR8031/AR8035 PHYs.
The end result is that the boards can no longer obtain an IP address via UDHCP, for example.
Quoting Andrew Lunn:
"The problem here is, all the DTs were broken since day 0. However, because the PHY driver was also broken, nobody noticed and it worked. Now that the PHY driver has been fixed, all the bugs in the DTs now become an issue"
To fix this problem, the phy-mode property needs to be "rgmii-id", which has the following meaning as per Documentation/devicetree/bindings/net/ethernet.txt:
"RGMII with internal RX and TX delays provided by the PHY, the MAC should not add the RX or TX delays in this case)"
Tested on imx6-sabresd, imx6sx-sdb and imx7d-pico boards with successfully restored networking.
Based on the initial submission from Steve Twiss for the imx6qdl-sabresd.
Signed-off-by: Fabio Estevam festevam@gmail.com Tested-by: Baruch Siach baruch@tkos.co.il Tested-by: Soeren Moch smoch@web.de Tested-by: Steve Twiss stwiss.opensource@diasemi.com Tested-by: Adam Thomson Adam.Thomson@diasemi.com Signed-off-by: Steve Twiss stwiss.opensource@diasemi.com Tested-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Shawn Guo shawnguo@kernel.org Cc: "George G. Davis" george_davis@mentor.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/boot/dts/imx6-logicpd-baseboard.dtsi | 2 +- arch/arm/boot/dts/imx6dl-riotboard.dts | 2 +- arch/arm/boot/dts/imx6q-ba16.dtsi | 2 +- arch/arm/boot/dts/imx6q-marsboard.dts | 2 +- arch/arm/boot/dts/imx6q-tbs2910.dts | 2 +- arch/arm/boot/dts/imx6qdl-apf6.dtsi | 2 +- arch/arm/boot/dts/imx6qdl-sabreauto.dtsi | 2 +- arch/arm/boot/dts/imx6qdl-sabresd.dtsi | 2 +- arch/arm/boot/dts/imx6qdl-sr-som.dtsi | 2 +- arch/arm/boot/dts/imx6qdl-wandboard.dtsi | 2 +- arch/arm/boot/dts/imx6sx-sabreauto.dts | 2 +- arch/arm/boot/dts/imx6sx-sdb.dtsi | 2 +- arch/arm/boot/dts/imx7d-pico.dtsi | 2 +- 13 files changed, 13 insertions(+), 13 deletions(-)
--- a/arch/arm/boot/dts/imx6-logicpd-baseboard.dtsi +++ b/arch/arm/boot/dts/imx6-logicpd-baseboard.dtsi @@ -216,7 +216,7 @@ &fec { pinctrl-names = "default"; pinctrl-0 = <&pinctrl_enet>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; phy-reset-duration = <10>; phy-reset-gpios = <&gpio1 24 GPIO_ACTIVE_LOW>; phy-supply = <®_enet>; --- a/arch/arm/boot/dts/imx6dl-riotboard.dts +++ b/arch/arm/boot/dts/imx6dl-riotboard.dts @@ -92,7 +92,7 @@ &fec { pinctrl-names = "default"; pinctrl-0 = <&pinctrl_enet>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; phy-reset-gpios = <&gpio3 31 GPIO_ACTIVE_LOW>; interrupts-extended = <&gpio1 6 IRQ_TYPE_LEVEL_HIGH>, <&intc 0 119 IRQ_TYPE_LEVEL_HIGH>; --- a/arch/arm/boot/dts/imx6q-ba16.dtsi +++ b/arch/arm/boot/dts/imx6q-ba16.dtsi @@ -171,7 +171,7 @@ &fec { pinctrl-names = "default"; pinctrl-0 = <&pinctrl_enet>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; status = "okay"; };
--- a/arch/arm/boot/dts/imx6q-marsboard.dts +++ b/arch/arm/boot/dts/imx6q-marsboard.dts @@ -110,7 +110,7 @@ &fec { pinctrl-names = "default"; pinctrl-0 = <&pinctrl_enet>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; phy-reset-gpios = <&gpio3 31 GPIO_ACTIVE_LOW>; status = "okay"; }; --- a/arch/arm/boot/dts/imx6q-tbs2910.dts +++ b/arch/arm/boot/dts/imx6q-tbs2910.dts @@ -98,7 +98,7 @@ &fec { pinctrl-names = "default"; pinctrl-0 = <&pinctrl_enet>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; phy-reset-gpios = <&gpio1 25 GPIO_ACTIVE_LOW>; status = "okay"; }; --- a/arch/arm/boot/dts/imx6qdl-apf6.dtsi +++ b/arch/arm/boot/dts/imx6qdl-apf6.dtsi @@ -51,7 +51,7 @@ &fec { pinctrl-names = "default"; pinctrl-0 = <&pinctrl_enet>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; phy-reset-duration = <10>; phy-reset-gpios = <&gpio1 24 GPIO_ACTIVE_LOW>; status = "okay"; --- a/arch/arm/boot/dts/imx6qdl-sabreauto.dtsi +++ b/arch/arm/boot/dts/imx6qdl-sabreauto.dtsi @@ -292,7 +292,7 @@ &fec { pinctrl-names = "default"; pinctrl-0 = <&pinctrl_enet>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; interrupts-extended = <&gpio1 6 IRQ_TYPE_LEVEL_HIGH>, <&intc 0 119 IRQ_TYPE_LEVEL_HIGH>; fsl,err006687-workaround-present; --- a/arch/arm/boot/dts/imx6qdl-sabresd.dtsi +++ b/arch/arm/boot/dts/imx6qdl-sabresd.dtsi @@ -202,7 +202,7 @@ &fec { pinctrl-names = "default"; pinctrl-0 = <&pinctrl_enet>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; phy-reset-gpios = <&gpio1 25 GPIO_ACTIVE_LOW>; status = "okay"; }; --- a/arch/arm/boot/dts/imx6qdl-sr-som.dtsi +++ b/arch/arm/boot/dts/imx6qdl-sr-som.dtsi @@ -53,7 +53,7 @@ &fec { pinctrl-names = "default"; pinctrl-0 = <&pinctrl_microsom_enet_ar8035>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; phy-reset-duration = <2>; phy-reset-gpios = <&gpio4 15 GPIO_ACTIVE_LOW>; status = "okay"; --- a/arch/arm/boot/dts/imx6qdl-wandboard.dtsi +++ b/arch/arm/boot/dts/imx6qdl-wandboard.dtsi @@ -224,7 +224,7 @@ &fec { pinctrl-names = "default"; pinctrl-0 = <&pinctrl_enet>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; phy-reset-gpios = <&gpio3 29 GPIO_ACTIVE_LOW>; interrupts-extended = <&gpio1 6 IRQ_TYPE_LEVEL_HIGH>, <&intc 0 119 IRQ_TYPE_LEVEL_HIGH>; --- a/arch/arm/boot/dts/imx6sx-sabreauto.dts +++ b/arch/arm/boot/dts/imx6sx-sabreauto.dts @@ -75,7 +75,7 @@ &fec1 { pinctrl-names = "default"; pinctrl-0 = <&pinctrl_enet1>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; phy-handle = <ðphy1>; fsl,magic-packet; status = "okay"; --- a/arch/arm/boot/dts/imx6sx-sdb.dtsi +++ b/arch/arm/boot/dts/imx6sx-sdb.dtsi @@ -191,7 +191,7 @@ pinctrl-names = "default"; pinctrl-0 = <&pinctrl_enet1>; phy-supply = <®_enet_3v3>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; phy-handle = <ðphy1>; phy-reset-gpios = <&gpio2 7 GPIO_ACTIVE_LOW>; status = "okay"; --- a/arch/arm/boot/dts/imx7d-pico.dtsi +++ b/arch/arm/boot/dts/imx7d-pico.dtsi @@ -92,7 +92,7 @@ <&clks IMX7D_ENET1_TIME_ROOT_CLK>; assigned-clock-parents = <&clks IMX7D_PLL_ENET_MAIN_100M_CLK>; assigned-clock-rates = <0>, <100000000>; - phy-mode = "rgmii"; + phy-mode = "rgmii-id"; phy-handle = <ðphy0>; fsl,magic-packet; phy-reset-gpios = <&gpio6 11 GPIO_ACTIVE_LOW>;
From: Mel Gorman mgorman@techsingularity.net
commit 60fce36afa9c77c7ccbf980c4f670f3be3651fce upstream.
syzbot reported the following error from a tree with a head commit of baf76f0c58ae ("slip: make slhc_free() silently accept an error pointer")
BUG: unable to handle kernel paging request at ffffea0003348000 #PF error: [normal kernel read fault] PGD 12c3f9067 P4D 12c3f9067 PUD 12c3f8067 PMD 0 Oops: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 28916 Comm: syz-executor.2 Not tainted 5.1.0-rc6+ #89 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:constant_test_bit arch/x86/include/asm/bitops.h:314 [inline] RIP: 0010:PageCompound include/linux/page-flags.h:186 [inline] RIP: 0010:isolate_freepages_block+0x1c0/0xd40 mm/compaction.c:579 Code: 01 d8 ff 4d 85 ed 0f 84 ef 07 00 00 e8 29 00 d8 ff 4c 89 e0 83 85 38 ff ff ff 01 48 c1 e8 03 42 80 3c 38 00 0f 85 31 0a 00 00 <4d> 8b 2c 24 31 ff 49 c1 ed 10 41 83 e5 01 44 89 ee e8 3a 01 d8 ff RSP: 0018:ffff88802b31eab8 EFLAGS: 00010246 RAX: 1ffffd4000669000 RBX: 00000000000cd200 RCX: ffffc9000a235000 RDX: 000000000001ca5e RSI: ffffffff81988cc7 RDI: 0000000000000001 RBP: ffff88802b31ebd8 R08: ffff88805af700c0 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffea0003348000 R13: 0000000000000000 R14: ffff88802b31f030 R15: dffffc0000000000 FS: 00007f61648dc700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffea0003348000 CR3: 0000000037c64000 CR4: 00000000001426e0 Call Trace: fast_isolate_around mm/compaction.c:1243 [inline] fast_isolate_freepages mm/compaction.c:1418 [inline] isolate_freepages mm/compaction.c:1438 [inline] compaction_alloc+0x1aee/0x22e0 mm/compaction.c:1550
There is no reproducer and it is difficult to hit -- 1 crash every few days. The issue is very similar to the fix in commit 6b0868c820ff ("mm/compaction.c: correct zone boundary handling when resetting pageblock skip hints"). When isolating free pages around a target pageblock, the boundary handling is off by one and can stray into the next pageblock. Triggering the syzbot error requires that the end of pageblock is section or zone aligned, and that the next section is unpopulated.
A more subtle consequence of the bug is that pageblocks were being improperly used as migration targets which potentially hurts fragmentation avoidance in the long-term one page at a time.
A debugging patch revealed that it's definitely possible to stray outside of a pageblock which is not intended. While syzbot cannot be used to verify this patch, it was confirmed that the debugging warning no longer triggers with this patch applied. It has also been confirmed that the THP allocation stress tests are not degraded by this patch.
Link: http://lkml.kernel.org/r/20190510182124.GI18914@techsingularity.net Fixes: e332f741a8dd ("mm, compaction: be selective about what pageblocks to clear skip hints") Signed-off-by: Mel Gorman mgorman@techsingularity.net Reported-by: syzbot+d84c80f9fe26a0f7a734@syzkaller.appspotmail.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Andrey Ryabinin aryabinin@virtuozzo.com Cc: Qian Cai cai@lca.pw Cc: Michal Hocko mhocko@suse.com Cc: Vlastimil Babka vbabka@suse.cz Cc: stable@vger.kernel.org # v5.1+ Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/compaction.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/compaction.c +++ b/mm/compaction.c @@ -1228,7 +1228,7 @@ fast_isolate_around(struct compact_contr
/* Pageblock boundaries */ start_pfn = pageblock_start_pfn(pfn); - end_pfn = min(start_pfn + pageblock_nr_pages, zone_end_pfn(cc->zone)); + end_pfn = min(pageblock_end_pfn(pfn), zone_end_pfn(cc->zone)) - 1;
/* Scan before */ if (start_pfn != pfn) { @@ -1239,7 +1239,7 @@ fast_isolate_around(struct compact_contr
/* Scan after */ start_pfn = pfn + nr_isolated; - if (start_pfn != end_pfn) + if (start_pfn < end_pfn) isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false);
/* Skip this pageblock in the future as it's full or nearly full */
From: Jiufei Xue jiufei.xue@linux.alibaba.com
commit ec084de929e419e51bcdafaafe567d9e7d0273b7 upstream.
synchronize_rcu() didn't wait for call_rcu() callbacks, so inode wb switch may not go to the workqueue after synchronize_rcu(). Thus previous scheduled switches was not finished even flushing the workqueue, which will cause a NULL pointer dereferenced followed below.
VFS: Busy inodes after unmount of vdd. Self-destruct in 5 seconds. Have a nice day... BUG: unable to handle kernel NULL pointer dereference at 0000000000000278 evict+0xb3/0x180 iput+0x1b0/0x230 inode_switch_wbs_work_fn+0x3c0/0x6a0 worker_thread+0x4e/0x490 ? process_one_work+0x410/0x410 kthread+0xe6/0x100 ret_from_fork+0x39/0x50
Replace the synchronize_rcu() call with a rcu_barrier() to wait for all pending callbacks to finish. And inc isw_nr_in_flight after call_rcu() in inode_switch_wbs() to make more sense.
Link: http://lkml.kernel.org/r/20190429024108.54150-1-jiufei.xue@linux.alibaba.com Signed-off-by: Jiufei Xue jiufei.xue@linux.alibaba.com Acked-by: Tejun Heo tj@kernel.org Suggested-by: Tejun Heo tj@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/fs-writeback.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
--- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -523,8 +523,6 @@ static void inode_switch_wbs(struct inod
isw->inode = inode;
- atomic_inc(&isw_nr_in_flight); - /* * In addition to synchronizing among switchers, I_WB_SWITCH tells * the RCU protected stat update paths to grab the i_page @@ -532,6 +530,9 @@ static void inode_switch_wbs(struct inod * Let's continue after I_WB_SWITCH is guaranteed to be visible. */ call_rcu(&isw->rcu_head, inode_switch_wbs_rcu_fn); + + atomic_inc(&isw_nr_in_flight); + goto out_unlock;
out_free: @@ -901,7 +902,11 @@ restart: void cgroup_writeback_umount(void) { if (atomic_read(&isw_nr_in_flight)) { - synchronize_rcu(); + /* + * Use rcu_barrier() to wait for all pending callbacks to + * ensure that all in-flight wb switches are in the workqueue. + */ + rcu_barrier(); flush_workqueue(isw_wq); } }
From: Anup Patel Anup.Patel@wdc.com
commit f91253a3d005796404ae0e578b3394459b5f9b71 upstream.
The Linux kernel will auto-disables all boot consoles whenever it gets a preferred real console.
Currently on RISC-V systems, if we have a real console which is not RISCV SBI console then boot consoles (such as earlycon=sbi) are not auto-disabled when a real console (ttyS0 or ttySIF0) is available. This results in duplicate prints at boot-time after kernel starts using real console (i.e. ttyS0 or ttySIF0) if "earlycon=" kernel parameter was passed by bootloader.
The reason for above issue is that RISCV SBI console always adds itself as preferred console which is causing other real consoles to be not used as preferred console.
Ideally "console=" kernel parameter passed by bootloaders should be the one selecting a preferred real console.
This patch fixes above issue by not forcing RISCV SBI console as preferred console.
Fixes: afa6b1ccfad5 ("tty: New RISC-V SBI console driver") Cc: stable@vger.kernel.org Signed-off-by: Anup Patel anup.patel@wdc.com Reviewed-by: Atish Patra atish.patra@wdc.com Signed-off-by: Palmer Dabbelt palmer@sifive.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/tty/hvc/hvc_riscv_sbi.c | 1 - 1 file changed, 1 deletion(-)
--- a/drivers/tty/hvc/hvc_riscv_sbi.c +++ b/drivers/tty/hvc/hvc_riscv_sbi.c @@ -53,7 +53,6 @@ device_initcall(hvc_sbi_init); static int __init hvc_sbi_console_init(void) { hvc_instantiate(0, 0, &hvc_sbi_ops); - add_preferred_console("hvc", 0, NULL);
return 0; }
From: Sriram Rajagopalan sriramr@arista.com
commit 592acbf16821288ecdc4192c47e3774a4c48bb64 upstream.
This commit zeroes out the unused memory region in the buffer_head corresponding to the extent metablock after writing the extent header and the corresponding extent node entries.
This is done to prevent random uninitialized data from getting into the filesystem when the extent block is synced.
This fixes CVE-2019-11833.
Signed-off-by: Sriram Rajagopalan sriramr@arista.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/extents.c | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-)
--- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -1035,6 +1035,7 @@ static int ext4_ext_split(handle_t *hand __le32 border; ext4_fsblk_t *ablocks = NULL; /* array of allocated blocks */ int err = 0; + size_t ext_size = 0;
/* make decision: where to split? */ /* FIXME: now decision is simplest: at current extent */ @@ -1126,6 +1127,10 @@ static int ext4_ext_split(handle_t *hand le16_add_cpu(&neh->eh_entries, m); }
+ /* zero out unused area in the extent block */ + ext_size = sizeof(struct ext4_extent_header) + + sizeof(struct ext4_extent) * le16_to_cpu(neh->eh_entries); + memset(bh->b_data + ext_size, 0, inode->i_sb->s_blocksize - ext_size); ext4_extent_block_csum_set(inode, neh); set_buffer_uptodate(bh); unlock_buffer(bh); @@ -1205,6 +1210,11 @@ static int ext4_ext_split(handle_t *hand sizeof(struct ext4_extent_idx) * m); le16_add_cpu(&neh->eh_entries, m); } + /* zero out unused area in the extent block */ + ext_size = sizeof(struct ext4_extent_header) + + (sizeof(struct ext4_extent) * le16_to_cpu(neh->eh_entries)); + memset(bh->b_data + ext_size, 0, + inode->i_sb->s_blocksize - ext_size); ext4_extent_block_csum_set(inode, neh); set_buffer_uptodate(bh); unlock_buffer(bh); @@ -1270,6 +1280,7 @@ static int ext4_ext_grow_indepth(handle_ ext4_fsblk_t newblock, goal = 0; struct ext4_super_block *es = EXT4_SB(inode->i_sb)->s_es; int err = 0; + size_t ext_size = 0;
/* Try to prepend new index to old one */ if (ext_depth(inode)) @@ -1295,9 +1306,11 @@ static int ext4_ext_grow_indepth(handle_ goto out; }
+ ext_size = sizeof(EXT4_I(inode)->i_data); /* move top-level index/leaf into new block */ - memmove(bh->b_data, EXT4_I(inode)->i_data, - sizeof(EXT4_I(inode)->i_data)); + memmove(bh->b_data, EXT4_I(inode)->i_data, ext_size); + /* zero out unused area in the extent block */ + memset(bh->b_data + ext_size, 0, inode->i_sb->s_blocksize - ext_size);
/* set size of new block */ neh = ext_block_hdr(bh);
From: Lukas Czerner lczerner@redhat.com
commit 57a0da28ced8707cb9f79f071a016b9d005caf5a upstream.
Unaligned AIO must be serialized because the zeroing of partial blocks of unaligned AIO can result in data corruption in case it's overlapping another in flight IO.
Currently we wait for all unwritten extents before we submit unaligned AIO which protects data in case of unaligned AIO is following overlapping IO. However if a unaligned AIO is followed by overlapping aligned AIO we can still end up corrupting data.
To fix this, we must make sure that the unaligned AIO is the only IO in flight by waiting for unwritten extents conversion not just before the IO submission, but right after it as well.
This problem can be reproduced by xfstest generic/538
Signed-off-by: Lukas Czerner lczerner@redhat.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/file.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -264,6 +264,13 @@ ext4_file_write_iter(struct kiocb *iocb, }
ret = __generic_file_write_iter(iocb, from); + /* + * Unaligned direct AIO must be the only IO in flight. Otherwise + * overlapping aligned IO after unaligned might result in data + * corruption. + */ + if (ret == -EIOCBQUEUED && unaligned_aio) + ext4_unwritten_wait(inode); inode_unlock(inode);
if (ret > 0)
From: Sahitya Tummala stummala@codeaurora.org
commit 08fc98a4d6424af66eb3ac4e2cedd2fc927ed436 upstream.
The buffer_head (frames[0].bh) and it's corresping page can be potentially free'd once brelse() is done inside the for loop but before the for loop exits in dx_release(). It can be free'd in another context, when the page cache is flushed via drop_caches_sysctl_handler(). This results into below data abort when accessing info->indirect_levels in dx_release().
Unable to handle kernel paging request at virtual address ffffffc17ac3e01e Call trace: dx_release+0x70/0x90 ext4_htree_fill_tree+0x2d4/0x300 ext4_readdir+0x244/0x6f8 iterate_dir+0xbc/0x160 SyS_getdents64+0x94/0x174
Signed-off-by: Sahitya Tummala stummala@codeaurora.org Signed-off-by: Theodore Ts'o tytso@mit.edu Reviewed-by: Andreas Dilger adilger@dilger.ca Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/namei.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -871,12 +871,15 @@ static void dx_release(struct dx_frame * { struct dx_root_info *info; int i; + unsigned int indirect_levels;
if (frames[0].bh == NULL) return;
info = &((struct dx_root *)frames[0].bh->b_data)->info; - for (i = 0; i <= info->indirect_levels; i++) { + /* save local copy, "info" may be freed after brelse() */ + indirect_levels = info->indirect_levels; + for (i = 0; i <= indirect_levels; i++) { if (frames[i].bh == NULL) break; brelse(frames[i].bh);
From: Jan Kara jack@suse.cz
commit 2c1d0e3631e5732dba98ef49ac0bec1388776793 upstream.
Handling of aborted journal is a special code path different from standard ext4_error() one and it can call panic() as well. Commit 1dc1097ff60e ("ext4: avoid panic during forced reboot") forgot to update this path so fix that omission.
Fixes: 1dc1097ff60e ("ext4: avoid panic during forced reboot") Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org # 5.1 Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -698,7 +698,7 @@ void __ext4_abort(struct super_block *sb jbd2_journal_abort(EXT4_SB(sb)->s_journal, -EIO); save_error_info(sb, function, line); } - if (test_opt(sb, ERRORS_PANIC)) { + if (test_opt(sb, ERRORS_PANIC) && !system_going_down()) { if (EXT4_SB(sb)->s_journal && !(EXT4_SB(sb)->s_journal->j_flags & JBD2_REC_ERR)) return;
From: Jeremy Soller jeremy@system76.com
commit 891afcf2462d2cc4ef7caf94215358ca61fa32cb upstream.
A mistake was made in the identification of the four variants of the System76 Gazelle (gaze14). This patch corrects the PCI ID of the 17-inch, GTX 1660 Ti variant from 0x8560 to 0x8551. This patch also adds the correct fixups for the 15-inch and 17-inch GTX 1650 variants with PCI IDs 0x8560 and 0x8561.
Tests were done on all four variants ensuring full audio capability.
Fixes: 80a5052db751 ("ALSA: hdea/realtek - Headset fixup for System76 Gazelle (gaze14)") Signed-off-by: Jeremy Soller jeremy@system76.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/pci/hda/patch_realtek.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -6933,7 +6933,9 @@ static const struct snd_pci_quirk alc269 SND_PCI_QUIRK(0x1462, 0xb171, "Cubi N 8GL (MS-B171)", ALC283_FIXUP_HEADSET_MIC), SND_PCI_QUIRK(0x1558, 0x1325, "System76 Darter Pro (darp5)", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x8550, "System76 Gazelle (gaze14)", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), - SND_PCI_QUIRK(0x1558, 0x8560, "System76 Gazelle (gaze14)", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), + SND_PCI_QUIRK(0x1558, 0x8551, "System76 Gazelle (gaze14)", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), + SND_PCI_QUIRK(0x1558, 0x8560, "System76 Gazelle (gaze14)", ALC269_FIXUP_HEADSET_MIC), + SND_PCI_QUIRK(0x1558, 0x8561, "System76 Gazelle (gaze14)", ALC269_FIXUP_HEADSET_MIC), SND_PCI_QUIRK(0x17aa, 0x1036, "Lenovo P520", ALC233_FIXUP_LENOVO_MULTI_CODECS), SND_PCI_QUIRK(0x17aa, 0x20f2, "Thinkpad SL410/510", ALC269_FIXUP_SKU_IGNORE), SND_PCI_QUIRK(0x17aa, 0x215e, "Thinkpad L512", ALC269_FIXUP_SKU_IGNORE),
From: Kailang Yang kailang@realtek.com
commit dad3197da7a3817f27bb24f7fd3c135ffa707202 upstream.
Dell platform with ALC298. system enter to runtime suspend. Headphone had noise. Let Headset Mic not shutup will solve this issue.
[ Fixed minor coding style issues by tiwai ]
Signed-off-by: Kailang Yang kailang@realtek.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/pci/hda/patch_realtek.c | 59 ++++++++++++++++++++++++------------------ 1 file changed, 35 insertions(+), 24 deletions(-)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -477,12 +477,45 @@ static void alc_auto_setup_eapd(struct h set_eapd(codec, *p, on); }
+static int find_ext_mic_pin(struct hda_codec *codec); + +static void alc_headset_mic_no_shutup(struct hda_codec *codec) +{ + const struct hda_pincfg *pin; + int mic_pin = find_ext_mic_pin(codec); + int i; + + /* don't shut up pins when unloading the driver; otherwise it breaks + * the default pin setup at the next load of the driver + */ + if (codec->bus->shutdown) + return; + + snd_array_for_each(&codec->init_pins, i, pin) { + /* use read here for syncing after issuing each verb */ + if (pin->nid != mic_pin) + snd_hda_codec_read(codec, pin->nid, 0, + AC_VERB_SET_PIN_WIDGET_CONTROL, 0); + } + + codec->pins_shutup = 1; +} + static void alc_shutup_pins(struct hda_codec *codec) { struct alc_spec *spec = codec->spec;
- if (!spec->no_shutup_pins) - snd_hda_shutup_pins(codec); + switch (codec->core.vendor_id) { + case 0x10ec0286: + case 0x10ec0288: + case 0x10ec0298: + alc_headset_mic_no_shutup(codec); + break; + default: + if (!spec->no_shutup_pins) + snd_hda_shutup_pins(codec); + break; + } }
/* generic shutup callback; @@ -2923,27 +2956,6 @@ static int alc269_parse_auto_config(stru return alc_parse_auto_config(codec, alc269_ignore, ssids); }
-static int find_ext_mic_pin(struct hda_codec *codec); - -static void alc286_shutup(struct hda_codec *codec) -{ - const struct hda_pincfg *pin; - int i; - int mic_pin = find_ext_mic_pin(codec); - /* don't shut up pins when unloading the driver; otherwise it breaks - * the default pin setup at the next load of the driver - */ - if (codec->bus->shutdown) - return; - snd_array_for_each(&codec->init_pins, i, pin) { - /* use read here for syncing after issuing each verb */ - if (pin->nid != mic_pin) - snd_hda_codec_read(codec, pin->nid, 0, - AC_VERB_SET_PIN_WIDGET_CONTROL, 0); - } - codec->pins_shutup = 1; -} - static void alc269vb_toggle_power_output(struct hda_codec *codec, int power_up) { alc_update_coef_idx(codec, 0x04, 1 << 11, power_up ? (1 << 11) : 0); @@ -7707,7 +7719,6 @@ static int patch_alc269(struct hda_codec case 0x10ec0286: case 0x10ec0288: spec->codec_variant = ALC269_TYPE_ALC286; - spec->shutup = alc286_shutup; break; case 0x10ec0298: spec->codec_variant = ALC269_TYPE_ALC298;
From: Michał Wadowski wadosm@gmail.com
commit 56df90b631fc027fe28b70d41352d820797239bb upstream.
Add patch for realtek codec in Lenovo B50-70 that fixes inverted internal microphone channel. Device IdeaPad Y410P has the same PCI SSID as Lenovo B50-70, but first one is about fix the noise and it didn't seem help in a later kernel version. So I replaced IdeaPad Y410P device description with B50-70 and apply inverted microphone fix.
Bugzilla: https://bugs.launchpad.net/ubuntu/+source/alsa-driver/+bug/1524215 Signed-off-by: Michał Wadowski wadosm@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/pci/hda/patch_realtek.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -6990,7 +6990,7 @@ static const struct snd_pci_quirk alc269 SND_PCI_QUIRK(0x17aa, 0x313c, "ThinkCentre Station", ALC294_FIXUP_LENOVO_MIC_LOCATION), SND_PCI_QUIRK(0x17aa, 0x3902, "Lenovo E50-80", ALC269_FIXUP_DMIC_THINKPAD_ACPI), SND_PCI_QUIRK(0x17aa, 0x3977, "IdeaPad S210", ALC283_FIXUP_INT_MIC), - SND_PCI_QUIRK(0x17aa, 0x3978, "IdeaPad Y410P", ALC269_FIXUP_NO_SHUTUP), + SND_PCI_QUIRK(0x17aa, 0x3978, "Lenovo B50-70", ALC269_FIXUP_DMIC_THINKPAD_ACPI), SND_PCI_QUIRK(0x17aa, 0x5013, "Thinkpad", ALC269_FIXUP_LIMIT_INT_MIC_BOOST), SND_PCI_QUIRK(0x17aa, 0x501a, "Thinkpad", ALC283_FIXUP_INT_MIC), SND_PCI_QUIRK(0x17aa, 0x501e, "Thinkpad L440", ALC292_FIXUP_TPT440_DOCK),
From: Chengguang Xu cgxu519@gmail.com
commit 0d52154bb0a700abb459a2cbce0a30fc2549b67e upstream.
When failing from creating cache jbd2_inode_cache, we will destroy the previously created cache jbd2_handle_cache twice. This patch fixes this by moving each cache initialization/destruction to its own separate, individual function.
Signed-off-by: Chengguang Xu cgxu519@gmail.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/jbd2/journal.c | 49 +++++++++++++++++++++++++++++++------------------ fs/jbd2/revoke.c | 32 ++++++++++++++++++++------------ fs/jbd2/transaction.c | 8 +++++--- include/linux/jbd2.h | 8 +++++--- 4 files changed, 61 insertions(+), 36 deletions(-)
--- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -2375,22 +2375,19 @@ static struct kmem_cache *jbd2_journal_h static atomic_t nr_journal_heads = ATOMIC_INIT(0); #endif
-static int jbd2_journal_init_journal_head_cache(void) +static int __init jbd2_journal_init_journal_head_cache(void) { - int retval; - - J_ASSERT(jbd2_journal_head_cache == NULL); + J_ASSERT(!jbd2_journal_head_cache); jbd2_journal_head_cache = kmem_cache_create("jbd2_journal_head", sizeof(struct journal_head), 0, /* offset */ SLAB_TEMPORARY | SLAB_TYPESAFE_BY_RCU, NULL); /* ctor */ - retval = 0; if (!jbd2_journal_head_cache) { - retval = -ENOMEM; printk(KERN_EMERG "JBD2: no memory for journal_head cache\n"); + return -ENOMEM; } - return retval; + return 0; }
static void jbd2_journal_destroy_journal_head_cache(void) @@ -2636,28 +2633,38 @@ static void __exit jbd2_remove_jbd_stats
struct kmem_cache *jbd2_handle_cache, *jbd2_inode_cache;
+static int __init jbd2_journal_init_inode_cache(void) +{ + J_ASSERT(!jbd2_inode_cache); + jbd2_inode_cache = KMEM_CACHE(jbd2_inode, 0); + if (!jbd2_inode_cache) { + pr_emerg("JBD2: failed to create inode cache\n"); + return -ENOMEM; + } + return 0; +} + static int __init jbd2_journal_init_handle_cache(void) { + J_ASSERT(!jbd2_handle_cache); jbd2_handle_cache = KMEM_CACHE(jbd2_journal_handle, SLAB_TEMPORARY); - if (jbd2_handle_cache == NULL) { + if (!jbd2_handle_cache) { printk(KERN_EMERG "JBD2: failed to create handle cache\n"); return -ENOMEM; } - jbd2_inode_cache = KMEM_CACHE(jbd2_inode, 0); - if (jbd2_inode_cache == NULL) { - printk(KERN_EMERG "JBD2: failed to create inode cache\n"); - kmem_cache_destroy(jbd2_handle_cache); - return -ENOMEM; - } return 0; }
+static void jbd2_journal_destroy_inode_cache(void) +{ + kmem_cache_destroy(jbd2_inode_cache); + jbd2_inode_cache = NULL; +} + static void jbd2_journal_destroy_handle_cache(void) { kmem_cache_destroy(jbd2_handle_cache); jbd2_handle_cache = NULL; - kmem_cache_destroy(jbd2_inode_cache); - jbd2_inode_cache = NULL; }
/* @@ -2668,21 +2675,27 @@ static int __init journal_init_caches(vo { int ret;
- ret = jbd2_journal_init_revoke_caches(); + ret = jbd2_journal_init_revoke_record_cache(); + if (ret == 0) + ret = jbd2_journal_init_revoke_table_cache(); if (ret == 0) ret = jbd2_journal_init_journal_head_cache(); if (ret == 0) ret = jbd2_journal_init_handle_cache(); if (ret == 0) + ret = jbd2_journal_init_inode_cache(); + if (ret == 0) ret = jbd2_journal_init_transaction_cache(); return ret; }
static void jbd2_journal_destroy_caches(void) { - jbd2_journal_destroy_revoke_caches(); + jbd2_journal_destroy_revoke_record_cache(); + jbd2_journal_destroy_revoke_table_cache(); jbd2_journal_destroy_journal_head_cache(); jbd2_journal_destroy_handle_cache(); + jbd2_journal_destroy_inode_cache(); jbd2_journal_destroy_transaction_cache(); jbd2_journal_destroy_slabs(); } --- a/fs/jbd2/revoke.c +++ b/fs/jbd2/revoke.c @@ -178,33 +178,41 @@ static struct jbd2_revoke_record_s *find return NULL; }
-void jbd2_journal_destroy_revoke_caches(void) +void jbd2_journal_destroy_revoke_record_cache(void) { kmem_cache_destroy(jbd2_revoke_record_cache); jbd2_revoke_record_cache = NULL; +} + +void jbd2_journal_destroy_revoke_table_cache(void) +{ kmem_cache_destroy(jbd2_revoke_table_cache); jbd2_revoke_table_cache = NULL; }
-int __init jbd2_journal_init_revoke_caches(void) +int __init jbd2_journal_init_revoke_record_cache(void) { J_ASSERT(!jbd2_revoke_record_cache); - J_ASSERT(!jbd2_revoke_table_cache); - jbd2_revoke_record_cache = KMEM_CACHE(jbd2_revoke_record_s, SLAB_HWCACHE_ALIGN|SLAB_TEMPORARY); - if (!jbd2_revoke_record_cache) - goto record_cache_failure;
+ if (!jbd2_revoke_record_cache) { + pr_emerg("JBD2: failed to create revoke_record cache\n"); + return -ENOMEM; + } + return 0; +} + +int __init jbd2_journal_init_revoke_table_cache(void) +{ + J_ASSERT(!jbd2_revoke_table_cache); jbd2_revoke_table_cache = KMEM_CACHE(jbd2_revoke_table_s, SLAB_TEMPORARY); - if (!jbd2_revoke_table_cache) - goto table_cache_failure; - return 0; -table_cache_failure: - jbd2_journal_destroy_revoke_caches(); -record_cache_failure: + if (!jbd2_revoke_table_cache) { + pr_emerg("JBD2: failed to create revoke_table cache\n"); return -ENOMEM; + } + return 0; }
static struct jbd2_revoke_table_s *jbd2_journal_init_revoke_table(int hash_size) --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -42,9 +42,11 @@ int __init jbd2_journal_init_transaction 0, SLAB_HWCACHE_ALIGN|SLAB_TEMPORARY, NULL); - if (transaction_cache) - return 0; - return -ENOMEM; + if (!transaction_cache) { + pr_emerg("JBD2: failed to create transaction cache\n"); + return -ENOMEM; + } + return 0; }
void jbd2_journal_destroy_transaction_cache(void) --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -1318,7 +1318,7 @@ extern void __wait_on_journal (journal_
/* Transaction cache support */ extern void jbd2_journal_destroy_transaction_cache(void); -extern int jbd2_journal_init_transaction_cache(void); +extern int __init jbd2_journal_init_transaction_cache(void); extern void jbd2_journal_free_transaction(transaction_t *);
/* @@ -1446,8 +1446,10 @@ static inline void jbd2_free_inode(struc /* Primary revoke support */ #define JOURNAL_REVOKE_DEFAULT_HASH 256 extern int jbd2_journal_init_revoke(journal_t *, int); -extern void jbd2_journal_destroy_revoke_caches(void); -extern int jbd2_journal_init_revoke_caches(void); +extern void jbd2_journal_destroy_revoke_record_cache(void); +extern void jbd2_journal_destroy_revoke_table_cache(void); +extern int __init jbd2_journal_init_revoke_record_cache(void); +extern int __init jbd2_journal_init_revoke_table_cache(void);
extern void jbd2_journal_destroy_revoke(journal_t *); extern int jbd2_journal_revoke (handle_t *, unsigned long long, struct buffer_head *);
From: Sean Christopherson sean.j.christopherson@intel.com
commit f93f7ede087f2edcc18e4b02310df5749a6b5a61 upstream.
The RDPMC-exiting control is dependent on the existence of the RDPMC instruction itself, i.e. is not tied to the "Architectural Performance Monitoring" feature. For all intents and purposes, the control exists on all CPUs with VMX support since RDPMC also exists on all VCPUs with VMX supported. Per Intel's SDM:
The RDPMC instruction was introduced into the IA-32 Architecture in the Pentium Pro processor and the Pentium processor with MMX technology. The earlier Pentium processors have performance-monitoring counters, but they must be read with the RDMSR instruction.
Because RDPMC-exiting always exists, KVM requires the control and refuses to load if it's not available. As a result, hiding the PMU from a guest breaks nested virtualization if the guest attemts to use KVM.
While it's not explicitly stated in the RDPMC pseudocode, the VM-Exit check for RDPMC-exiting follows standard fault vs. VM-Exit prioritization for privileged instructions, e.g. occurs after the CPL/CR0.PE/CR4.PCE checks, but before the counter referenced in ECX is checked for validity.
In other words, the original KVM behavior of injecting a #GP was correct, and the KVM unit test needs to be adjusted accordingly, e.g. eat the #GP when the unit test guest (L3 in this case) executes RDPMC without RDPMC-exiting set in the unit test host (L2).
This reverts commit e51bfdb68725dc052d16241ace40ea3140f938aa.
Fixes: e51bfdb68725 ("KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU") Reported-by: David Hill hilld@binarystorm.net Cc: Saar Amar saaramar@microsoft.com Cc: Mihai Carabas mihai.carabas@oracle.com Cc: Jim Mattson jmattson@google.com Cc: Liran Alon liran.alon@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson sean.j.christopherson@intel.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/vmx/vmx.c | 25 ------------------------- 1 file changed, 25 deletions(-)
--- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6856,30 +6856,6 @@ static void nested_vmx_entry_exit_ctls_u } }
-static bool guest_cpuid_has_pmu(struct kvm_vcpu *vcpu) -{ - struct kvm_cpuid_entry2 *entry; - union cpuid10_eax eax; - - entry = kvm_find_cpuid_entry(vcpu, 0xa, 0); - if (!entry) - return false; - - eax.full = entry->eax; - return (eax.split.version_id > 0); -} - -static void nested_vmx_procbased_ctls_update(struct kvm_vcpu *vcpu) -{ - struct vcpu_vmx *vmx = to_vmx(vcpu); - bool pmu_enabled = guest_cpuid_has_pmu(vcpu); - - if (pmu_enabled) - vmx->nested.msrs.procbased_ctls_high |= CPU_BASED_RDPMC_EXITING; - else - vmx->nested.msrs.procbased_ctls_high &= ~CPU_BASED_RDPMC_EXITING; -} - static void update_intel_pt_cfg(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); @@ -6968,7 +6944,6 @@ static void vmx_cpuid_update(struct kvm_ if (nested_vmx_allowed(vcpu)) { nested_vmx_cr_fixed1_bits_update(vcpu); nested_vmx_entry_exit_ctls_update(vcpu); - nested_vmx_procbased_ctls_update(vcpu); }
if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
From: Peter Xu peterx@redhat.com
commit 4ddc9204572c33f2eb91fbdb1d99d8078388b67d upstream.
kvm_dirty_bitmap_bytes() will return the size of the dirty bitmap of the memslot rather than the size of bitmap passed over from the ioctl. Here for KVM_CLEAR_DIRTY_LOG we should only copy exactly the size of bitmap that covers kvm_clear_dirty_log.num_pages.
Signed-off-by: Peter Xu peterx@redhat.com Cc: stable@vger.kernel.org Fixes: 2a31b9db153530df4aa02dac8c32837bf5f47019 Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- virt/kvm/kvm_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1250,7 +1250,7 @@ int kvm_clear_dirty_log_protect(struct k if (!dirty_bitmap) return -ENOENT;
- n = kvm_dirty_bitmap_bytes(memslot); + n = ALIGN(log->num_pages, BITS_PER_LONG) / 8;
if (log->first_page > memslot->npages || log->num_pages > memslot->npages - log->first_page ||
From: Sean Christopherson sean.j.christopherson@intel.com
commit 11988499e62b310f3bf6f6d0a807a06d3f9ccc96 upstream.
KVM allows userspace to violate consistency checks related to the guest's CPUID model to some degree. Generally speaking, userspace has carte blanche when it comes to guest state so long as jamming invalid state won't negatively affect the host.
Currently this is seems to be a non-issue as most of the interesting EFER checks are missing, e.g. NX and LME, but those will be added shortly. Proactively exempt userspace from the CPUID checks so as not to break userspace.
Note, the efer_reserved_bits check still applies to userspace writes as that mask reflects the host's capabilities, e.g. KVM shouldn't allow a guest to run with NX=1 if it has been disabled in the host.
Fixes: d80174745ba39 ("KVM: SVM: Only allow setting of EFER_SVME when CPUID SVM is set") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson sean.j.christopherson@intel.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/x86.c | 37 ++++++++++++++++++++++++------------- 1 file changed, 24 insertions(+), 13 deletions(-)
--- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1262,31 +1262,42 @@ static int do_get_msr_feature(struct kvm return 0; }
-bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer) +static bool __kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer) { - if (efer & efer_reserved_bits) - return false; - if (efer & EFER_FFXSR && !guest_cpuid_has(vcpu, X86_FEATURE_FXSR_OPT)) - return false; + return false;
if (efer & EFER_SVME && !guest_cpuid_has(vcpu, X86_FEATURE_SVM)) - return false; + return false;
return true; + +} +bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer) +{ + if (efer & efer_reserved_bits) + return false; + + return __kvm_valid_efer(vcpu, efer); } EXPORT_SYMBOL_GPL(kvm_valid_efer);
-static int set_efer(struct kvm_vcpu *vcpu, u64 efer) +static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info) { u64 old_efer = vcpu->arch.efer; + u64 efer = msr_info->data;
- if (!kvm_valid_efer(vcpu, efer)) - return 1; + if (efer & efer_reserved_bits) + return false;
- if (is_paging(vcpu) - && (vcpu->arch.efer & EFER_LME) != (efer & EFER_LME)) - return 1; + if (!msr_info->host_initiated) { + if (!__kvm_valid_efer(vcpu, efer)) + return 1; + + if (is_paging(vcpu) && + (vcpu->arch.efer & EFER_LME) != (efer & EFER_LME)) + return 1; + }
efer &= ~EFER_LMA; efer |= vcpu->arch.efer & EFER_LMA; @@ -2456,7 +2467,7 @@ int kvm_set_msr_common(struct kvm_vcpu * vcpu->arch.arch_capabilities = data; break; case MSR_EFER: - return set_efer(vcpu, data); + return set_efer(vcpu, msr_info); case MSR_K7_HWCR: data &= ~(u64)0x40; /* ignore flush filter disable */ data &= ~(u64)0x100; /* ignore ignne emulation enable */
From: Sean Christopherson sean.j.christopherson@intel.com
commit ee66e453db13d4837a0dcf9d43efa7a88603161b upstream.
...now that VMX's preemption timer, i.e. the hv_timer, also adjusts its programmed time based on lapic_timer_advance_ns. Without the delay, a guest can see a timer interrupt arrive before the requested time when KVM is using the hv_timer to emulate the guest's interrupt.
Fixes: c5ce8235cffa0 ("KVM: VMX: Optimize tscdeadline timer latency") Cc: stable@vger.kernel.org Cc: Wanpeng Li wanpengli@tencent.com Reviewed-by: Liran Alon liran.alon@oracle.com Signed-off-by: Sean Christopherson sean.j.christopherson@intel.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/lapic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1454,7 +1454,7 @@ static void apic_timer_expired(struct kv if (swait_active(q)) swake_up_one(q);
- if (apic_lvtt_tscdeadline(apic)) + if (apic_lvtt_tscdeadline(apic) || ktimer->hv_timer_in_use) ktimer->expired_tscdeadline = ktimer->tscdeadline; }
From: Steve French stfrench@microsoft.com
commit b63a9de02d64ecd5ff0749e90253f5b30ba5b9c0 upstream.
Displaying the session id in /proc/fs/cifs/DebugData is needed in order to correlate Linux client information with network and server traces for many common support scenarios. Turned out to be very important for debugging.
Signed-off-by: Steve French stfrench@microsoft.com CC: Stable stable@vger.kernel.org Reviewed-by: Pavel Shilovsky pshilov@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/cifs_debug.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/cifs/cifs_debug.c +++ b/fs/cifs/cifs_debug.c @@ -376,6 +376,8 @@ skip_rdma: atomic_read(&server->in_send), atomic_read(&server->num_waiters)); #endif + /* dump session id helpful for use with network trace */ + seq_printf(m, " SessionId: 0x%llx", ses->Suid); if (ses->session_flags & SMB2_SESSION_FLAG_ENCRYPT_DATA) seq_puts(m, " encrypted"); if (ses->sign)
From: Masahiro Yamada yamada.masahiro@socionext.com
commit d2f8ae0e4c5c754f1b2a7b8388d19a1a977e698a upstream.
syncconfig is responsible for keeping auto.conf up-to-date, so if it fails for any reason, the build must be terminated immediately.
However, since commit 9390dff66a52 ("kbuild: invoke syncconfig if include/config/auto.conf.cmd is missing"), Kbuild continues running even after syncconfig fails.
You can confirm this by intentionally making syncconfig error out:
# diff --git a/scripts/kconfig/confdata.c b/scripts/kconfig/confdata.c # index 08ba146..307b9de 100644 # --- a/scripts/kconfig/confdata.c # +++ b/scripts/kconfig/confdata.c # @@ -1023,6 +1023,9 @@ int conf_write_autoconf(int overwrite) # FILE *out, *tristate, *out_h; # int i; # # + if (overwrite) # + return 1; # + # if (!overwrite && is_present(autoconf_name)) # return 0;
Then, syncconfig fails, but Make would not stop:
$ make -s mrproper allyesconfig defconfig $ make scripts/kconfig/conf --syncconfig Kconfig
*** Error during sync of the configuration.
make[2]: *** [scripts/kconfig/Makefile;69: syncconfig] Error 1 make[1]: *** [Makefile;557: syncconfig] Error 2 make: *** [include/config/auto.conf.cmd] Deleting file 'include/config/tristate.conf' make: Failed to remake makefile 'include/config/auto.conf'. SYSTBL arch/x86/include/generated/asm/syscalls_32.h SYSHDR arch/x86/include/generated/asm/unistd_32_ia32.h SYSHDR arch/x86/include/generated/asm/unistd_64_x32.h SYSTBL arch/x86/include/generated/asm/syscalls_64.h [ continue running ... ]
The reason is in the behavior of a pattern rule with multi-targets.
%/auto.conf %/auto.conf.cmd %/tristate.conf: $(KCONFIG_CONFIG) $(Q)$(MAKE) -f $(srctree)/Makefile syncconfig
GNU Make knows this rule is responsible for making all the three files simultaneously. As far as examined, auto.conf.cmd is the target in question when this rule is invoked. It is probably because auto.conf.cmd is included below the inclusion of auto.conf.
The inclusion of auto.conf is mandatory, while that of auto.conf.cmd is optional. GNU Make does not care about the failure in the process of updating optional include files.
I filed this issue (https://savannah.gnu.org/bugs/?56301) in case this behavior could be improved somehow in future releases of GNU Make. Anyway, it is quite easy to fix our Makefile.
Given that auto.conf is already a mandatory include file, there is no reason to stick auto.conf.cmd optional. Make it mandatory as well.
Cc: linux-stable stable@vger.kernel.org # 5.0+ Fixes: 9390dff66a52 ("kbuild: invoke syncconfig if include/config/auto.conf.cmd is missing") Signed-off-by: Masahiro Yamada yamada.masahiro@socionext.com [commented out diff above to keep patch happy - gregkh] Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/Makefile +++ b/Makefile @@ -636,7 +636,7 @@ ifeq ($(may-sync-config),1) # Read in dependencies to all Kconfig* files, make sure to run syncconfig if # changes are detected. This should be included after arch/$(SRCARCH)/Makefile # because some architectures define CROSS_COMPILE there. --include include/config/auto.conf.cmd +include include/config/auto.conf.cmd
$(KCONFIG_CONFIG): @echo >&2 '***'
From: Roger Pau Monne roger.pau@citrix.com
commit c9f804d64bb93c8dbf957df1d7e9de11380e522d upstream.
Or else xen_domain() returns false despite xen_pvh being set.
Signed-off-by: Roger Pau Monné roger.pau@citrix.com Reviewed-by: Boris Ostrovsky boris.ostrovsky@oracle.com Signed-off-by: Boris Ostrovsky boris.ostrovsky@oracle.com Cc: stable@vger.kernel.org # 4.19+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/xen/enlighten_pvh.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -27,6 +27,7 @@ void __init xen_pvh_init(void) u64 pfn;
xen_pvh = 1; + xen_domain_type = XEN_HVM_DOMAIN; xen_start_flags = pvh_start_info.flags;
msr = cpuid_ebx(xen_cpuid_base() + 2);
From: Roger Pau Monne roger.pau@citrix.com
commit 72813bfbf0276a97c82af038efb5f02dcdd9e310 upstream.
This involves initializing the boot params EFI related fields and the efi global variable.
Without this fix a PVH dom0 doesn't detect when booted from EFI, and thus doesn't support accessing any of the EFI related data.
Reported-by: PGNet Dev pgnet.dev@gmail.com Signed-off-by: Roger Pau Monné roger.pau@citrix.com Reviewed-by: Boris Ostrovsky boris.ostrovsky@oracle.com Signed-off-by: Boris Ostrovsky boris.ostrovsky@oracle.com Cc: stable@vger.kernel.org # 4.19+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/platform/pvh/enlighten.c | 8 ++++---- arch/x86/xen/efi.c | 12 ++++++------ arch/x86/xen/enlighten_pv.c | 2 +- arch/x86/xen/enlighten_pvh.c | 6 +++++- arch/x86/xen/xen-ops.h | 4 ++-- 5 files changed, 18 insertions(+), 14 deletions(-)
--- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -44,8 +44,6 @@ void __init __weak mem_map_via_hcall(str
static void __init init_pvh_bootparams(bool xen_guest) { - memset(&pvh_bootparams, 0, sizeof(pvh_bootparams)); - if ((pvh_start_info.version > 0) && (pvh_start_info.memmap_entries)) { struct hvm_memmap_table_entry *ep; int i; @@ -103,7 +101,7 @@ static void __init init_pvh_bootparams(b * If we are trying to boot a Xen PVH guest, it is expected that the kernel * will have been configured to provide the required override for this routine. */ -void __init __weak xen_pvh_init(void) +void __init __weak xen_pvh_init(struct boot_params *boot_params) { xen_raw_printk("Error: Missing xen PVH initialization\n"); BUG(); @@ -112,7 +110,7 @@ void __init __weak xen_pvh_init(void) static void hypervisor_specific_init(bool xen_guest) { if (xen_guest) - xen_pvh_init(); + xen_pvh_init(&pvh_bootparams); }
/* @@ -131,6 +129,8 @@ void __init xen_prepare_pvh(void) BUG(); }
+ memset(&pvh_bootparams, 0, sizeof(pvh_bootparams)); + hypervisor_specific_init(xen_guest);
init_pvh_bootparams(xen_guest); --- a/arch/x86/xen/efi.c +++ b/arch/x86/xen/efi.c @@ -158,7 +158,7 @@ static enum efi_secureboot_mode xen_efi_ return efi_secureboot_mode_unknown; }
-void __init xen_efi_init(void) +void __init xen_efi_init(struct boot_params *boot_params) { efi_system_table_t *efi_systab_xen;
@@ -167,12 +167,12 @@ void __init xen_efi_init(void) if (efi_systab_xen == NULL) return;
- strncpy((char *)&boot_params.efi_info.efi_loader_signature, "Xen", - sizeof(boot_params.efi_info.efi_loader_signature)); - boot_params.efi_info.efi_systab = (__u32)__pa(efi_systab_xen); - boot_params.efi_info.efi_systab_hi = (__u32)(__pa(efi_systab_xen) >> 32); + strncpy((char *)&boot_params->efi_info.efi_loader_signature, "Xen", + sizeof(boot_params->efi_info.efi_loader_signature)); + boot_params->efi_info.efi_systab = (__u32)__pa(efi_systab_xen); + boot_params->efi_info.efi_systab_hi = (__u32)(__pa(efi_systab_xen) >> 32);
- boot_params.secure_boot = xen_efi_get_secureboot(); + boot_params->secure_boot = xen_efi_get_secureboot();
set_bit(EFI_BOOT, &efi.flags); set_bit(EFI_PARAVIRT, &efi.flags); --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -1403,7 +1403,7 @@ asmlinkage __visible void __init xen_sta /* We need this for printk timestamps */ xen_setup_runstate_info(0);
- xen_efi_init(); + xen_efi_init(&boot_params);
/* Start the world */ #ifdef CONFIG_X86_32 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -13,6 +13,8 @@
#include <xen/interface/memory.h>
+#include "xen-ops.h" + /* * PVH variables. * @@ -21,7 +23,7 @@ */ bool xen_pvh __attribute__((section(".data"))) = 0;
-void __init xen_pvh_init(void) +void __init xen_pvh_init(struct boot_params *boot_params) { u32 msr; u64 pfn; @@ -33,6 +35,8 @@ void __init xen_pvh_init(void) msr = cpuid_ebx(xen_cpuid_base() + 2); pfn = __pa(hypercall_page); wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); + + xen_efi_init(boot_params); }
void __init mem_map_via_hcall(struct boot_params *boot_params_p) --- a/arch/x86/xen/xen-ops.h +++ b/arch/x86/xen/xen-ops.h @@ -122,9 +122,9 @@ static inline void __init xen_init_vga(c void __init xen_init_apic(void);
#ifdef CONFIG_XEN_EFI -extern void xen_efi_init(void); +extern void xen_efi_init(struct boot_params *boot_params); #else -static inline void __init xen_efi_init(void) +static inline void __init xen_efi_init(struct boot_params *boot_params) { } #endif
From: Christophe Leroy christophe.leroy@c-s.fr
commit 397d2300b08cdee052053e362018cdb6dd65eea2 upstream.
flush_hash_pages() runs with data translation off, so current task_struct has to be accesssed using physical address.
Fixes: f7354ccac844 ("powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU") Cc: stable@vger.kernel.org # v5.1+ Reported-by: Erhard F. erhard_f@mailbox.org Signed-off-by: Christophe Leroy christophe.leroy@c-s.fr Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/mm/hash_low_32.S | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/powerpc/mm/hash_low_32.S +++ b/arch/powerpc/mm/hash_low_32.S @@ -539,7 +539,8 @@ _GLOBAL(flush_hash_pages) #ifdef CONFIG_SMP lis r9, (mmu_hash_lock - PAGE_OFFSET)@ha addi r9, r9, (mmu_hash_lock - PAGE_OFFSET)@l - lwz r8,TASK_CPU(r2) + tophys (r8, r2) + lwz r8, TASK_CPU(r8) oris r8,r8,9 10: lwarx r0,0,r9 cmpi 0,r0,0
From: Dan Williams dan.j.williams@intel.com
commit c4703ce11c23423d4b46e3d59aef7979814fd608 upstream.
Users have reported intermittent occurrences of DIMM initialization failures due to duplicate allocations of address capacity detected in the labels, or errors of the form below, both have the same root cause.
nd namespace1.4: failed to track label: 0 WARNING: CPU: 17 PID: 1381 at drivers/nvdimm/label.c:863
RIP: 0010:__pmem_label_update+0x56c/0x590 [libnvdimm] Call Trace: ? nd_pmem_namespace_label_update+0xd6/0x160 [libnvdimm] nd_pmem_namespace_label_update+0xd6/0x160 [libnvdimm] uuid_store+0x17e/0x190 [libnvdimm] kernfs_fop_write+0xf0/0x1a0 vfs_write+0xb7/0x1b0 ksys_write+0x57/0xd0 do_syscall_64+0x60/0x210
Unfortunately those reports were typically with a busy parallel namespace creation / destruction loop making it difficult to see the components of the bug. However, Jane provided a simple reproducer using the work-in-progress sub-section implementation.
When ndctl is reconfiguring a namespace it may take an existing defunct / disabled namespace and reconfigure it with a new uuid and other parameters. Critically namespace_update_uuid() takes existing address resources and renames them for the new namespace to use / reconfigure as it sees fit. The bug is that this rename only happens in the resource tracking tree. Existing labels with the old uuid are not reaped leading to a scenario where multiple active labels reference the same span of address range.
Teach namespace_update_uuid() to flag any references to the old uuid for reaping at the next label update attempt.
Cc: stable@vger.kernel.org Fixes: bf9bccc14c05 ("libnvdimm: pmem label sets and namespace instantiation") Link: https://github.com/pmem/ndctl/issues/91 Reported-by: Jane Chu jane.chu@oracle.com Reported-by: Jeff Moyer jmoyer@redhat.com Reported-by: Erwin Tsaur erwin.tsaur@oracle.com Cc: Johannes Thumshirn jthumshirn@suse.de Signed-off-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/nvdimm/label.c | 29 ++++++++++++++++------------- drivers/nvdimm/namespace_devs.c | 15 +++++++++++++++ drivers/nvdimm/nd.h | 4 ++++ 3 files changed, 35 insertions(+), 13 deletions(-)
--- a/drivers/nvdimm/label.c +++ b/drivers/nvdimm/label.c @@ -756,6 +756,17 @@ static const guid_t *to_abstraction_guid return &guid_null; }
+static void reap_victim(struct nd_mapping *nd_mapping, + struct nd_label_ent *victim) +{ + struct nvdimm_drvdata *ndd = to_ndd(nd_mapping); + u32 slot = to_slot(ndd, victim->label); + + dev_dbg(ndd->dev, "free: %d\n", slot); + nd_label_free_slot(ndd, slot); + victim->label = NULL; +} + static int __pmem_label_update(struct nd_region *nd_region, struct nd_mapping *nd_mapping, struct nd_namespace_pmem *nspm, int pos, unsigned long flags) @@ -763,9 +774,9 @@ static int __pmem_label_update(struct nd struct nd_namespace_common *ndns = &nspm->nsio.common; struct nd_interleave_set *nd_set = nd_region->nd_set; struct nvdimm_drvdata *ndd = to_ndd(nd_mapping); - struct nd_label_ent *label_ent, *victim = NULL; struct nd_namespace_label *nd_label; struct nd_namespace_index *nsindex; + struct nd_label_ent *label_ent; struct nd_label_id label_id; struct resource *res; unsigned long *free; @@ -834,18 +845,10 @@ static int __pmem_label_update(struct nd list_for_each_entry(label_ent, &nd_mapping->labels, list) { if (!label_ent->label) continue; - if (memcmp(nspm->uuid, label_ent->label->uuid, - NSLABEL_UUID_LEN) != 0) - continue; - victim = label_ent; - list_move_tail(&victim->list, &nd_mapping->labels); - break; - } - if (victim) { - dev_dbg(ndd->dev, "free: %d\n", slot); - slot = to_slot(ndd, victim->label); - nd_label_free_slot(ndd, slot); - victim->label = NULL; + if (test_and_clear_bit(ND_LABEL_REAP, &label_ent->flags) + || memcmp(nspm->uuid, label_ent->label->uuid, + NSLABEL_UUID_LEN) == 0) + reap_victim(nd_mapping, label_ent); }
/* update index */ --- a/drivers/nvdimm/namespace_devs.c +++ b/drivers/nvdimm/namespace_devs.c @@ -1247,12 +1247,27 @@ static int namespace_update_uuid(struct for (i = 0; i < nd_region->ndr_mappings; i++) { struct nd_mapping *nd_mapping = &nd_region->mapping[i]; struct nvdimm_drvdata *ndd = to_ndd(nd_mapping); + struct nd_label_ent *label_ent; struct resource *res;
for_each_dpa_resource(ndd, res) if (strcmp(res->name, old_label_id.id) == 0) sprintf((void *) res->name, "%s", new_label_id.id); + + mutex_lock(&nd_mapping->lock); + list_for_each_entry(label_ent, &nd_mapping->labels, list) { + struct nd_namespace_label *nd_label = label_ent->label; + struct nd_label_id label_id; + + if (!nd_label) + continue; + nd_label_gen_id(&label_id, nd_label->uuid, + __le32_to_cpu(nd_label->flags)); + if (strcmp(old_label_id.id, label_id.id) == 0) + set_bit(ND_LABEL_REAP, &label_ent->flags); + } + mutex_unlock(&nd_mapping->lock); } kfree(*old_uuid); out: --- a/drivers/nvdimm/nd.h +++ b/drivers/nvdimm/nd.h @@ -113,8 +113,12 @@ struct nd_percpu_lane { spinlock_t lock; };
+enum nd_label_flags { + ND_LABEL_REAP, +}; struct nd_label_ent { struct list_head list; + unsigned long flags; struct nd_namespace_label *label; };
From: Martin Schwidefsky schwidefsky@de.ibm.com
commit d1874a0c2805fcfa9162c972d6b7541e57adb542 upstream.
Change the way how pgd_offset, p4d_offset, pud_offset and pmd_offset walk the page tables. pgd_offset now always calculates the index for the top-level page table and adds it to the pgd, this is either a segment table offset for a 2-level setup, a region-3 offset for 3-levels, region-2 offset for 4-levels, or a region-1 offset for a 5-level setup. The other three functions p4d_offset, pud_offset and pmd_offset will only add the respective offset if they dereference the passed pointer.
With the new way of walking the page tables a sequence like this from mm/gup.c now works:
pgdp = pgd_offset(current->mm, addr); pgd = READ_ONCE(*pgdp); p4dp = p4d_offset(&pgd, addr); p4d = READ_ONCE(*p4dp); pudp = pud_offset(&p4d, addr); pud = READ_ONCE(*pudp); pmdp = pmd_offset(&pud, addr); pmd = READ_ONCE(*pmdp);
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/s390/include/asm/pgtable.h | 67 +++++++++++++++++++++++++--------------- arch/s390/mm/gup.c | 33 +++++++------------ 2 files changed, 55 insertions(+), 45 deletions(-)
--- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -1204,42 +1204,67 @@ static inline pte_t mk_pte(struct page * #define pmd_index(address) (((address) >> PMD_SHIFT) & (PTRS_PER_PMD-1)) #define pte_index(address) (((address) >> PAGE_SHIFT) & (PTRS_PER_PTE-1))
-#define pgd_offset(mm, address) ((mm)->pgd + pgd_index(address)) -#define pgd_offset_k(address) pgd_offset(&init_mm, address) -#define pgd_offset_raw(pgd, addr) ((pgd) + pgd_index(addr)) - #define pmd_deref(pmd) (pmd_val(pmd) & _SEGMENT_ENTRY_ORIGIN) #define pud_deref(pud) (pud_val(pud) & _REGION_ENTRY_ORIGIN) #define p4d_deref(pud) (p4d_val(pud) & _REGION_ENTRY_ORIGIN) #define pgd_deref(pgd) (pgd_val(pgd) & _REGION_ENTRY_ORIGIN)
-static inline p4d_t *p4d_offset(pgd_t *pgd, unsigned long address) +/* + * The pgd_offset function *always* adds the index for the top-level + * region/segment table. This is done to get a sequence like the + * following to work: + * pgdp = pgd_offset(current->mm, addr); + * pgd = READ_ONCE(*pgdp); + * p4dp = p4d_offset(&pgd, addr); + * ... + * The subsequent p4d_offset, pud_offset and pmd_offset functions + * only add an index if they dereferenced the pointer. + */ +static inline pgd_t *pgd_offset_raw(pgd_t *pgd, unsigned long address) { - p4d_t *p4d = (p4d_t *) pgd; + unsigned long rste; + unsigned int shift;
- if ((pgd_val(*pgd) & _REGION_ENTRY_TYPE_MASK) == _REGION_ENTRY_TYPE_R1) - p4d = (p4d_t *) pgd_deref(*pgd); - return p4d + p4d_index(address); + /* Get the first entry of the top level table */ + rste = pgd_val(*pgd); + /* Pick up the shift from the table type of the first entry */ + shift = ((rste & _REGION_ENTRY_TYPE_MASK) >> 2) * 11 + 20; + return pgd + ((address >> shift) & (PTRS_PER_PGD - 1)); }
-static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address) +#define pgd_offset(mm, address) pgd_offset_raw(READ_ONCE((mm)->pgd), address) +#define pgd_offset_k(address) pgd_offset(&init_mm, address) + +static inline p4d_t *p4d_offset(pgd_t *pgd, unsigned long address) { - pud_t *pud = (pud_t *) p4d; + if ((pgd_val(*pgd) & _REGION_ENTRY_TYPE_MASK) >= _REGION_ENTRY_TYPE_R1) + return (p4d_t *) pgd_deref(*pgd) + p4d_index(address); + return (p4d_t *) pgd; +}
- if ((p4d_val(*p4d) & _REGION_ENTRY_TYPE_MASK) == _REGION_ENTRY_TYPE_R2) - pud = (pud_t *) p4d_deref(*p4d); - return pud + pud_index(address); +static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address) +{ + if ((p4d_val(*p4d) & _REGION_ENTRY_TYPE_MASK) >= _REGION_ENTRY_TYPE_R2) + return (pud_t *) p4d_deref(*p4d) + pud_index(address); + return (pud_t *) p4d; }
static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address) { - pmd_t *pmd = (pmd_t *) pud; + if ((pud_val(*pud) & _REGION_ENTRY_TYPE_MASK) >= _REGION_ENTRY_TYPE_R3) + return (pmd_t *) pud_deref(*pud) + pmd_index(address); + return (pmd_t *) pud; +}
- if ((pud_val(*pud) & _REGION_ENTRY_TYPE_MASK) == _REGION_ENTRY_TYPE_R3) - pmd = (pmd_t *) pud_deref(*pud); - return pmd + pmd_index(address); +static inline pte_t *pte_offset(pmd_t *pmd, unsigned long address) +{ + return (pte_t *) pmd_deref(*pmd) + pte_index(address); }
+#define pte_offset_kernel(pmd, address) pte_offset(pmd, address) +#define pte_offset_map(pmd, address) pte_offset_kernel(pmd, address) +#define pte_unmap(pte) do { } while (0) + #define pfn_pte(pfn,pgprot) mk_pte_phys(__pa((pfn) << PAGE_SHIFT),(pgprot)) #define pte_pfn(x) (pte_val(x) >> PAGE_SHIFT) #define pte_page(x) pfn_to_page(pte_pfn(x)) @@ -1249,12 +1274,6 @@ static inline pmd_t *pmd_offset(pud_t *p #define p4d_page(p4d) pfn_to_page(p4d_pfn(p4d)) #define pgd_page(pgd) pfn_to_page(pgd_pfn(pgd))
-/* Find an entry in the lowest level page table.. */ -#define pte_offset(pmd, addr) ((pte_t *) pmd_deref(*(pmd)) + pte_index(addr)) -#define pte_offset_kernel(pmd, address) pte_offset(pmd,address) -#define pte_offset_map(pmd, address) pte_offset_kernel(pmd, address) -#define pte_unmap(pte) do { } while (0) - static inline pmd_t pmd_wrprotect(pmd_t pmd) { pmd_val(pmd) &= ~_SEGMENT_ENTRY_WRITE; --- a/arch/s390/mm/gup.c +++ b/arch/s390/mm/gup.c @@ -18,7 +18,7 @@ * inlines everything into a single function which results in too much * register pressure. */ -static inline int gup_pte_range(pmd_t *pmdp, pmd_t pmd, unsigned long addr, +static inline int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, int write, struct page **pages, int *nr) { struct page *head, *page; @@ -27,7 +27,7 @@ static inline int gup_pte_range(pmd_t *p
mask = (write ? _PAGE_PROTECT : 0) | _PAGE_INVALID | _PAGE_SPECIAL;
- ptep = ((pte_t *) pmd_deref(pmd)) + pte_index(addr); + ptep = pte_offset_map(&pmd, addr); do { pte = *ptep; barrier(); @@ -93,16 +93,13 @@ static inline int gup_huge_pmd(pmd_t *pm }
-static inline int gup_pmd_range(pud_t *pudp, pud_t pud, unsigned long addr, +static inline int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end, int write, struct page **pages, int *nr) { unsigned long next; pmd_t *pmdp, pmd;
- pmdp = (pmd_t *) pudp; - if ((pud_val(pud) & _REGION_ENTRY_TYPE_MASK) == _REGION_ENTRY_TYPE_R3) - pmdp = (pmd_t *) pud_deref(pud); - pmdp += pmd_index(addr); + pmdp = pmd_offset(&pud, addr); do { pmd = *pmdp; barrier(); @@ -120,7 +117,7 @@ static inline int gup_pmd_range(pud_t *p if (!gup_huge_pmd(pmdp, pmd, addr, next, write, pages, nr)) return 0; - } else if (!gup_pte_range(pmdp, pmd, addr, next, + } else if (!gup_pte_range(pmd, addr, next, write, pages, nr)) return 0; } while (pmdp++, addr = next, addr != end); @@ -166,16 +163,13 @@ static int gup_huge_pud(pud_t *pudp, pud return 1; }
-static inline int gup_pud_range(p4d_t *p4dp, p4d_t p4d, unsigned long addr, +static inline int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end, int write, struct page **pages, int *nr) { unsigned long next; pud_t *pudp, pud;
- pudp = (pud_t *) p4dp; - if ((p4d_val(p4d) & _REGION_ENTRY_TYPE_MASK) == _REGION_ENTRY_TYPE_R2) - pudp = (pud_t *) p4d_deref(p4d); - pudp += pud_index(addr); + pudp = pud_offset(&p4d, addr); do { pud = *pudp; barrier(); @@ -186,7 +180,7 @@ static inline int gup_pud_range(p4d_t *p if (!gup_huge_pud(pudp, pud, addr, next, write, pages, nr)) return 0; - } else if (!gup_pmd_range(pudp, pud, addr, next, write, pages, + } else if (!gup_pmd_range(pud, addr, next, write, pages, nr)) return 0; } while (pudp++, addr = next, addr != end); @@ -194,23 +188,20 @@ static inline int gup_pud_range(p4d_t *p return 1; }
-static inline int gup_p4d_range(pgd_t *pgdp, pgd_t pgd, unsigned long addr, +static inline int gup_p4d_range(pgd_t pgd, unsigned long addr, unsigned long end, int write, struct page **pages, int *nr) { unsigned long next; p4d_t *p4dp, p4d;
- p4dp = (p4d_t *) pgdp; - if ((pgd_val(pgd) & _REGION_ENTRY_TYPE_MASK) == _REGION_ENTRY_TYPE_R1) - p4dp = (p4d_t *) pgd_deref(pgd); - p4dp += p4d_index(addr); + p4dp = p4d_offset(&pgd, addr); do { p4d = *p4dp; barrier(); next = p4d_addr_end(addr, end); if (p4d_none(p4d)) return 0; - if (!gup_pud_range(p4dp, p4d, addr, next, write, pages, nr)) + if (!gup_pud_range(p4d, addr, next, write, pages, nr)) return 0; } while (p4dp++, addr = next, addr != end);
@@ -253,7 +244,7 @@ int __get_user_pages_fast(unsigned long next = pgd_addr_end(addr, end); if (pgd_none(pgd)) break; - if (!gup_p4d_range(pgdp, pgd, addr, next, write, pages, &nr)) + if (!gup_p4d_range(pgd, addr, next, write, pages, &nr)) break; } while (pgdp++, addr = next, addr != end); local_irq_restore(flags);
From: Martin Schwidefsky schwidefsky@de.ibm.com
commit 1a42010cdc26bb7e5912984f3c91b8c6d55f089a upstream.
Define the gup_fast_permitted to check against the asce_limit of the mm attached to the current task, then replace the s390 specific gup code with the generic implementation in mm/gup.c.
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/s390/Kconfig | 1 arch/s390/include/asm/pgtable.h | 12 + arch/s390/mm/Makefile | 2 arch/s390/mm/gup.c | 291 ---------------------------------------- 4 files changed, 14 insertions(+), 292 deletions(-)
--- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -149,6 +149,7 @@ config S390 select HAVE_FUNCTION_TRACER select HAVE_FUTEX_CMPXCHG if FUTEX select HAVE_GCC_PLUGINS + select HAVE_GENERIC_GUP select HAVE_KERNEL_BZIP2 select HAVE_KERNEL_GZIP select HAVE_KERNEL_LZ4 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -1265,6 +1265,18 @@ static inline pte_t *pte_offset(pmd_t *p #define pte_offset_map(pmd, address) pte_offset_kernel(pmd, address) #define pte_unmap(pte) do { } while (0)
+static inline bool gup_fast_permitted(unsigned long start, int nr_pages) +{ + unsigned long len, end; + + len = (unsigned long) nr_pages << PAGE_SHIFT; + end = start + len; + if (end < start) + return false; + return end <= current->mm->context.asce_limit; +} +#define gup_fast_permitted gup_fast_permitted + #define pfn_pte(pfn,pgprot) mk_pte_phys(__pa((pfn) << PAGE_SHIFT),(pgprot)) #define pte_pfn(x) (pte_val(x) >> PAGE_SHIFT) #define pte_page(x) pfn_to_page(pte_pfn(x)) --- a/arch/s390/mm/Makefile +++ b/arch/s390/mm/Makefile @@ -4,7 +4,7 @@ #
obj-y := init.o fault.o extmem.o mmap.o vmem.o maccess.o -obj-y += page-states.o gup.o pageattr.o pgtable.o pgalloc.o +obj-y += page-states.o pageattr.o pgtable.o pgalloc.o
obj-$(CONFIG_CMM) += cmm.o obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o --- a/arch/s390/mm/gup.c +++ /dev/null @@ -1,291 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * Lockless get_user_pages_fast for s390 - * - * Copyright IBM Corp. 2010 - * Author(s): Martin Schwidefsky schwidefsky@de.ibm.com - */ -#include <linux/sched.h> -#include <linux/mm.h> -#include <linux/hugetlb.h> -#include <linux/vmstat.h> -#include <linux/pagemap.h> -#include <linux/rwsem.h> -#include <asm/pgtable.h> - -/* - * The performance critical leaf functions are made noinline otherwise gcc - * inlines everything into a single function which results in too much - * register pressure. - */ -static inline int gup_pte_range(pmd_t pmd, unsigned long addr, - unsigned long end, int write, struct page **pages, int *nr) -{ - struct page *head, *page; - unsigned long mask; - pte_t *ptep, pte; - - mask = (write ? _PAGE_PROTECT : 0) | _PAGE_INVALID | _PAGE_SPECIAL; - - ptep = pte_offset_map(&pmd, addr); - do { - pte = *ptep; - barrier(); - /* Similar to the PMD case, NUMA hinting must take slow path */ - if (pte_protnone(pte)) - return 0; - if ((pte_val(pte) & mask) != 0) - return 0; - VM_BUG_ON(!pfn_valid(pte_pfn(pte))); - page = pte_page(pte); - head = compound_head(page); - if (!page_cache_get_speculative(head)) - return 0; - if (unlikely(pte_val(pte) != pte_val(*ptep))) { - put_page(head); - return 0; - } - VM_BUG_ON_PAGE(compound_head(page) != head, page); - pages[*nr] = page; - (*nr)++; - - } while (ptep++, addr += PAGE_SIZE, addr != end); - - return 1; -} - -static inline int gup_huge_pmd(pmd_t *pmdp, pmd_t pmd, unsigned long addr, - unsigned long end, int write, struct page **pages, int *nr) -{ - struct page *head, *page; - unsigned long mask; - int refs; - - mask = (write ? _SEGMENT_ENTRY_PROTECT : 0) | _SEGMENT_ENTRY_INVALID; - if ((pmd_val(pmd) & mask) != 0) - return 0; - VM_BUG_ON(!pfn_valid(pmd_val(pmd) >> PAGE_SHIFT)); - - refs = 0; - head = pmd_page(pmd); - page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT); - do { - VM_BUG_ON(compound_head(page) != head); - pages[*nr] = page; - (*nr)++; - page++; - refs++; - } while (addr += PAGE_SIZE, addr != end); - - if (!page_cache_add_speculative(head, refs)) { - *nr -= refs; - return 0; - } - - if (unlikely(pmd_val(pmd) != pmd_val(*pmdp))) { - *nr -= refs; - while (refs--) - put_page(head); - return 0; - } - - return 1; -} - - -static inline int gup_pmd_range(pud_t pud, unsigned long addr, - unsigned long end, int write, struct page **pages, int *nr) -{ - unsigned long next; - pmd_t *pmdp, pmd; - - pmdp = pmd_offset(&pud, addr); - do { - pmd = *pmdp; - barrier(); - next = pmd_addr_end(addr, end); - if (pmd_none(pmd)) - return 0; - if (unlikely(pmd_large(pmd))) { - /* - * NUMA hinting faults need to be handled in the GUP - * slowpath for accounting purposes and so that they - * can be serialised against THP migration. - */ - if (pmd_protnone(pmd)) - return 0; - if (!gup_huge_pmd(pmdp, pmd, addr, next, - write, pages, nr)) - return 0; - } else if (!gup_pte_range(pmd, addr, next, - write, pages, nr)) - return 0; - } while (pmdp++, addr = next, addr != end); - - return 1; -} - -static int gup_huge_pud(pud_t *pudp, pud_t pud, unsigned long addr, - unsigned long end, int write, struct page **pages, int *nr) -{ - struct page *head, *page; - unsigned long mask; - int refs; - - mask = (write ? _REGION_ENTRY_PROTECT : 0) | _REGION_ENTRY_INVALID; - if ((pud_val(pud) & mask) != 0) - return 0; - VM_BUG_ON(!pfn_valid(pud_pfn(pud))); - - refs = 0; - head = pud_page(pud); - page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT); - do { - VM_BUG_ON_PAGE(compound_head(page) != head, page); - pages[*nr] = page; - (*nr)++; - page++; - refs++; - } while (addr += PAGE_SIZE, addr != end); - - if (!page_cache_add_speculative(head, refs)) { - *nr -= refs; - return 0; - } - - if (unlikely(pud_val(pud) != pud_val(*pudp))) { - *nr -= refs; - while (refs--) - put_page(head); - return 0; - } - - return 1; -} - -static inline int gup_pud_range(p4d_t p4d, unsigned long addr, - unsigned long end, int write, struct page **pages, int *nr) -{ - unsigned long next; - pud_t *pudp, pud; - - pudp = pud_offset(&p4d, addr); - do { - pud = *pudp; - barrier(); - next = pud_addr_end(addr, end); - if (pud_none(pud)) - return 0; - if (unlikely(pud_large(pud))) { - if (!gup_huge_pud(pudp, pud, addr, next, write, pages, - nr)) - return 0; - } else if (!gup_pmd_range(pud, addr, next, write, pages, - nr)) - return 0; - } while (pudp++, addr = next, addr != end); - - return 1; -} - -static inline int gup_p4d_range(pgd_t pgd, unsigned long addr, - unsigned long end, int write, struct page **pages, int *nr) -{ - unsigned long next; - p4d_t *p4dp, p4d; - - p4dp = p4d_offset(&pgd, addr); - do { - p4d = *p4dp; - barrier(); - next = p4d_addr_end(addr, end); - if (p4d_none(p4d)) - return 0; - if (!gup_pud_range(p4d, addr, next, write, pages, nr)) - return 0; - } while (p4dp++, addr = next, addr != end); - - return 1; -} - -/* - * Like get_user_pages_fast() except its IRQ-safe in that it won't fall - * back to the regular GUP. - * Note a difference with get_user_pages_fast: this always returns the - * number of pages pinned, 0 if no pages were pinned. - */ -int __get_user_pages_fast(unsigned long start, int nr_pages, int write, - struct page **pages) -{ - struct mm_struct *mm = current->mm; - unsigned long addr, len, end; - unsigned long next, flags; - pgd_t *pgdp, pgd; - int nr = 0; - - start &= PAGE_MASK; - addr = start; - len = (unsigned long) nr_pages << PAGE_SHIFT; - end = start + len; - if ((end <= start) || (end > mm->context.asce_limit)) - return 0; - /* - * local_irq_save() doesn't prevent pagetable teardown, but does - * prevent the pagetables from being freed on s390. - * - * So long as we atomically load page table pointers versus teardown, - * we can follow the address down to the the page and take a ref on it. - */ - local_irq_save(flags); - pgdp = pgd_offset(mm, addr); - do { - pgd = *pgdp; - barrier(); - next = pgd_addr_end(addr, end); - if (pgd_none(pgd)) - break; - if (!gup_p4d_range(pgd, addr, next, write, pages, &nr)) - break; - } while (pgdp++, addr = next, addr != end); - local_irq_restore(flags); - - return nr; -} - -/** - * get_user_pages_fast() - pin user pages in memory - * @start: starting user address - * @nr_pages: number of pages from start to pin - * @write: whether pages will be written to - * @pages: array that receives pointers to the pages pinned. - * Should be at least nr_pages long. - * - * Attempt to pin user pages in memory without taking mm->mmap_sem. - * If not successful, it will fall back to taking the lock and - * calling get_user_pages(). - * - * Returns number of pages pinned. This may be fewer than the number - * requested. If nr_pages is 0 or negative, returns 0. If no pages - * were pinned, returns -errno. - */ -int get_user_pages_fast(unsigned long start, int nr_pages, int write, - struct page **pages) -{ - int nr, ret; - - might_sleep(); - start &= PAGE_MASK; - nr = __get_user_pages_fast(start, nr_pages, write, pages); - if (nr == nr_pages) - return nr; - - /* Try to get the remaining pages with get_user_pages */ - start += nr << PAGE_SHIFT; - pages += nr; - ret = get_user_pages_unlocked(start, nr_pages - nr, pages, - write ? FOLL_WRITE : 0); - /* Have to be a bit careful with return values */ - if (nr > 0) - ret = (ret < 0) ? nr : ret + nr; - return ret; -}
From: Colin Ian King colin.king@canonical.com
commit fbbbbd2f28aec991f3fbc248df211550fbdfd58c upstream.
There are two cases where u32 variables n and err are being checked for less than zero error values, the checks is always false because the variables are not signed. Fix this by making the variables ints.
Addresses-Coverity: ("Unsigned compared against 0") Fixes: 345c0dbf3a30 ("ext4: protect journal inode's blocks using block_validity") Signed-off-by: Colin Ian King colin.king@canonical.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/block_validity.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/ext4/block_validity.c +++ b/fs/ext4/block_validity.c @@ -142,7 +142,8 @@ static int ext4_protect_reserved_inode(s struct inode *inode; struct ext4_sb_info *sbi = EXT4_SB(sb); struct ext4_map_blocks map; - u32 i = 0, err = 0, num, n; + u32 i = 0, num; + int err = 0, n;
if ((ino < EXT4_ROOT_INO) || (ino > le32_to_cpu(sbi->s_es->s_inodes_count)))
From: Theodore Ts'o tytso@mit.edu
commit 170417c8c7bb2cbbdd949bf5c443c0c8f24a203b upstream.
Commit 345c0dbf3a30 ("ext4: protect journal inode's blocks using block_validity") failed to add an exception for the journal inode in ext4_check_blockref(), which is the function used by ext4_get_branch() for indirect blocks. This caused attempts to read from the ext3-style journals to fail with:
[ 848.968550] EXT4-fs error (device sdb7): ext4_get_branch:171: inode #8: block 30343695: comm jbd2/sdb7-8: invalid block
Fix this by adding the missing exception check.
Fixes: 345c0dbf3a30 ("ext4: protect journal inode's blocks using block_validity") Reported-by: Arthur Marsh arthur.marsh@internode.on.net Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/block_validity.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/fs/ext4/block_validity.c +++ b/fs/ext4/block_validity.c @@ -276,6 +276,11 @@ int ext4_check_blockref(const char *func __le32 *bref = p; unsigned int blk;
+ if (ext4_has_feature_journal(inode->i_sb) && + (inode->i_ino == + le32_to_cpu(EXT4_SB(inode->i_sb)->s_es->s_journal_inum))) + return 0; + while (bref < p+max) { blk = le32_to_cpu(*bref++); if (blk &&
stable-rc/linux-5.1.y boot: 128 boots: 1 failed, 125 passed with 1 offline, 1 conflict (v5.1.3-129-gcce3bc9ebd2f)
Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-5.1.y/kernel/v5.1.3... Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-5.1.y/kernel/v5.1.3-129-gc...
Tree: stable-rc Branch: linux-5.1.y Git Describe: v5.1.3-129-gcce3bc9ebd2f Git Commit: cce3bc9ebd2fd2d9cfc0bfe53abbf3525a7fab45 Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git Tested: 72 unique boards, 24 SoC families, 14 builds out of 209
Boot Regressions Detected:
arm:
omap2plus_defconfig: gcc-8: omap4-panda: lab-baylibre: new failure (last pass: v5.1.3)
Boot Failure Detected:
arm: multi_v7_defconfig: gcc-8: bcm4708-smartrg-sr400ac: 1 failed lab
Offline Platforms:
arm:
multi_v7_defconfig: gcc-8 stih410-b2120: 1 offline lab
Conflicting Boot Failure Detected: (These likely are not failures as other labs are reporting PASS. Needs review.)
arm: omap2plus_defconfig: omap4-panda: lab-baylibre: FAIL (gcc-8) lab-baylibre-seattle: PASS (gcc-8)
--- For more info write to info@kernelci.org
On 20/05/2019 13:13, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.1.4 release. There are 128 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed 22 May 2019 11:50:41 AM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.1.4-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.1.y and the diffstat can be found below.
thanks,
greg k-h
All tests are passing for Tegra ...
Test results for stable-v5.1: 12 builds: 12 pass, 0 fail 22 boots: 22 pass, 0 fail 32 tests: 32 pass, 0 fail
Linux version: 5.1.4-rc1-gcce3bc9 Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra20-ventana, tegra210-p2371-2180, tegra30-cardhu-a04
Cheers Jon
On Tue, May 21, 2019 at 09:52:09AM +0100, Jon Hunter wrote:
On 20/05/2019 13:13, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.1.4 release. There are 128 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed 22 May 2019 11:50:41 AM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.1.4-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.1.y and the diffstat can be found below.
thanks,
greg k-h
All tests are passing for Tegra ...
Test results for stable-v5.1: 12 builds: 12 pass, 0 fail 22 boots: 22 pass, 0 fail 32 tests: 32 pass, 0 fail
Linux version: 5.1.4-rc1-gcce3bc9 Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra20-ventana, tegra210-p2371-2180, tegra30-cardhu-a04
Many thanks for testing all of these and letting me know.
greg k-h
On 5/20/19 6:13 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.1.4 release. There are 128 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed 22 May 2019 11:50:41 AM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.1.4-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.1.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on the test system. No dmesg regressions.
thanks, -- Shuah
On Tue, May 21, 2019 at 03:11:15PM -0600, shuah wrote:
On 5/20/19 6:13 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.1.4 release. There are 128 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed 22 May 2019 11:50:41 AM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.1.4-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.1.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on the test system. No dmesg regressions.
Thanks for testing all of these and letting me know.
greg k-h
On Mon, 20 May 2019 at 18:03, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.1.4 release. There are 128 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed 22 May 2019 11:50:41 AM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.1.4-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.1.y and the diffstat can be found below.
thanks,
greg k-h
5.1.4-rc2 report,
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Summary ------------------------------------------------------------------------
kernel: 5.1.4-rc2 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-5.1.y git commit: a8112defa801e2b32d9da822880f32966d30158c git describe: v5.1.3-126-ga8112defa801 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-5.1-oe/build/v5.1.3-126-g...
No regressions (compared to build v5.1.3)
No fixes (compared to build v5.1.3)
Ran 20655 total tests in the following environments and test suites.
Environments -------------- - dragonboard-410c - i386 - juno-r2 - qemu_arm - qemu_arm64 - qemu_i386 - qemu_x86_64 - x15 - x86
Test Suites ----------- * build * install-android-platform-tools-r2600 * kselftest * libgpiod * libhugetlbfs * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-cpuhotplug-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * perf * spectre-meltdown-checker-test * v4l2-compliance * ltp-fs-tests * ltp-open-posix-tests * kvm-unit-tests * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none
-- Linaro LKFT https://lkft.linaro.org
On Wed, May 22, 2019 at 10:33:06AM +0530, Naresh Kamboju wrote:
On Mon, 20 May 2019 at 18:03, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.1.4 release. There are 128 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed 22 May 2019 11:50:41 AM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.1.4-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.1.y and the diffstat can be found below.
thanks,
greg k-h
5.1.4-rc2 report,
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Wonderful, thanks for testing all of these again, and letting me know.
greg k-h
linux-stable-mirror@lists.linaro.org