This is the start of the stable review cycle for the 4.14.25 release. There are 110 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri Mar 9 19:10:03 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.25-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.14.25-rc1
Sagi Grimberg sagi@grimberg.me nvme-rdma: don't suppress send completions
NeilBrown neilb@suse.com md: only allow remove_and_add_spares when no sync_thread running.
Adam Ford aford173@gmail.com ARM: dts: LogicPD Torpedo: Fix I2C1 pinmux
Adam Ford aford173@gmail.com ARM: dts: LogicPD SOM-LV: Fix I2C1 pinmux
Kai Heng Feng kai.heng.feng@canonical.com ACPI / bus: Parse tables as term_list for Dell XPS 9570 and Precision M5530
Eric Biggers ebiggers@google.com KVM/x86: remove WARN_ON() for when vm_munmap() fails
Tianyu Lan lantianyu1986@gmail.com KVM/x86: Fix wrong macro references of X86_CR0_PG_BIT and X86_CR4_PAE_BIT in kvm_valid_sregs()
Ard Biesheuvel ard.biesheuvel@linaro.org PCI/ASPM: Deal with missing root ports in link state handling
Radim Krčmář rkrcmar@redhat.com KVM: x86: fix vcpu initialization with userspace lapic
Paolo Bonzini pbonzini@redhat.com KVM/VMX: Optimize vmx_vcpu_run() and svm_vcpu_run() by marking the RDMSR path as unlikely()
Paolo Bonzini pbonzini@redhat.com KVM: x86: move LAPIC initialization after VMCS creation
Paolo Bonzini pbonzini@redhat.com KVM/x86: Remove indirect MSR op calls from SPEC_CTRL
Wanpeng Li wanpeng.li@hotmail.com KVM: mmu: Fix overlap between public and private memslots
Wanpeng Li wanpengli@tencent.com KVM: X86: Fix SMRAM accessing even if VM is shutdown
Paolo Bonzini pbonzini@redhat.com KVM: x86: extend usage of RET_MMIO_PF_* constants
Arnd Bergmann arnd@arndb.de ARM: kvm: fix building with gcc-8
Ulf Magnusson ulfalizer@gmail.com ARM: mvebu: Fix broken PL310_ERRATA_753970 selects
Daniel Schultz d.schultz@phytec.de ARM: dts: rockchip: Remove 1.8 GHz operation point from phycore som
Arnd Bergmann arnd@arndb.de ARM: orion: fix orion_ge00_switch_board_info initialization
Jan Beulich JBeulich@suse.com x86/mm: Fix {pmd,pud}_{set,clear}_flags()
Rasmus Villemoes linux@rasmusvillemoes.dk nospec: Allow index argument to have const-qualified type
David Hildenbrand david@redhat.com KVM: s390: consider epoch index on TOD clock syncs
David Hildenbrand david@redhat.com KVM: s390: consider epoch index on hotplugged CPUs
David Hildenbrand david@redhat.com KVM: s390: provide only a single function for setting the tod (fix SCK)
David Hildenbrand david@redhat.com KVM: s390: take care of clock-comparator sign control
Anna Karbownik anna.karbownik@intel.com EDAC, sb_edac: Fix out of bound writes during DIMM configuration on KNL
Mauro Carvalho Chehab mchehab@kernel.org media: m88ds3103: don't call a non-initalized function
Ming Lei ming.lei@redhat.com blk-mq: don't call io sched's .requeue_request when requeueing rq to ->dispatch
Julian Wiedmann jwi@linux.vnet.ibm.com s390/qeth: fix IPA command submission race
Julian Wiedmann jwi@linux.vnet.ibm.com s390/qeth: fix IP address lookup for L3 devices
Julian Wiedmann jwi@linux.vnet.ibm.com Revert "s390/qeth: fix using of ref counter for rxip addresses"
Julian Wiedmann jwi@linux.vnet.ibm.com s390/qeth: fix double-free on IP add/remove race
Julian Wiedmann jwi@linux.vnet.ibm.com s390/qeth: fix IP removal on offline cards
Julian Wiedmann jwi@linux.vnet.ibm.com s390/qeth: fix overestimated count of buffer elements
Julian Wiedmann jwi@linux.vnet.ibm.com s390/qeth: fix SETIP command handling
Ursula Braun ubraun@linux.vnet.ibm.com s390/qeth: fix underestimated count of buffer elements
Jason Wang jasowang@redhat.com virtio-net: disable NAPI only when enabled during XDP set
Jason Wang jasowang@redhat.com tuntap: disable preemption during XDP processing
Jason Wang jasowang@redhat.com tuntap: correctly add the missing XDP flush
Soheil Hassas Yeganeh soheil@google.com tcp: purge write queue upon RST
Jason A. Donenfeld Jason@zx2c4.com netlink: put module reference if dump start fails
Ido Schimmel idosch@mellanox.com mlxsw: spectrum_router: Do not unconditionally clear route offload indication
Paolo Abeni pabeni@redhat.com cls_u32: fix use after free in u32_destroy_key()
Tom Lendacky thomas.lendacky@amd.com amd-xgbe: Restore PCI interrupt enablement setting on resume
Eran Ben Elisha eranbe@mellanox.com net/mlx5e: Verify inline header size do not exceed SKB linear size
Ido Schimmel idosch@mellanox.com bridge: Fix VLAN reference count problem
Alexey Kodanev alexey.kodanev@oracle.com sctp: fix dst refcnt leak in sctp_v6_get_dst()
David Ahern dsahern@gmail.com net: ipv4: Set addr_type in hash_keys for forwarded case
Jiri Pirko jiri@mellanox.com mlxsw: spectrum_router: Fix error path in mlxsw_sp_vr_create
Yuchung Cheng ycheng@google.com tcp: revert F-RTO extension to detect more spurious timeouts
Yuchung Cheng ycheng@google.com tcp: revert F-RTO middle-box workaround
Xin Long lucien.xin@gmail.com sctp: do not pr_err for the duplicated node in transport rhlist
Ivan Vecera ivecera@redhat.com net/sched: cls_u32: fix cls_u32 on filter replace
Eric Dumazet edumazet@google.com net_sched: gen_estimator: fix broken estimators based on percpu stats
Inbar Karmy inbark@mellanox.com net/mlx5e: Fix loopback self test when GRO is off
Tonghao Zhang xiangxia.m.yue@gmail.com doc: Change the min default value of tcp_wmem/tcp_rmem.
Eric Dumazet edumazet@google.com tcp_bbr: better deal with suboptimal GSO
David Howells dhowells@redhat.com rxrpc: Fix send in rxrpc_send_data_packet()
Ilya Lesokhin ilyal@mellanox.com tcp: Honor the eor bit in tcp_mtu_probe
Heiner Kallweit hkallweit1@gmail.com net: phy: fix phy_start to consider PHY_IGNORE_INTERRUPT
Gal Pressman galp@mellanox.com net/mlx5e: Specify numa node when allocating drop rq
Shalom Toledo shalomt@mellanox.com mlxsw: spectrum_switchdev: Check success of FDB add operation
Tommi Rantala tommi.t.rantala@nokia.com sctp: fix dst refcnt leak in sctp_v4_get_dst
Gal Pressman galp@mellanox.com net/mlx5e: Fix TCP checksum in LRO buffers
Alexey Kodanev alexey.kodanev@oracle.com udplite: fix partial checksum initialization
Alexey Kodanev alexey.kodanev@oracle.com sctp: verify size of a new chunk in _sctp_make_chunk()
Guillaume Nault g.nault@alphalink.fr ppp: prevent unregistered channels from connecting to PPP units
Roman Kapl code@rkapl.cz net: sched: report if filter is too large to dump
Nicolas Dichtel nicolas.dichtel@6wind.com netlink: ensure to loop over all netns in genlmsg_multicast_allns()
Sabrina Dubroca sd@queasysnail.net net: ipv4: don't allow setting net.ipv4.route.min_pmtu below 68
Jakub Kicinski jakub.kicinski@netronome.com net: fix race on decreasing number of TX queues
Grygorii Strashko grygorii.strashko@ti.com net: ethernet: ti: cpsw: fix net watchdog timeout
Wolfram Sang wsa+renesas@sang-engineering.com net: amd-xgbe: fix comparison to bitshift when dealing with a mask
Arnd Bergmann arnd@arndb.de ipv6 sit: work around bogus gcc-8 -Wrestrict warning
Denis Du dudenis2000@yahoo.ca hdlc_ppp: carrier detect ok, don't turn off negotiation
Stefano Brivio sbrivio@redhat.com fib_semantics: Don't match route with mismatching tclassid
Xin Long lucien.xin@gmail.com bridge: check brport attr show in brport_show
Thomas Gleixner tglx@linutronix.de x86/cpu_entry_area: Sync cpu_entry_area to initial_page_table
Sebastian Panceac sebastian@resin.io x86/platform/intel-mid: Handle Intel Edison reboot correctly
Juergen Gross jgross@suse.com x86/xen: Zero MSR_IA32_SPEC_CTRL before suspend
Jan Kara jack@suse.cz direct-io: Fix sleep in atomic due to sync AIO
Dan Williams dan.j.williams@intel.com dax: fix vma_is_fsdax() helper
Viresh Kumar viresh.kumar@linaro.org cpufreq: s3c24xx: Fix broken s3c_cpufreq_init()
Dan Williams dan.j.williams@intel.com vfio: disable filesystem-dax page pinning
Ming Lei ming.lei@redhat.com block: kyber: fix domain token leak during requeue
Jiufei Xue jiufei.xue@linux.alibaba.com block: fix the count of PGPGOUT for WRITE_SAME
Anand Jain anand.jain@oracle.com btrfs: use proper endianness accessors for super_copy
John David Anglin dave.anglin@bell.net parisc: Fix ordering of cache and TLB flushes
Helge Deller deller@gmx.de parisc: Reduce irq overhead when run in qemu
Helge Deller deller@gmx.de parisc: Use cr16 interval timers unconditionally on qemu
Lingutla Chandrasekhar clingutla@codeaurora.org timers: Forward timer base before migrating timers
Shawn Lin shawn.lin@rock-chips.com mmc: dw_mmc: Fix out-of-bounds access for slot's caps
Shawn Lin shawn.lin@rock-chips.com mmc: dw_mmc: Factor out dw_mci_init_slot_caps
Shawn Lin shawn.lin@rock-chips.com mmc: dw_mmc: Avoid accessing registers in runtime suspended state
Geert Uytterhoeven geert+renesas@glider.be mmc: dw_mmc-k3: Fix out-of-bounds access through DT alias
Adrian Hunter adrian.hunter@intel.com mmc: sdhci-pci: Fix S0i3 for Intel BYT-based controllers
Takashi Iwai tiwai@suse.de ALSA: hda - Fix pincfg at resume on Lenovo T470 dock
Hans de Goede hdegoede@redhat.com ALSA: hda: Add a power_save blacklist
Takashi Iwai tiwai@suse.de ALSA: x86: Fix missing spinlock and mutex initializations
Richard Fitzgerald rf@opensource.cirrus.com ALSA: control: Fix memory corruption risk in snd_ctl_elem_read
Erik Veijola erik.veijola@gmail.com ALSA: usb-audio: Add a quirck for B&W PX headphones
Alexander Steffen Alexander.Steffen@infineon.com tpm_tis_spi: Use DMA-safe memory for SPI transfers
Arnd Bergmann arnd@arndb.de tpm: constify transmit data pointers
Jeremy Boone jeremy.boone@nccgroup.trust tpm_tis: fix potential buffer overruns caused by bit glitches on the bus
Jeremy Boone jeremy.boone@nccgroup.trust tpm_i2c_nuvoton: fix potential buffer overruns caused by bit glitches on the bus
Jeremy Boone jeremy.boone@nccgroup.trust tpm_i2c_infineon: fix potential buffer overruns caused by bit glitches on the bus
Jeremy Boone jeremy.boone@nccgroup.trust tpm: fix potential buffer overruns caused by bit glitches on the bus
Jeremy Boone jeremy.boone@nccgroup.trust tpm: st33zp24: fix potential buffer overruns caused by bit glitches on the bus
Emil Tantilov emil.s.tantilov@intel.com ixgbe: fix crash in build_skb Rx code path
Hans de Goede hdegoede@redhat.com Bluetooth: btusb: Use DMI matching for QCA reset_resume quirking
-------------
Diffstat:
Documentation/networking/ip-sysctl.txt | 4 +- Makefile | 4 +- arch/arm/boot/dts/logicpd-som-lv.dtsi | 9 +- arch/arm/boot/dts/logicpd-torpedo-som.dtsi | 8 ++ arch/arm/boot/dts/rk3288-phycore-som.dtsi | 20 ---- arch/arm/kvm/hyp/Makefile | 5 + arch/arm/kvm/hyp/banked-sr.c | 4 + arch/arm/mach-mvebu/Kconfig | 4 +- arch/arm/plat-orion/common.c | 23 ++-- arch/parisc/include/asm/cacheflush.h | 1 + arch/parisc/include/asm/processor.h | 2 + arch/parisc/kernel/cache.c | 57 +++++----- arch/parisc/kernel/pacache.S | 22 ++++ arch/parisc/kernel/time.c | 11 +- arch/s390/kvm/interrupt.c | 25 ++++- arch/s390/kvm/kvm-s390.c | 79 +++++++------ arch/s390/kvm/kvm-s390.h | 5 +- arch/s390/kvm/priv.c | 9 +- arch/x86/include/asm/pgtable.h | 8 +- arch/x86/include/asm/pgtable_32.h | 1 + arch/x86/include/asm/pgtable_64.h | 1 + arch/x86/include/asm/pgtable_types.h | 10 ++ arch/x86/kernel/setup.c | 17 +-- arch/x86/kernel/setup_percpu.c | 17 +-- arch/x86/kvm/lapic.c | 11 +- arch/x86/kvm/mmu.c | 97 ++++++++-------- arch/x86/kvm/paging_tmpl.h | 18 +-- arch/x86/kvm/svm.c | 9 +- arch/x86/kvm/vmx.c | 9 +- arch/x86/kvm/x86.c | 12 +- arch/x86/mm/cpu_entry_area.c | 6 + arch/x86/mm/init_32.c | 15 +++ arch/x86/platform/intel-mid/intel-mid.c | 2 +- arch/x86/xen/suspend.c | 16 +++ block/blk-core.c | 2 +- block/blk-mq.c | 4 +- block/kyber-iosched.c | 1 + drivers/acpi/bus.c | 38 +++++-- drivers/bluetooth/btusb.c | 25 ++++- drivers/char/tpm/st33zp24/st33zp24.c | 4 +- drivers/char/tpm/tpm-interface.c | 4 + drivers/char/tpm/tpm2-cmd.c | 4 + drivers/char/tpm/tpm_i2c_infineon.c | 5 +- drivers/char/tpm/tpm_i2c_nuvoton.c | 8 +- drivers/char/tpm/tpm_tis.c | 2 +- drivers/char/tpm/tpm_tis_core.c | 9 +- drivers/char/tpm/tpm_tis_core.h | 4 +- drivers/char/tpm/tpm_tis_spi.c | 48 ++++---- drivers/cpufreq/s3c24xx-cpufreq.c | 8 +- drivers/edac/sb_edac.c | 2 +- drivers/md/md.c | 4 + drivers/media/dvb-frontends/m88ds3103.c | 7 +- drivers/mmc/host/dw_mmc-exynos.c | 1 + drivers/mmc/host/dw_mmc-k3.c | 4 + drivers/mmc/host/dw_mmc-rockchip.c | 1 + drivers/mmc/host/dw_mmc-zx.c | 1 + drivers/mmc/host/dw_mmc.c | 84 +++++++++----- drivers/mmc/host/dw_mmc.h | 2 + drivers/mmc/host/sdhci-pci-core.c | 35 +++++- drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 2 +- drivers/net/ethernet/amd/xgbe/xgbe-pci.c | 2 + drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 8 ++ drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 10 +- drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 49 +++++--- .../net/ethernet/mellanox/mlx5/core/en_selftest.c | 3 +- drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 2 +- .../net/ethernet/mellanox/mlxsw/spectrum_router.c | 22 ++-- .../ethernet/mellanox/mlxsw/spectrum_switchdev.c | 29 ++++- drivers/net/ethernet/ti/cpsw.c | 16 ++- drivers/net/phy/phy.c | 2 +- drivers/net/ppp/ppp_generic.c | 9 ++ drivers/net/tun.c | 7 ++ drivers/net/virtio_net.c | 8 +- drivers/net/wan/hdlc_ppp.c | 5 +- drivers/nvme/host/rdma.c | 54 +++------ drivers/pci/pcie/aspm.c | 8 +- drivers/s390/net/qeth_core.h | 7 +- drivers/s390/net/qeth_core_main.c | 43 +++---- drivers/s390/net/qeth_l3.h | 34 +++++- drivers/s390/net/qeth_l3_main.c | 123 +++++++++------------ drivers/vfio/vfio_iommu_type1.c | 18 ++- fs/btrfs/sysfs.c | 8 +- fs/btrfs/transaction.c | 20 ++-- fs/direct-io.c | 3 +- include/linux/fs.h | 2 +- include/linux/nospec.h | 3 +- include/net/udplite.h | 1 + kernel/time/timer.c | 6 + net/bridge/br_sysfs_if.c | 3 + net/bridge/br_vlan.c | 2 + net/core/dev.c | 11 +- net/core/gen_estimator.c | 1 + net/ipv4/fib_semantics.c | 5 + net/ipv4/route.c | 10 +- net/ipv4/tcp_input.c | 24 ++-- net/ipv4/tcp_output.c | 33 +++++- net/ipv4/udp.c | 5 + net/ipv6/ip6_checksum.c | 5 + net/ipv6/sit.c | 2 +- net/netlink/af_netlink.c | 4 +- net/netlink/genetlink.c | 12 +- net/rxrpc/output.c | 2 +- net/sched/cls_api.c | 7 +- net/sched/cls_u32.c | 24 ++-- net/sctp/input.c | 5 +- net/sctp/ipv6.c | 10 +- net/sctp/protocol.c | 10 +- net/sctp/sm_make_chunk.c | 7 +- sound/core/control.c | 2 +- sound/pci/hda/hda_intel.c | 38 ++++++- sound/pci/hda/patch_realtek.c | 3 +- sound/usb/quirks-table.h | 47 ++++++++ sound/x86/intel_hdmi_audio.c | 2 + virt/kvm/kvm_main.c | 3 +- 114 files changed, 1065 insertions(+), 564 deletions(-)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede hdegoede@redhat.com
commit 1fdb926974695d3dbc05a429bafa266fdd16510e upstream.
Commit 61f5acea8737 ("Bluetooth: btusb: Restore QCA Rome suspend/resume fix with a "rewritten" version") applied the USB_QUIRK_RESET_RESUME to all QCA USB Bluetooth modules. But it turns out that the resume problems are not caused by the QCA Rome chipset, on most platforms it resumes fine. The resume problems are actually a platform problem (likely the platform cutting all power when suspended).
The USB_QUIRK_RESET_RESUME quirk also disables runtime suspend, so by matching on usb-ids, we're causing all boards with these chips to use extra power, to fix resume problems which only happen on some boards.
This commit fixes this by applying the quirk based on DMI matching instead of on usb-ids, so that we match the platform and not the chipset.
Here is the /sys/kernel/debug/usb/devices for the Bluetooth module:
T: Bus=01 Lev=01 Prnt=01 Port=07 Cnt=04 Dev#= 5 Spd=12 MxCh= 0 D: Ver= 2.01 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=0cf3 ProdID=e300 Rev= 0.01 C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=1ms E: Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms I: If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 9 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 9 Ivl=1ms I: If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 17 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 17 Ivl=1ms I: If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 25 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 25 Ivl=1ms I: If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 33 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 33 Ivl=1ms I: If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 49 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 49 Ivl=1ms
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1514836 Fixes: 61f5acea8737 ("Bluetooth: btusb: Restore QCA Rome suspend/resume..") Cc: stable@vger.kernel.org Cc: Brian Norris briannorris@chromium.org Cc: Kai-Heng Feng kai.heng.feng@canonical.com Reported-and-tested-by: Kevin Fenzi kevin@scrye.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/bluetooth/btusb.c | 25 +++++++++++++++++++------ 1 file changed, 19 insertions(+), 6 deletions(-)
--- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -21,6 +21,7 @@ * */
+#include <linux/dmi.h> #include <linux/module.h> #include <linux/usb.h> #include <linux/usb/quirks.h> @@ -381,6 +382,21 @@ static const struct usb_device_id blackl { } /* Terminating entry */ };
+/* The Bluetooth USB module build into some devices needs to be reset on resume, + * this is a problem with the platform (likely shutting off all power) not with + * the module itself. So we use a DMI list to match known broken platforms. + */ +static const struct dmi_system_id btusb_needs_reset_resume_table[] = { + { + /* Lenovo Yoga 920 (QCA Rome device 0cf3:e300) */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_VERSION, "Lenovo YOGA 920"), + }, + }, + {} +}; + #define BTUSB_MAX_ISOC_FRAMES 10
#define BTUSB_INTR_RUNNING 0 @@ -3013,6 +3029,9 @@ static int btusb_probe(struct usb_interf hdev->send = btusb_send_frame; hdev->notify = btusb_notify;
+ if (dmi_check_system(btusb_needs_reset_resume_table)) + interface_to_usbdev(intf)->quirks |= USB_QUIRK_RESET_RESUME; + #ifdef CONFIG_PM err = btusb_config_oob_wake(hdev); if (err) @@ -3099,12 +3118,6 @@ static int btusb_probe(struct usb_interf if (id->driver_info & BTUSB_QCA_ROME) { data->setup_on_usb = btusb_setup_qca; hdev->set_bdaddr = btusb_set_bdaddr_ath3012; - - /* QCA Rome devices lose their updated firmware over suspend, - * but the USB hub doesn't notice any status change. - * explicitly request a device reset on resume. - */ - interface_to_usbdev(intf)->quirks |= USB_QUIRK_RESET_RESUME; }
#ifdef CONFIG_BT_HCIBTUSB_RTL
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Emil Tantilov emil.s.tantilov@intel.com
commit 0c5661ecc5dd7ce296870a3eb7b62b1b280a5e89 upstream.
Add check for build_skb enabled ring in ixgbe_dma_sync_frag(). In that case &skb_shinfo(skb)->frags[0] may not always be set which can lead to a crash. Instead we derive the page offset from skb->data.
Fixes: 42073d91a214 ("ixgbe: Have the CPU take ownership of the buffers sooner") CC: stable stable@vger.kernel.org Reported-by: Ambarish Soman asoman@redhat.com Suggested-by: Alexander Duyck alexander.h.duyck@intel.com Signed-off-by: Emil Tantilov emil.s.tantilov@intel.com Tested-by: Andrew Bowers andrewx.bowers@intel.com Signed-off-by: Jeff Kirsher jeffrey.t.kirsher@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -1877,6 +1877,14 @@ static void ixgbe_dma_sync_frag(struct i ixgbe_rx_pg_size(rx_ring), DMA_FROM_DEVICE, IXGBE_RX_DMA_ATTR); + } else if (ring_uses_build_skb(rx_ring)) { + unsigned long offset = (unsigned long)(skb->data) & ~PAGE_MASK; + + dma_sync_single_range_for_cpu(rx_ring->dev, + IXGBE_CB(skb)->dma, + offset, + skb_headlen(skb), + DMA_FROM_DEVICE); } else { struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[0];
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeremy Boone jeremy.boone@nccgroup.trust
commit 6d24cd186d9fead3722108dec1b1c993354645ff upstream.
Discrete TPMs are often connected over slow serial buses which, on some platforms, can have glitches causing bit flips. In all the driver _recv() functions, we need to use a u32 to unmarshal the response size, otherwise a bit flip of the 31st bit would cause the expected variable to go negative, which would then try to read a huge amount of data. Also sanity check that the expected amount of data is large enough for the TPM header.
Signed-off-by: Jeremy Boone jeremy.boone@nccgroup.trust Cc: stable@vger.kernel.org Signed-off-by: James Bottomley James.Bottomley@HansenPartnership.com Reviewed-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Signed-off-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Signed-off-by: James Morris james.morris@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/char/tpm/st33zp24/st33zp24.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/char/tpm/st33zp24/st33zp24.c +++ b/drivers/char/tpm/st33zp24/st33zp24.c @@ -457,7 +457,7 @@ static int st33zp24_recv(struct tpm_chip size_t count) { int size = 0; - int expected; + u32 expected;
if (!chip) return -EBUSY; @@ -474,7 +474,7 @@ static int st33zp24_recv(struct tpm_chip }
expected = be32_to_cpu(*(__be32 *)(buf + 2)); - if (expected > count) { + if (expected > count || expected < TPM_HEADER_SIZE) { size = -EIO; goto out; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeremy Boone jeremy.boone@nccgroup.trust
commit 3be23274755ee85771270a23af7691dc9b3a95db upstream.
Discrete TPMs are often connected over slow serial buses which, on some platforms, can have glitches causing bit flips. If a bit does flip it could cause an overrun if it's in one of the size parameters, so sanity check that we're not overrunning the provided buffer when doing a memcpy().
Signed-off-by: Jeremy Boone jeremy.boone@nccgroup.trust Cc: stable@vger.kernel.org Signed-off-by: James Bottomley James.Bottomley@HansenPartnership.com Reviewed-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Tested-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Signed-off-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Signed-off-by: James Morris james.morris@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/char/tpm/tpm-interface.c | 4 ++++ drivers/char/tpm/tpm2-cmd.c | 4 ++++ 2 files changed, 8 insertions(+)
--- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -1228,6 +1228,10 @@ int tpm_get_random(u32 chip_num, u8 *out break;
recd = be32_to_cpu(tpm_cmd.params.getrandom_out.rng_data_len); + if (recd > num_bytes) { + total = -EFAULT; + break; + }
rlength = be32_to_cpu(tpm_cmd.header.out.length); if (rlength < offsetof(struct tpm_getrandom_out, rng_data) + --- a/drivers/char/tpm/tpm2-cmd.c +++ b/drivers/char/tpm/tpm2-cmd.c @@ -683,6 +683,10 @@ static int tpm2_unseal_cmd(struct tpm_ch if (!rc) { data_len = be16_to_cpup( (__be16 *) &buf.data[TPM_HEADER_SIZE + 4]); + if (data_len < MIN_KEY_SIZE || data_len > MAX_KEY_SIZE + 1) { + rc = -EFAULT; + goto out; + }
rlength = be32_to_cpu(((struct tpm2_cmd *)&buf) ->header.out.length);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeremy Boone jeremy.boone@nccgroup.trust
commit 9b8cb28d7c62568a5916bdd7ea1c9176d7f8f2ed upstream.
Discrete TPMs are often connected over slow serial buses which, on some platforms, can have glitches causing bit flips. In all the driver _recv() functions, we need to use a u32 to unmarshal the response size, otherwise a bit flip of the 31st bit would cause the expected variable to go negative, which would then try to read a huge amount of data. Also sanity check that the expected amount of data is large enough for the TPM header.
Signed-off-by: Jeremy Boone jeremy.boone@nccgroup.trust Cc: stable@vger.kernel.org Signed-off-by: James Bottomley James.Bottomley@HansenPartnership.com Reviewed-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Signed-off-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Signed-off-by: James Morris james.morris@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/char/tpm/tpm_i2c_infineon.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/char/tpm/tpm_i2c_infineon.c +++ b/drivers/char/tpm/tpm_i2c_infineon.c @@ -473,7 +473,8 @@ static int recv_data(struct tpm_chip *ch static int tpm_tis_i2c_recv(struct tpm_chip *chip, u8 *buf, size_t count) { int size = 0; - int expected, status; + int status; + u32 expected;
if (count < TPM_HEADER_SIZE) { size = -EIO; @@ -488,7 +489,7 @@ static int tpm_tis_i2c_recv(struct tpm_c }
expected = be32_to_cpu(*(__be32 *)(buf + 2)); - if ((size_t) expected > count) { + if (((size_t) expected > count) || (expected < TPM_HEADER_SIZE)) { size = -EIO; goto out; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeremy Boone jeremy.boone@nccgroup.trust
commit f9d4d9b5a5ef2f017bc344fb65a58a902517173b upstream.
Discrete TPMs are often connected over slow serial buses which, on some platforms, can have glitches causing bit flips. In all the driver _recv() functions, we need to use a u32 to unmarshal the response size, otherwise a bit flip of the 31st bit would cause the expected variable to go negative, which would then try to read a huge amount of data. Also sanity check that the expected amount of data is large enough for the TPM header.
Signed-off-by: Jeremy Boone jeremy.boone@nccgroup.trust Cc: stable@vger.kernel.org Signed-off-by: James Bottomley James.Bottomley@HansenPartnership.com Reviewed-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Signed-off-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Signed-off-by: James Morris james.morris@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/char/tpm/tpm_i2c_nuvoton.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/drivers/char/tpm/tpm_i2c_nuvoton.c +++ b/drivers/char/tpm/tpm_i2c_nuvoton.c @@ -281,7 +281,11 @@ static int i2c_nuvoton_recv(struct tpm_c struct device *dev = chip->dev.parent; struct i2c_client *client = to_i2c_client(dev); s32 rc; - int expected, status, burst_count, retries, size = 0; + int status; + int burst_count; + int retries; + int size = 0; + u32 expected;
if (count < TPM_HEADER_SIZE) { i2c_nuvoton_ready(chip); /* return to idle */ @@ -323,7 +327,7 @@ static int i2c_nuvoton_recv(struct tpm_c * to machine native */ expected = be32_to_cpu(*(__be32 *) (buf + 2)); - if (expected > count) { + if (expected > count || expected < size) { dev_err(dev, "%s() expected > count\n", __func__); size = -EIO; continue;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeremy Boone jeremy.boone@nccgroup.trust
commit 6bb320ca4a4a7b5b3db8c8d7250cc40002046878 upstream.
Discrete TPMs are often connected over slow serial buses which, on some platforms, can have glitches causing bit flips. In all the driver _recv() functions, we need to use a u32 to unmarshal the response size, otherwise a bit flip of the 31st bit would cause the expected variable to go negative, which would then try to read a huge amount of data. Also sanity check that the expected amount of data is large enough for the TPM header.
Signed-off-by: Jeremy Boone jeremy.boone@nccgroup.trust Cc: stable@vger.kernel.org Signed-off-by: James Bottomley James.Bottomley@HansenPartnership.com Tested-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Reviewed-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Signed-off-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Signed-off-by: James Morris james.morris@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/char/tpm/tpm_tis_core.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/char/tpm/tpm_tis_core.c +++ b/drivers/char/tpm/tpm_tis_core.c @@ -202,7 +202,8 @@ static int tpm_tis_recv(struct tpm_chip { struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); int size = 0; - int expected, status; + int status; + u32 expected;
if (count < TPM_HEADER_SIZE) { size = -EIO; @@ -217,7 +218,7 @@ static int tpm_tis_recv(struct tpm_chip }
expected = be32_to_cpu(*(__be32 *) (buf + 2)); - if (expected > count) { + if (expected > count || expected < TPM_HEADER_SIZE) { size = -EIO; goto out; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
commit c37fbc09bd4977736f6bc4050c6f099c587052a7 upstream.
Making cmd_getticks 'const' introduced a couple of harmless warnings:
drivers/char/tpm/tpm_tis_core.c: In function 'probe_itpm': drivers/char/tpm/tpm_tis_core.c:469:31: error: passing argument 2 of 'tpm_tis_send_data' discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers] rc = tpm_tis_send_data(chip, cmd_getticks, len); drivers/char/tpm/tpm_tis_core.c:477:31: error: passing argument 2 of 'tpm_tis_send_data' discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers] rc = tpm_tis_send_data(chip, cmd_getticks, len); drivers/char/tpm/tpm_tis_core.c:255:12: note: expected 'u8 * {aka unsigned char *}' but argument is of type 'const u8 * {aka const unsigned char *}' static int tpm_tis_send_data(struct tpm_chip *chip, u8 *buf, size_t len)
This changes the related functions to all take 'const' pointers so that gcc can see this as being correct. I had to slightly modify the logic around tpm_tis_spi_transfer() for this to work without introducing ugly casts.
Cc: stable@vger.kernel.org Fixes: 5e35bd8e06b9 ("tpm_tis: make array cmd_getticks static const to shink object code size") Signed-off-by: Arnd Bergmann arnd@arndb.de Reviewed-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Tested-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Signed-off-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/char/tpm/tpm_tis.c | 2 +- drivers/char/tpm/tpm_tis_core.c | 4 ++-- drivers/char/tpm/tpm_tis_core.h | 4 ++-- drivers/char/tpm/tpm_tis_spi.c | 25 +++++++++++-------------- 4 files changed, 16 insertions(+), 19 deletions(-)
--- a/drivers/char/tpm/tpm_tis.c +++ b/drivers/char/tpm/tpm_tis.c @@ -223,7 +223,7 @@ static int tpm_tcg_read_bytes(struct tpm }
static int tpm_tcg_write_bytes(struct tpm_tis_data *data, u32 addr, u16 len, - u8 *value) + const u8 *value) { struct tpm_tis_tcg_phy *phy = to_tpm_tis_tcg_phy(data);
--- a/drivers/char/tpm/tpm_tis_core.c +++ b/drivers/char/tpm/tpm_tis_core.c @@ -253,7 +253,7 @@ out: * tpm.c can skip polling for the data to be available as the interrupt is * waited for here */ -static int tpm_tis_send_data(struct tpm_chip *chip, u8 *buf, size_t len) +static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len) { struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); int rc, status, burstcnt; @@ -344,7 +344,7 @@ static void disable_interrupts(struct tp * tpm.c can skip polling for the data to be available as the interrupt is * waited for here */ -static int tpm_tis_send_main(struct tpm_chip *chip, u8 *buf, size_t len) +static int tpm_tis_send_main(struct tpm_chip *chip, const u8 *buf, size_t len) { struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); int rc; --- a/drivers/char/tpm/tpm_tis_core.h +++ b/drivers/char/tpm/tpm_tis_core.h @@ -98,7 +98,7 @@ struct tpm_tis_phy_ops { int (*read_bytes)(struct tpm_tis_data *data, u32 addr, u16 len, u8 *result); int (*write_bytes)(struct tpm_tis_data *data, u32 addr, u16 len, - u8 *value); + const u8 *value); int (*read16)(struct tpm_tis_data *data, u32 addr, u16 *result); int (*read32)(struct tpm_tis_data *data, u32 addr, u32 *result); int (*write32)(struct tpm_tis_data *data, u32 addr, u32 src); @@ -128,7 +128,7 @@ static inline int tpm_tis_read32(struct }
static inline int tpm_tis_write_bytes(struct tpm_tis_data *data, u32 addr, - u16 len, u8 *value) + u16 len, const u8 *value) { return data->phy_ops->write_bytes(data, addr, len, value); } --- a/drivers/char/tpm/tpm_tis_spi.c +++ b/drivers/char/tpm/tpm_tis_spi.c @@ -57,7 +57,7 @@ static inline struct tpm_tis_spi_phy *to }
static int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len, - u8 *buffer, u8 direction) + u8 *in, const u8 *out) { struct tpm_tis_spi_phy *phy = to_tpm_tis_spi_phy(data); int ret = 0; @@ -71,7 +71,7 @@ static int tpm_tis_spi_transfer(struct t while (len) { transfer_len = min_t(u16, len, MAX_SPI_FRAMESIZE);
- phy->tx_buf[0] = direction | (transfer_len - 1); + phy->tx_buf[0] = (in ? 0x80 : 0) | (transfer_len - 1); phy->tx_buf[1] = 0xd4; phy->tx_buf[2] = addr >> 8; phy->tx_buf[3] = addr; @@ -112,14 +112,8 @@ static int tpm_tis_spi_transfer(struct t spi_xfer.cs_change = 0; spi_xfer.len = transfer_len; spi_xfer.delay_usecs = 5; - - if (direction) { - spi_xfer.tx_buf = NULL; - spi_xfer.rx_buf = buffer; - } else { - spi_xfer.tx_buf = buffer; - spi_xfer.rx_buf = NULL; - } + spi_xfer.tx_buf = out; + spi_xfer.rx_buf = in;
spi_message_init(&m); spi_message_add_tail(&spi_xfer, &m); @@ -128,7 +122,10 @@ static int tpm_tis_spi_transfer(struct t goto exit;
len -= transfer_len; - buffer += transfer_len; + if (in) + in += transfer_len; + if (out) + out += transfer_len; }
exit: @@ -139,13 +136,13 @@ exit: static int tpm_tis_spi_read_bytes(struct tpm_tis_data *data, u32 addr, u16 len, u8 *result) { - return tpm_tis_spi_transfer(data, addr, len, result, 0x80); + return tpm_tis_spi_transfer(data, addr, len, result, NULL); }
static int tpm_tis_spi_write_bytes(struct tpm_tis_data *data, u32 addr, - u16 len, u8 *value) + u16 len, const u8 *value) { - return tpm_tis_spi_transfer(data, addr, len, value, 0); + return tpm_tis_spi_transfer(data, addr, len, NULL, value); }
static int tpm_tis_spi_read16(struct tpm_tis_data *data, u32 addr, u16 *result)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Steffen Alexander.Steffen@infineon.com
commit 6b3a13173f23e798e1ba213dd4a2c065a3b8d751 upstream.
The buffers used as tx_buf/rx_buf in a SPI transfer need to be DMA-safe. This cannot be guaranteed for the buffers passed to tpm_tis_spi_read_bytes and tpm_tis_spi_write_bytes. Therefore, we need to use our own DMA-safe buffer and copy the data to/from it.
The buffer needs to be allocated separately, to ensure that it is cacheline-aligned and not shared with other data, so that DMA can work correctly.
Fixes: 0edbfea537d1 ("tpm/tpm_tis_spi: Add support for spi phy") Cc: stable@vger.kernel.org Reviewed-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Signed-off-by: Alexander Steffen Alexander.Steffen@infineon.com Signed-off-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/char/tpm/tpm_tis_spi.c | 45 ++++++++++++++++++++++++----------------- 1 file changed, 27 insertions(+), 18 deletions(-)
--- a/drivers/char/tpm/tpm_tis_spi.c +++ b/drivers/char/tpm/tpm_tis_spi.c @@ -46,9 +46,7 @@ struct tpm_tis_spi_phy { struct tpm_tis_data priv; struct spi_device *spi_device; - - u8 tx_buf[4]; - u8 rx_buf[4]; + u8 *iobuf; };
static inline struct tpm_tis_spi_phy *to_tpm_tis_spi_phy(struct tpm_tis_data *data) @@ -71,14 +69,14 @@ static int tpm_tis_spi_transfer(struct t while (len) { transfer_len = min_t(u16, len, MAX_SPI_FRAMESIZE);
- phy->tx_buf[0] = (in ? 0x80 : 0) | (transfer_len - 1); - phy->tx_buf[1] = 0xd4; - phy->tx_buf[2] = addr >> 8; - phy->tx_buf[3] = addr; + phy->iobuf[0] = (in ? 0x80 : 0) | (transfer_len - 1); + phy->iobuf[1] = 0xd4; + phy->iobuf[2] = addr >> 8; + phy->iobuf[3] = addr;
memset(&spi_xfer, 0, sizeof(spi_xfer)); - spi_xfer.tx_buf = phy->tx_buf; - spi_xfer.rx_buf = phy->rx_buf; + spi_xfer.tx_buf = phy->iobuf; + spi_xfer.rx_buf = phy->iobuf; spi_xfer.len = 4; spi_xfer.cs_change = 1;
@@ -88,9 +86,9 @@ static int tpm_tis_spi_transfer(struct t if (ret < 0) goto exit;
- if ((phy->rx_buf[3] & 0x01) == 0) { + if ((phy->iobuf[3] & 0x01) == 0) { // handle SPI wait states - phy->tx_buf[0] = 0; + phy->iobuf[0] = 0;
for (i = 0; i < TPM_RETRY; i++) { spi_xfer.len = 1; @@ -99,7 +97,7 @@ static int tpm_tis_spi_transfer(struct t ret = spi_sync_locked(phy->spi_device, &m); if (ret < 0) goto exit; - if (phy->rx_buf[0] & 0x01) + if (phy->iobuf[0] & 0x01) break; }
@@ -112,8 +110,14 @@ static int tpm_tis_spi_transfer(struct t spi_xfer.cs_change = 0; spi_xfer.len = transfer_len; spi_xfer.delay_usecs = 5; - spi_xfer.tx_buf = out; - spi_xfer.rx_buf = in; + + if (in) { + spi_xfer.tx_buf = NULL; + } else if (out) { + spi_xfer.rx_buf = NULL; + memcpy(phy->iobuf, out, transfer_len); + out += transfer_len; + }
spi_message_init(&m); spi_message_add_tail(&spi_xfer, &m); @@ -121,11 +125,12 @@ static int tpm_tis_spi_transfer(struct t if (ret < 0) goto exit;
- len -= transfer_len; - if (in) + if (in) { + memcpy(in, phy->iobuf, transfer_len); in += transfer_len; - if (out) - out += transfer_len; + } + + len -= transfer_len; }
exit: @@ -191,6 +196,10 @@ static int tpm_tis_spi_probe(struct spi_
phy->spi_device = dev;
+ phy->iobuf = devm_kmalloc(&dev->dev, MAX_SPI_FRAMESIZE, GFP_KERNEL); + if (!phy->iobuf) + return -ENOMEM; + return tpm_tis_core_init(&dev->dev, &phy->priv, -1, &tpm_spi_phy_ops, NULL); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Erik Veijola erik.veijola@gmail.com
commit 240a8af929c7c57dcde28682725b29cf8474e8e5 upstream.
The capture interface doesn't work and the playback interface only supports 48 kHz sampling rate even though it advertises more rates.
Signed-off-by: Erik Veijola erik.veijola@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/usb/quirks-table.h | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+)
--- a/sound/usb/quirks-table.h +++ b/sound/usb/quirks-table.h @@ -3277,4 +3277,51 @@ AU0828_DEVICE(0x2040, 0x7270, "Hauppauge } },
+{ + /* + * Bower's & Wilkins PX headphones only support the 48 kHz sample rate + * even though it advertises more. The capture interface doesn't work + * even on windows. + */ + USB_DEVICE(0x19b5, 0x0021), + .driver_info = (unsigned long) &(const struct snd_usb_audio_quirk) { + .ifnum = QUIRK_ANY_INTERFACE, + .type = QUIRK_COMPOSITE, + .data = (const struct snd_usb_audio_quirk[]) { + { + .ifnum = 0, + .type = QUIRK_AUDIO_STANDARD_MIXER, + }, + /* Capture */ + { + .ifnum = 1, + .type = QUIRK_IGNORE_INTERFACE, + }, + /* Playback */ + { + .ifnum = 2, + .type = QUIRK_AUDIO_FIXED_ENDPOINT, + .data = &(const struct audioformat) { + .formats = SNDRV_PCM_FMTBIT_S16_LE, + .channels = 2, + .iface = 2, + .altsetting = 1, + .altset_idx = 1, + .attributes = UAC_EP_CS_ATTR_FILL_MAX | + UAC_EP_CS_ATTR_SAMPLE_RATE, + .endpoint = 0x03, + .ep_attr = USB_ENDPOINT_XFER_ISOC, + .rates = SNDRV_PCM_RATE_48000, + .rate_min = 48000, + .rate_max = 48000, + .nr_rates = 1, + .rate_table = (unsigned int[]) { + 48000 + } + } + }, + } + } +}, + #undef USB_DEVICE_VENDOR_SPEC
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Richard Fitzgerald rf@opensource.cirrus.com
commit 5a23699a39abc5328921a81b89383d088f6ba9cc upstream.
The patch "ALSA: control: code refactoring for ELEM_READ/ELEM_WRITE operations" introduced a potential for kernel memory corruption due to an incorrect if statement allowing non-readable controls to fall through and call the get function. For TLV controls a driver can omit SNDRV_CTL_ELEM_ACCESS_READ to ensure that only the TLV get function can be called. Instead the normal get() can be invoked unexpectedly and as the driver expects that this will only be called for controls <= 512 bytes, potentially try to copy >512 bytes into the 512 byte return array, so corrupting kernel memory.
The problem is an attempt to refactor the snd_ctl_elem_read function to invert the logic so that it conditionally aborted if the control is unreadable instead of conditionally executing. But the if statement wasn't inverted correctly.
The correct inversion of
if (a && !b)
is if (!a || b)
Fixes: becf9e5d553c2 ("ALSA: control: code refactoring for ELEM_READ/ELEM_WRITE operations") Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/core/control.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/core/control.c +++ b/sound/core/control.c @@ -888,7 +888,7 @@ static int snd_ctl_elem_read(struct snd_
index_offset = snd_ctl_get_ioff(kctl, &control->id); vd = &kctl->vd[index_offset]; - if (!(vd->access & SNDRV_CTL_ELEM_ACCESS_READ) && kctl->get == NULL) + if (!(vd->access & SNDRV_CTL_ELEM_ACCESS_READ) || kctl->get == NULL) return -EPERM;
snd_ctl_build_ioff(&control->id, kctl, index_offset);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit 350144069abf351c743d766b2fba9cb9b7cd32a1 upstream.
The commit change for supporting the multiple ports moved involved some code shuffling, and there the initializations of spinlock and mutex in snd_intelhad object were dropped mistakenly.
This patch adds the missing initializations again for each port.
Fixes: b4eb0d522fcb ("ALSA: x86: Split snd_intelhad into card and PCM specific structures") Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/x86/intel_hdmi_audio.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/sound/x86/intel_hdmi_audio.c +++ b/sound/x86/intel_hdmi_audio.c @@ -1827,6 +1827,8 @@ static int hdmi_lpe_audio_probe(struct p ctx->port = port; ctx->pipe = -1;
+ spin_lock_init(&ctx->had_spinlock); + mutex_init(&ctx->mutex); INIT_WORK(&ctx->hdmi_audio_wq, had_audio_wq);
ret = snd_pcm_new(card, INTEL_HAD, port, MAX_PB_STREAMS,
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede hdegoede@redhat.com
commit 1ba8f9d308174e647b864c36209b4d7934d99888 upstream.
On some boards setting power_save to a non 0 value leads to clicking / popping sounds when ever we enter/leave powersaving mode. Ideally we would figure out how to avoid these sounds, but that is not always feasible.
This commit adds a blacklist for devices where powersaving is known to cause problems and disables it on these devices.
Note I tried to put this blacklist in userspace first: https://github.com/systemd/systemd/pull/8128
But the systemd maintainers rightfully pointed out that it would be impossible to then later remove entries once we actually find a way to make power-saving work on listed boards without issues. Having this list in the kernel will allow removal of the blacklist entry in the same commit which fixes the clicks / plops.
The blacklist only applies to the default power_save module-option value, if a user explicitly sets the module-option then the blacklist is not used.
[ added an ifdef CONFIG_PM for the build error -- tiwai]
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1525104 BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=198611 Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/pci/hda/hda_intel.c | 38 ++++++++++++++++++++++++++++++++++++-- 1 file changed, 36 insertions(+), 2 deletions(-)
--- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -181,7 +181,7 @@ static const struct kernel_param_ops par }; #define param_check_xint param_check_int
-static int power_save = CONFIG_SND_HDA_POWER_SAVE_DEFAULT; +static int power_save = -1; module_param(power_save, xint, 0644); MODULE_PARM_DESC(power_save, "Automatic power-saving timeout " "(in second, 0 = disable)."); @@ -2186,6 +2186,24 @@ out_free: return err; }
+#ifdef CONFIG_PM +/* On some boards setting power_save to a non 0 value leads to clicking / + * popping sounds when ever we enter/leave powersaving mode. Ideally we would + * figure out how to avoid these sounds, but that is not always feasible. + * So we keep a list of devices where we disable powersaving as its known + * to causes problems on these devices. + */ +static struct snd_pci_quirk power_save_blacklist[] = { + /* https://bugzilla.redhat.com/show_bug.cgi?id=1525104 */ + SND_PCI_QUIRK(0x1849, 0x0c0c, "Asrock B85M-ITX", 0), + /* https://bugzilla.redhat.com/show_bug.cgi?id=1525104 */ + SND_PCI_QUIRK(0x1043, 0x8733, "Asus Prime X370-Pro", 0), + /* https://bugzilla.kernel.org/show_bug.cgi?id=198611 */ + SND_PCI_QUIRK(0x17aa, 0x2227, "Lenovo X1 Carbon 3rd Gen", 0), + {} +}; +#endif /* CONFIG_PM */ + /* number of codec slots for each chipset: 0 = default slots (i.e. 4) */ static unsigned int azx_max_codecs[AZX_NUM_DRIVERS] = { [AZX_DRIVER_NVIDIA] = 8, @@ -2198,6 +2216,7 @@ static int azx_probe_continue(struct azx struct hdac_bus *bus = azx_bus(chip); struct pci_dev *pci = chip->pci; int dev = chip->dev_index; + int val; int err;
hda->probe_continued = 1; @@ -2278,7 +2297,22 @@ static int azx_probe_continue(struct azx
chip->running = 1; azx_add_card_list(chip); - snd_hda_set_power_save(&chip->bus, power_save * 1000); + + val = power_save; +#ifdef CONFIG_PM + if (val == -1) { + const struct snd_pci_quirk *q; + + val = CONFIG_SND_HDA_POWER_SAVE_DEFAULT; + q = snd_pci_quirk_lookup(chip->pci, power_save_blacklist); + if (q && val) { + dev_info(chip->card->dev, "device %04x:%04x is on the power_save blacklist, forcing power_save to 0\n", + q->subvendor, q->subdevice); + val = 0; + } + } +#endif /* CONFIG_PM */ + snd_hda_set_power_save(&chip->bus, val * 1000); if (azx_has_pm_runtime(chip) || hda->use_vga_switcheroo) pm_runtime_put_autosuspend(&pci->dev);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit 71db96ddfa72671bd43cacdcc99ca178d90ba267 upstream.
We've added a quirk to enable the recent Lenovo dock support, where it overwrites the pin configs of NID 0x17 and 19, not only updating the pin config cache. It works right after the boot, but the problem is that the pin configs are occasionally cleared when the machine goes to PM. Meanwhile the quirk writes the pin configs only at the pre-probe, so this won't be applied any longer.
For addressing that issue, this patch moves the code to overwrite the pin configs into HDA_FIXUP_ACT_INIT section so that it's always applied at both probe and resume time.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195161 Fixes: 61fcf8ece9b6 ("ALSA: hda/realtek - Enable Thinkpad Dock device for ALC298 platform") Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/pci/hda/patch_realtek.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -4852,13 +4852,14 @@ static void alc_fixup_tpt470_dock(struct
if (action == HDA_FIXUP_ACT_PRE_PROBE) { spec->parse_flags = HDA_PINCFG_NO_HP_FIXUP; + snd_hda_apply_pincfgs(codec, pincfgs); + } else if (action == HDA_FIXUP_ACT_INIT) { /* Enable DOCK device */ snd_hda_codec_write(codec, 0x17, 0, AC_VERB_SET_CONFIG_DEFAULT_BYTES_3, 0); /* Enable DOCK device */ snd_hda_codec_write(codec, 0x19, 0, AC_VERB_SET_CONFIG_DEFAULT_BYTES_3, 0); - snd_hda_apply_pincfgs(codec, pincfgs); } }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Adrian Hunter adrian.hunter@intel.com
commit f8870ae6e2d6be75b1accc2db981169fdfbea7ab upstream.
Tuning can leave the IP in an active state (Buffer Read Enable bit set) which prevents the entry to low power states (i.e. S0i3). Data reset will clear it.
Generally tuning is followed by a data transfer which will anyway sort out the state, so it is rare that S0i3 is actually prevented.
Signed-off-by: Adrian Hunter adrian.hunter@intel.com Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mmc/host/sdhci-pci-core.c | 35 +++++++++++++++++++++++++++++++---- 1 file changed, 31 insertions(+), 4 deletions(-)
--- a/drivers/mmc/host/sdhci-pci-core.c +++ b/drivers/mmc/host/sdhci-pci-core.c @@ -594,9 +594,36 @@ static void byt_read_dsm(struct sdhci_pc slot->chip->rpm_retune = intel_host->d3_retune; }
-static int byt_emmc_probe_slot(struct sdhci_pci_slot *slot) +static int intel_execute_tuning(struct mmc_host *mmc, u32 opcode) +{ + int err = sdhci_execute_tuning(mmc, opcode); + struct sdhci_host *host = mmc_priv(mmc); + + if (err) + return err; + + /* + * Tuning can leave the IP in an active state (Buffer Read Enable bit + * set) which prevents the entry to low power states (i.e. S0i3). Data + * reset will clear it. + */ + sdhci_reset(host, SDHCI_RESET_DATA); + + return 0; +} + +static void byt_probe_slot(struct sdhci_pci_slot *slot) { + struct mmc_host_ops *ops = &slot->host->mmc_host_ops; + byt_read_dsm(slot); + + ops->execute_tuning = intel_execute_tuning; +} + +static int byt_emmc_probe_slot(struct sdhci_pci_slot *slot) +{ + byt_probe_slot(slot); slot->host->mmc->caps |= MMC_CAP_8_BIT_DATA | MMC_CAP_NONREMOVABLE | MMC_CAP_HW_RESET | MMC_CAP_1_8V_DDR | MMC_CAP_CMD_DURING_TFR | @@ -651,7 +678,7 @@ static int ni_byt_sdio_probe_slot(struct { int err;
- byt_read_dsm(slot); + byt_probe_slot(slot);
err = ni_set_max_freq(slot); if (err) @@ -664,7 +691,7 @@ static int ni_byt_sdio_probe_slot(struct
static int byt_sdio_probe_slot(struct sdhci_pci_slot *slot) { - byt_read_dsm(slot); + byt_probe_slot(slot); slot->host->mmc->caps |= MMC_CAP_POWER_OFF_CARD | MMC_CAP_NONREMOVABLE | MMC_CAP_WAIT_WHILE_BUSY; return 0; @@ -672,7 +699,7 @@ static int byt_sdio_probe_slot(struct sd
static int byt_sd_probe_slot(struct sdhci_pci_slot *slot) { - byt_read_dsm(slot); + byt_probe_slot(slot); slot->host->mmc->caps |= MMC_CAP_WAIT_WHILE_BUSY | MMC_CAP_AGGRESSIVE_PM | MMC_CAP_CD_WAKE; slot->cd_idx = 0;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geert Uytterhoeven geert+renesas@glider.be
commit 325501d9360eb42c7c51e6daa0d733844c1e790b upstream.
The hs_timing_cfg[] array is indexed using a value derived from the "mshcN" alias in DT, which may lead to an out-of-bounds access.
Fix this by adding a range check.
Fixes: 361c7fe9b02eee7e ("mmc: dw_mmc-k3: add sd support for hi3660") Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Reviewed-by: Shawn Lin shawn.lin@rock-chips.com Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mmc/host/dw_mmc-k3.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/mmc/host/dw_mmc-k3.c +++ b/drivers/mmc/host/dw_mmc-k3.c @@ -135,6 +135,9 @@ static int dw_mci_hi6220_parse_dt(struct if (priv->ctrl_id < 0) priv->ctrl_id = 0;
+ if (priv->ctrl_id >= TIMING_MODE) + return -EINVAL; + host->priv = priv; return 0; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shawn Lin shawn.lin@rock-chips.com
commit 5b43df8b4c1a7f0c3fbf793c9566068e6b1e570c upstream.
cat /sys/kernel/debug/mmc0/regs will hang up the system since it's in runtime suspended state, so the genpd and biu_clk is off. This patch fixes this problem by calling pm_runtime_get_sync to wake it up before reading the registers.
Fixes: e9ed8835e990 ("mmc: dw_mmc: add runtime PM callback") Cc: stable@vger.kernel.org Signed-off-by: Shawn Lin shawn.lin@rock-chips.com Reviewed-by: Jaehoon Chung jh80.chung@samsung.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mmc/host/dw_mmc.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/mmc/host/dw_mmc.c +++ b/drivers/mmc/host/dw_mmc.c @@ -165,6 +165,8 @@ static int dw_mci_regs_show(struct seq_f { struct dw_mci *host = s->private;
+ pm_runtime_get_sync(host->dev); + seq_printf(s, "STATUS:\t0x%08x\n", mci_readl(host, STATUS)); seq_printf(s, "RINTSTS:\t0x%08x\n", mci_readl(host, RINTSTS)); seq_printf(s, "CMD:\t0x%08x\n", mci_readl(host, CMD)); @@ -172,6 +174,8 @@ static int dw_mci_regs_show(struct seq_f seq_printf(s, "INTMASK:\t0x%08x\n", mci_readl(host, INTMASK)); seq_printf(s, "CLKENA:\t0x%08x\n", mci_readl(host, CLKENA));
+ pm_runtime_put_autosuspend(host->dev); + return 0; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shawn Lin shawn.lin@rock-chips.com
commit a4faa4929ed3be15e2d500d2405f992f6dedc8eb upstream.
Factor out dw_mci_init_slot_caps to consolidate parsing all differents types of capabilities from host contrllers. No functional change intended.
Signed-off-by: Shawn Lin shawn.lin@rock-chips.com Fixes: 800d78bfccb3 ("mmc: dw_mmc: add support for implementation specific callbacks") Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mmc/host/dw_mmc.c | 73 +++++++++++++++++++++++++++------------------- 1 file changed, 43 insertions(+), 30 deletions(-)
--- a/drivers/mmc/host/dw_mmc.c +++ b/drivers/mmc/host/dw_mmc.c @@ -2762,12 +2762,50 @@ static irqreturn_t dw_mci_interrupt(int return IRQ_HANDLED; }
+static int dw_mci_init_slot_caps(struct dw_mci_slot *slot) +{ + struct dw_mci *host = slot->host; + const struct dw_mci_drv_data *drv_data = host->drv_data; + struct mmc_host *mmc = slot->mmc; + int ctrl_id; + + if (host->pdata->caps) + mmc->caps = host->pdata->caps; + + /* + * Support MMC_CAP_ERASE by default. + * It needs to use trim/discard/erase commands. + */ + mmc->caps |= MMC_CAP_ERASE; + + if (host->pdata->pm_caps) + mmc->pm_caps = host->pdata->pm_caps; + + if (host->dev->of_node) { + ctrl_id = of_alias_get_id(host->dev->of_node, "mshc"); + if (ctrl_id < 0) + ctrl_id = 0; + } else { + ctrl_id = to_platform_device(host->dev)->id; + } + if (drv_data && drv_data->caps) + mmc->caps |= drv_data->caps[ctrl_id]; + + if (host->pdata->caps2) + mmc->caps2 = host->pdata->caps2; + + /* Process SDIO IRQs through the sdio_irq_work. */ + if (mmc->caps & MMC_CAP_SDIO_IRQ) + mmc->caps2 |= MMC_CAP2_SDIO_IRQ_NOTHREAD; + + return 0; +} + static int dw_mci_init_slot(struct dw_mci *host) { struct mmc_host *mmc; struct dw_mci_slot *slot; - const struct dw_mci_drv_data *drv_data = host->drv_data; - int ctrl_id, ret; + int ret; u32 freq[2];
mmc = mmc_alloc_host(sizeof(struct dw_mci_slot), host->dev); @@ -2801,38 +2839,13 @@ static int dw_mci_init_slot(struct dw_mc if (!mmc->ocr_avail) mmc->ocr_avail = MMC_VDD_32_33 | MMC_VDD_33_34;
- if (host->pdata->caps) - mmc->caps = host->pdata->caps; - - /* - * Support MMC_CAP_ERASE by default. - * It needs to use trim/discard/erase commands. - */ - mmc->caps |= MMC_CAP_ERASE; - - if (host->pdata->pm_caps) - mmc->pm_caps = host->pdata->pm_caps; - - if (host->dev->of_node) { - ctrl_id = of_alias_get_id(host->dev->of_node, "mshc"); - if (ctrl_id < 0) - ctrl_id = 0; - } else { - ctrl_id = to_platform_device(host->dev)->id; - } - if (drv_data && drv_data->caps) - mmc->caps |= drv_data->caps[ctrl_id]; - - if (host->pdata->caps2) - mmc->caps2 = host->pdata->caps2; - ret = mmc_of_parse(mmc); if (ret) goto err_host_allocated;
- /* Process SDIO IRQs through the sdio_irq_work. */ - if (mmc->caps & MMC_CAP_SDIO_IRQ) - mmc->caps2 |= MMC_CAP2_SDIO_IRQ_NOTHREAD; + ret = dw_mci_init_slot_caps(slot); + if (ret) + goto err_host_allocated;
/* Useful defaults if platform data is unset. */ if (host->use_dma == TRANS_MODE_IDMAC) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shawn Lin shawn.lin@rock-chips.com
commit 0d84b9e5631d923744767dc6608672df906dd092 upstream.
Add num_caps field for dw_mci_drv_data to validate the controller id from DT alias and non-DT ways.
Reported-by: Geert Uytterhoeven geert+renesas@glider.be Signed-off-by: Shawn Lin shawn.lin@rock-chips.com Fixes: 800d78bfccb3 ("mmc: dw_mmc: add support for implementation specific callbacks") Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mmc/host/dw_mmc-exynos.c | 1 + drivers/mmc/host/dw_mmc-k3.c | 1 + drivers/mmc/host/dw_mmc-rockchip.c | 1 + drivers/mmc/host/dw_mmc-zx.c | 1 + drivers/mmc/host/dw_mmc.c | 9 ++++++++- drivers/mmc/host/dw_mmc.h | 2 ++ 6 files changed, 14 insertions(+), 1 deletion(-)
--- a/drivers/mmc/host/dw_mmc-exynos.c +++ b/drivers/mmc/host/dw_mmc-exynos.c @@ -487,6 +487,7 @@ static unsigned long exynos_dwmmc_caps[4
static const struct dw_mci_drv_data exynos_drv_data = { .caps = exynos_dwmmc_caps, + .num_caps = ARRAY_SIZE(exynos_dwmmc_caps), .init = dw_mci_exynos_priv_init, .set_ios = dw_mci_exynos_set_ios, .parse_dt = dw_mci_exynos_parse_dt, --- a/drivers/mmc/host/dw_mmc-k3.c +++ b/drivers/mmc/host/dw_mmc-k3.c @@ -210,6 +210,7 @@ static int dw_mci_hi6220_execute_tuning(
static const struct dw_mci_drv_data hi6220_data = { .caps = dw_mci_hi6220_caps, + .num_caps = ARRAY_SIZE(dw_mci_hi6220_caps), .switch_voltage = dw_mci_hi6220_switch_voltage, .set_ios = dw_mci_hi6220_set_ios, .parse_dt = dw_mci_hi6220_parse_dt, --- a/drivers/mmc/host/dw_mmc-rockchip.c +++ b/drivers/mmc/host/dw_mmc-rockchip.c @@ -319,6 +319,7 @@ static const struct dw_mci_drv_data rk29
static const struct dw_mci_drv_data rk3288_drv_data = { .caps = dw_mci_rk3288_dwmmc_caps, + .num_caps = ARRAY_SIZE(dw_mci_rk3288_dwmmc_caps), .set_ios = dw_mci_rk3288_set_ios, .execute_tuning = dw_mci_rk3288_execute_tuning, .parse_dt = dw_mci_rk3288_parse_dt, --- a/drivers/mmc/host/dw_mmc-zx.c +++ b/drivers/mmc/host/dw_mmc-zx.c @@ -195,6 +195,7 @@ static unsigned long zx_dwmmc_caps[3] =
static const struct dw_mci_drv_data zx_drv_data = { .caps = zx_dwmmc_caps, + .num_caps = ARRAY_SIZE(zx_dwmmc_caps), .execute_tuning = dw_mci_zx_execute_tuning, .prepare_hs400_tuning = dw_mci_zx_prepare_hs400_tuning, .parse_dt = dw_mci_zx_parse_dt, --- a/drivers/mmc/host/dw_mmc.c +++ b/drivers/mmc/host/dw_mmc.c @@ -2788,8 +2788,15 @@ static int dw_mci_init_slot_caps(struct } else { ctrl_id = to_platform_device(host->dev)->id; } - if (drv_data && drv_data->caps) + + if (drv_data && drv_data->caps) { + if (ctrl_id >= drv_data->num_caps) { + dev_err(host->dev, "invalid controller id %d\n", + ctrl_id); + return -EINVAL; + } mmc->caps |= drv_data->caps[ctrl_id]; + }
if (host->pdata->caps2) mmc->caps2 = host->pdata->caps2; --- a/drivers/mmc/host/dw_mmc.h +++ b/drivers/mmc/host/dw_mmc.h @@ -542,6 +542,7 @@ struct dw_mci_slot { /** * dw_mci driver data - dw-mshc implementation specific driver data. * @caps: mmc subsystem specified capabilities of the controller(s). + * @num_caps: number of capabilities specified by @caps. * @init: early implementation specific initialization. * @set_ios: handle bus specific extensions. * @parse_dt: parse implementation specific device tree properties. @@ -553,6 +554,7 @@ struct dw_mci_slot { */ struct dw_mci_drv_data { unsigned long *caps; + u32 num_caps; int (*init)(struct dw_mci *host); void (*set_ios)(struct dw_mci *host, struct mmc_ios *ios); int (*parse_dt)(struct dw_mci *host);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lingutla Chandrasekhar clingutla@codeaurora.org
commit c52232a49e203a65a6e1a670cd5262f59e9364a0 upstream.
On CPU hotunplug the enqueued timers of the unplugged CPU are migrated to a live CPU. This happens from the control thread which initiated the unplug.
If the CPU on which the control thread runs came out from a longer idle period then the base clock of that CPU might be stale because the control thread runs prior to any event which forwards the clock.
In such a case the timers from the unplugged CPU are queued on the live CPU based on the stale clock which can cause large delays due to increased granularity of the outer timer wheels which are far away from base:;clock.
But there is a worse problem than that. The following sequence of events illustrates it:
- CPU0 timer1 is queued expires = 59969 and base->clk = 59131.
The timer is queued at wheel level 2, with resulting expiry time = 60032 (due to level granularity).
- CPU1 enters idle @60007, with next timer expiry @60020.
- CPU0 is hotplugged at @60009
- CPU1 exits idle and runs the control thread which migrates the timers from CPU0
timer1 is now queued in level 0 for immediate handling in the next softirq because the requested expiry time 59969 is before CPU1 base->clk 60007
- CPU1 runs code which forwards the base clock which succeeds because the next expiring timer. which was collected at idle entry time is still set to 60020.
So it forwards beyond 60007 and therefore misses to expire the migrated timer1. That timer gets expired when the wheel wraps around again, which takes between 63 and 630ms depending on the HZ setting.
Address both problems by invoking forward_timer_base() for the control CPUs timer base. All other places, which might run into a similar problem (mod_timer()/add_timer_on()) already invoke forward_timer_base() to avoid that.
[ tglx: Massaged comment and changelog ]
Fixes: a683f390b93f ("timers: Forward the wheel clock whenever possible") Co-developed-by: Neeraj Upadhyay neeraju@codeaurora.org Signed-off-by: Neeraj Upadhyay neeraju@codeaurora.org Signed-off-by: Lingutla Chandrasekhar clingutla@codeaurora.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Anna-Maria Gleixner anna-maria@linutronix.de Cc: linux-arm-msm@vger.kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180118115022.6368-1-clingutla@codeaurora.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/time/timer.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -1834,6 +1834,12 @@ int timers_dead_cpu(unsigned int cpu) raw_spin_lock_irq(&new_base->lock); raw_spin_lock_nested(&old_base->lock, SINGLE_DEPTH_NESTING);
+ /* + * The current CPUs base clock might be stale. Update it + * before moving the timers over. + */ + forward_timer_base(new_base); + BUG_ON(old_base->running_timer);
for (i = 0; i < WHEEL_SIZE; i++)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Helge Deller deller@gmx.de
commit 5ffa8518851f1401817c15d2a7eecc0373c26ff9 upstream.
When running on qemu we know that the (emulated) cr16 cpu-internal clocks are syncronized. So let's use them unconditionally on qemu.
Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org # 4.14+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/parisc/include/asm/processor.h | 2 ++ arch/parisc/kernel/time.c | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-)
--- a/arch/parisc/include/asm/processor.h +++ b/arch/parisc/include/asm/processor.h @@ -316,6 +316,8 @@ extern int _parisc_requires_coherency; #define parisc_requires_coherency() (0) #endif
+extern int running_on_qemu; + #endif /* __ASSEMBLY__ */
#endif /* __ASM_PARISC_PROCESSOR_H */ --- a/arch/parisc/kernel/time.c +++ b/arch/parisc/kernel/time.c @@ -248,7 +248,7 @@ static int __init init_cr16_clocksource( * different sockets, so mark them unstable and lower rating on * multi-socket SMP systems. */ - if (num_online_cpus() > 1) { + if (num_online_cpus() > 1 && !running_on_qemu) { int cpu; unsigned long cpu0_loc; cpu0_loc = per_cpu(cpu_data, 0).cpu_loc;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Helge Deller deller@gmx.de
commit 636a415bcc7f4fd020ece8fd5fc648c4cef19c34 upstream.
When run under QEMU, calling mfctl(16) creates some overhead because the qemu timer has to be scaled and moved into the register. This patch reduces the number of calls to mfctl(16) by moving the calls out of the loops.
Additionally, increase the minimal time interval to 8000 cycles instead of 500 to compensate possible QEMU delays when delivering interrupts.
Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org # 4.14+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/parisc/kernel/time.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
--- a/arch/parisc/kernel/time.c +++ b/arch/parisc/kernel/time.c @@ -76,10 +76,10 @@ irqreturn_t __irq_entry timer_interrupt( next_tick = cpuinfo->it_value;
/* Calculate how many ticks have elapsed. */ + now = mfctl(16); do { ++ticks_elapsed; next_tick += cpt; - now = mfctl(16); } while (next_tick - now > cpt);
/* Store (in CR16 cycles) up to when we are accounting right now. */ @@ -103,16 +103,17 @@ irqreturn_t __irq_entry timer_interrupt( * if one or the other wrapped. If "now" is "bigger" we'll end up * with a very large unsigned number. */ - while (next_tick - mfctl(16) > cpt) + now = mfctl(16); + while (next_tick - now > cpt) next_tick += cpt;
/* Program the IT when to deliver the next interrupt. * Only bottom 32-bits of next_tick are writable in CR16! * Timer interrupt will be delivered at least a few hundred cycles - * after the IT fires, so if we are too close (<= 500 cycles) to the + * after the IT fires, so if we are too close (<= 8000 cycles) to the * next cycle, simply skip it. */ - if (next_tick - mfctl(16) <= 500) + if (next_tick - now <= 8000) next_tick += cpt; mtctl(next_tick, 16);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anand Jain anand.jain@oracle.com
commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.
The fs_info::super_copy is a byte copy of the on-disk structure and all members must use the accessor macros/functions to obtain the right value. This was missing in update_super_roots and in sysfs readers.
Moving between opposite endianness hosts will report bogus numbers in sysfs, and mount may fail as the root will not be restored correctly. If the filesystem is always used on a same endian host, this will not be a problem.
Fix this by using the btrfs_set_super...() functions to set fs_info::super_copy values, and for the sysfs, use the cached fs_info::nodesize/sectorsize values.
CC: stable@vger.kernel.org Fixes: df93589a17378 ("btrfs: export more from FS_INFO to sysfs") Signed-off-by: Anand Jain anand.jain@oracle.com Reviewed-by: Liu Bo bo.li.liu@oracle.com Reviewed-by: David Sterba dsterba@suse.com [ update changelog ] Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/sysfs.c | 8 +++----- fs/btrfs/transaction.c | 20 ++++++++++++-------- 2 files changed, 15 insertions(+), 13 deletions(-)
--- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -422,7 +422,7 @@ static ssize_t btrfs_nodesize_show(struc { struct btrfs_fs_info *fs_info = to_fs_info(kobj);
- return snprintf(buf, PAGE_SIZE, "%u\n", fs_info->super_copy->nodesize); + return snprintf(buf, PAGE_SIZE, "%u\n", fs_info->nodesize); }
BTRFS_ATTR(nodesize, btrfs_nodesize_show); @@ -432,8 +432,7 @@ static ssize_t btrfs_sectorsize_show(str { struct btrfs_fs_info *fs_info = to_fs_info(kobj);
- return snprintf(buf, PAGE_SIZE, "%u\n", - fs_info->super_copy->sectorsize); + return snprintf(buf, PAGE_SIZE, "%u\n", fs_info->sectorsize); }
BTRFS_ATTR(sectorsize, btrfs_sectorsize_show); @@ -443,8 +442,7 @@ static ssize_t btrfs_clone_alignment_sho { struct btrfs_fs_info *fs_info = to_fs_info(kobj);
- return snprintf(buf, PAGE_SIZE, "%u\n", - fs_info->super_copy->sectorsize); + return snprintf(buf, PAGE_SIZE, "%u\n", fs_info->sectorsize); }
BTRFS_ATTR(clone_alignment, btrfs_clone_alignment_show); --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -1722,19 +1722,23 @@ static void update_super_roots(struct bt
super = fs_info->super_copy;
+ /* update latest btrfs_super_block::chunk_root refs */ root_item = &fs_info->chunk_root->root_item; - super->chunk_root = root_item->bytenr; - super->chunk_root_generation = root_item->generation; - super->chunk_root_level = root_item->level; + btrfs_set_super_chunk_root(super, root_item->bytenr); + btrfs_set_super_chunk_root_generation(super, root_item->generation); + btrfs_set_super_chunk_root_level(super, root_item->level);
+ /* update latest btrfs_super_block::root refs */ root_item = &fs_info->tree_root->root_item; - super->root = root_item->bytenr; - super->generation = root_item->generation; - super->root_level = root_item->level; + btrfs_set_super_root(super, root_item->bytenr); + btrfs_set_super_generation(super, root_item->generation); + btrfs_set_super_root_level(super, root_item->level); + if (btrfs_test_opt(fs_info, SPACE_CACHE)) - super->cache_generation = root_item->generation; + btrfs_set_super_cache_generation(super, root_item->generation); if (test_bit(BTRFS_FS_UPDATE_UUID_TREE_GEN, &fs_info->flags)) - super->uuid_tree_generation = root_item->generation; + btrfs_set_super_uuid_tree_generation(super, + root_item->generation); }
int btrfs_transaction_in_commit(struct btrfs_fs_info *info)
Greg Kroah-Hartman wrote...
4.14-stable review patch. If anyone has any objections, please let me know.
commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.
(...)
If the filesystem is always used on a same endian host, this will not be a problem.
From my observations I cannot quite subscribe to that.
On big-endian systems, this change intruduces severe corruption, resulting in complete loss of the data on the used block device.
Steps to reproduce (tested on ppc/powerpc and parisc/hppa):
# mkfs.btrfs $DEV # mount $DEV /mnt/tmp/ # umount /mnt/tmp/
This simple umount corrupts the file system:
# mount $DEV /mnt/tmp/ mount: /mnt/tmp: wrong fs type, bad option, bad superblock on $DEV, missing codepage or helper program, or other error.
# dmesg: BTRFS critical (device <dev>): unable to find logical 4294967296 length 4096 BTRFS critical (device <dev>): unable to find logical 4294967296 length 4096 BTRFS critical (device <dev>): unable to find logical 18102363734671360 length 16384 BTRFS error (device <dev>): failed to read chunk root BTRFS error (device <dev>): open_ctree failed
Also fsck is of no help:
# btrfsck $DEV Couldn't map the block 18102363734671360 No mapping for 18102363734671360-18102363734687744 Couldn't map the block 18102363734671360 bytenr mismatch, want=18102363734671360, have=0 ERROR: cannot read chunk root ERROR: cannot open file system
Trying mount or fsck on a little-endian system does not help either. So I consider the data on that device lost - luckily I use btrfs only for files where a backup exists all the time.
Reverting that change restored the previous error-free behaviour. I didn't check HEAD, i.e. v4.16-rc5, since the upstream commt was the last that affected these files. Still I could give this a try if anybody wishes so.
Cheers,
Christoph
On Thu, Mar 15, 2018 at 07:55:42PM +0100, Christoph Biedl wrote:
Greg Kroah-Hartman wrote...
4.14-stable review patch. If anyone has any objections, please let me know.
commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.
(...)
If the filesystem is always used on a same endian host, this will not be a problem.
From my observations I cannot quite subscribe to that.
On big-endian systems, this change intruduces severe corruption, resulting in complete loss of the data on the used block device.
Steps to reproduce (tested on ppc/powerpc and parisc/hppa):
# mkfs.btrfs $DEV # mount $DEV /mnt/tmp/ # umount /mnt/tmp/
This simple umount corrupts the file system:
# mount $DEV /mnt/tmp/ mount: /mnt/tmp: wrong fs type, bad option, bad superblock on $DEV, missing codepage or helper program, or other error.
# dmesg: BTRFS critical (device <dev>): unable to find logical 4294967296 length 4096 BTRFS critical (device <dev>): unable to find logical 4294967296 length 4096 BTRFS critical (device <dev>): unable to find logical 18102363734671360 length 16384 BTRFS error (device <dev>): failed to read chunk root BTRFS error (device <dev>): open_ctree failed
Also fsck is of no help:
# btrfsck $DEV Couldn't map the block 18102363734671360 No mapping for 18102363734671360-18102363734687744 Couldn't map the block 18102363734671360 bytenr mismatch, want=18102363734671360, have=0 ERROR: cannot read chunk root ERROR: cannot open file system
Trying mount or fsck on a little-endian system does not help either. So I consider the data on that device lost - luckily I use btrfs only for files where a backup exists all the time.
Reverting that change restored the previous error-free behaviour. I didn't check HEAD, i.e. v4.16-rc5, since the upstream commt was the last that affected these files. Still I could give this a try if anybody wishes so.
That sucks. Can you test Linus's tree to verify the problem is there? I'll gladly revert this if Linus's tree also gets the revert, I don't want you to hit this when you upgrade to a newer kernel.
thanks,
greg k-h
On Fri, Mar 16, 2018 at 01:30:49PM +0100, Greg Kroah-Hartman wrote:
On Thu, Mar 15, 2018 at 07:55:42PM +0100, Christoph Biedl wrote:
Greg Kroah-Hartman wrote...
4.14-stable review patch. If anyone has any objections, please let me know.
commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.
(...)
If the filesystem is always used on a same endian host, this will not be a problem.
From my observations I cannot quite subscribe to that.
On big-endian systems, this change intruduces severe corruption, resulting in complete loss of the data on the used block device.
Steps to reproduce (tested on ppc/powerpc and parisc/hppa):
# mkfs.btrfs $DEV # mount $DEV /mnt/tmp/ # umount /mnt/tmp/
This simple umount corrupts the file system:
# mount $DEV /mnt/tmp/ mount: /mnt/tmp: wrong fs type, bad option, bad superblock on $DEV, missing codepage or helper program, or other error.
# dmesg: BTRFS critical (device <dev>): unable to find logical 4294967296 length 4096 BTRFS critical (device <dev>): unable to find logical 4294967296 length 4096 BTRFS critical (device <dev>): unable to find logical 18102363734671360 length 16384 BTRFS error (device <dev>): failed to read chunk root BTRFS error (device <dev>): open_ctree failed
Also fsck is of no help:
# btrfsck $DEV Couldn't map the block 18102363734671360 No mapping for 18102363734671360-18102363734687744 Couldn't map the block 18102363734671360 bytenr mismatch, want=18102363734671360, have=0 ERROR: cannot read chunk root ERROR: cannot open file system
Trying mount or fsck on a little-endian system does not help either. So I consider the data on that device lost - luckily I use btrfs only for files where a backup exists all the time.
Reverting that change restored the previous error-free behaviour. I didn't check HEAD, i.e. v4.16-rc5, since the upstream commt was the last that affected these files. Still I could give this a try if anybody wishes so.
That sucks. Can you test Linus's tree to verify the problem is there? I'll gladly revert this if Linus's tree also gets the revert, I don't want you to hit this when you upgrade to a newer kernel.
I'll push a fix for the upcoming rc but I think it would be better to remove the broken patch from stable kernels ASAP, so I'd recommend to revert it now.
On Fri, Mar 16, 2018 at 02:22:02PM +0100, David Sterba wrote:
On Fri, Mar 16, 2018 at 01:30:49PM +0100, Greg Kroah-Hartman wrote:
On Thu, Mar 15, 2018 at 07:55:42PM +0100, Christoph Biedl wrote:
Greg Kroah-Hartman wrote...
4.14-stable review patch. If anyone has any objections, please let me know.
commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.
(...)
If the filesystem is always used on a same endian host, this will not be a problem.
From my observations I cannot quite subscribe to that.
On big-endian systems, this change intruduces severe corruption, resulting in complete loss of the data on the used block device.
Steps to reproduce (tested on ppc/powerpc and parisc/hppa):
# mkfs.btrfs $DEV # mount $DEV /mnt/tmp/ # umount /mnt/tmp/
This simple umount corrupts the file system:
# mount $DEV /mnt/tmp/ mount: /mnt/tmp: wrong fs type, bad option, bad superblock on $DEV, missing codepage or helper program, or other error.
# dmesg: BTRFS critical (device <dev>): unable to find logical 4294967296 length 4096 BTRFS critical (device <dev>): unable to find logical 4294967296 length 4096 BTRFS critical (device <dev>): unable to find logical 18102363734671360 length 16384 BTRFS error (device <dev>): failed to read chunk root BTRFS error (device <dev>): open_ctree failed
Also fsck is of no help:
# btrfsck $DEV Couldn't map the block 18102363734671360 No mapping for 18102363734671360-18102363734687744 Couldn't map the block 18102363734671360 bytenr mismatch, want=18102363734671360, have=0 ERROR: cannot read chunk root ERROR: cannot open file system
Trying mount or fsck on a little-endian system does not help either. So I consider the data on that device lost - luckily I use btrfs only for files where a backup exists all the time.
Reverting that change restored the previous error-free behaviour. I didn't check HEAD, i.e. v4.16-rc5, since the upstream commt was the last that affected these files. Still I could give this a try if anybody wishes so.
That sucks. Can you test Linus's tree to verify the problem is there? I'll gladly revert this if Linus's tree also gets the revert, I don't want you to hit this when you upgrade to a newer kernel.
I'll push a fix for the upcoming rc but I think it would be better to remove the broken patch from stable kernels ASAP, so I'd recommend to revert it now.
Now reverted, thanks.
greg k-h
On 03/16/2018 10:02 PM, Greg Kroah-Hartman wrote:
On Fri, Mar 16, 2018 at 02:22:02PM +0100, David Sterba wrote:
On Fri, Mar 16, 2018 at 01:30:49PM +0100, Greg Kroah-Hartman wrote:
On Thu, Mar 15, 2018 at 07:55:42PM +0100, Christoph Biedl wrote:
Greg Kroah-Hartman wrote...
4.14-stable review patch. If anyone has any objections, please let me know.
commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.
(...)
If the filesystem is always used on a same endian host, this will not be a problem.
From my observations I cannot quite subscribe to that.
On big-endian systems, this change intruduces severe corruption, resulting in complete loss of the data on the used block device.
Steps to reproduce (tested on ppc/powerpc and parisc/hppa):
# mkfs.btrfs $DEV # mount $DEV /mnt/tmp/ # umount /mnt/tmp/
This simple umount corrupts the file system:
# mount $DEV /mnt/tmp/ mount: /mnt/tmp: wrong fs type, bad option, bad superblock on $DEV, missing codepage or helper program, or other error.
# dmesg: BTRFS critical (device <dev>): unable to find logical 4294967296 length 4096 BTRFS critical (device <dev>): unable to find logical 4294967296 length 4096 BTRFS critical (device <dev>): unable to find logical 18102363734671360 length 16384 BTRFS error (device <dev>): failed to read chunk root BTRFS error (device <dev>): open_ctree failed
Also fsck is of no help:
# btrfsck $DEV Couldn't map the block 18102363734671360 No mapping for 18102363734671360-18102363734687744 Couldn't map the block 18102363734671360 bytenr mismatch, want=18102363734671360, have=0 ERROR: cannot read chunk root ERROR: cannot open file system
Trying mount or fsck on a little-endian system does not help either. So I consider the data on that device lost - luckily I use btrfs only for files where a backup exists all the time.
Reverting that change restored the previous error-free behaviour. I didn't check HEAD, i.e. v4.16-rc5, since the upstream commt was the last that affected these files. Still I could give this a try if anybody wishes so.
That sucks. Can you test Linus's tree to verify the problem is there? I'll gladly revert this if Linus's tree also gets the revert, I don't want you to hit this when you upgrade to a newer kernel.
I'll push a fix for the upcoming rc but I think it would be better to remove the broken patch from stable kernels ASAP, so I'd recommend to revert it now.
Now reverted, thanks.
Thanks ! Sorry for the mess.
-Anand
greg k-h
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Greg Kroah-Hartman wrote...
On Thu, Mar 15, 2018 at 07:55:42PM +0100, Christoph Biedl wrote:
commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.
On big-endian systems, this change intruduces severe corruption, resulting in complete loss of the data on the used block device.
That sucks. Can you test Linus's tree to verify the problem is there? I'll gladly revert this if Linus's tree also gets the revert, I don't want you to hit this when you upgrade to a newer kernel.
Confirmed: The problem is, err ... was in Linus' tree as well. The rather recent commit 8f5fd927c3a7 reverted the change, after that everything is as expected again.
Looking at the original commit, I don't have a clue why things go wrong so horribly - otherwise don't be afraid of my data. I took this as a chance to verify my data recovery procedure, with success.
Christoph
On Sat, Mar 17, 2018 at 06:27:15PM +0100, Christoph Biedl wrote:
Greg Kroah-Hartman wrote...
On Thu, Mar 15, 2018 at 07:55:42PM +0100, Christoph Biedl wrote:
commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.
On big-endian systems, this change intruduces severe corruption, resulting in complete loss of the data on the used block device.
That sucks. Can you test Linus's tree to verify the problem is there? I'll gladly revert this if Linus's tree also gets the revert, I don't want you to hit this when you upgrade to a newer kernel.
Confirmed: The problem is, err ... was in Linus' tree as well. The rather recent commit 8f5fd927c3a7 reverted the change, after that everything is as expected again.
Thanks for checking.
Looking at the original commit, I don't have a clue why things go wrong so horribly
It's a half endianness conversion. The plain in-memory structures are in LE and has to be accessed via the helpers, super block copy and the root item. The commit adds only one half of the conversion, that naturally does not exhibit on LE, because the macros are no-op.
Originally, the items were stored from the on-disk type to on-disk type, regardless of the CPU:
super->chunk_root = root_item->bytenr;
The patch should have added conversion of both values, like
btrfs_set_super_chunk_root(super, btrfs_root_bytenr(root_item));
and not
btrfs_set_super_chunk_root(super, root_item->bytenr);
It's not very obvious from the context though, typically there's one structure that needs the accessors and is set from a value that's in CPU byteorder. I think that the exception to this pattern confused all involved developers.
The root_item members are annotated as __le64 that should be caught by sparse/smatch checker in the buggy case, but we don't run the checkers every the time.
- otherwise don't be afraid of my data. I took this as a
chance to verify my data recovery procedure, with success.
Should that not be the case, the damaged items in superblock can be byteswapped back. That's 6 x u64 at most, I have a tool for that now.
Thanks again for the report and sorry for the trouble.
On 03/20/2018 03:32 AM, David Sterba wrote:
On Sat, Mar 17, 2018 at 06:27:15PM +0100, Christoph Biedl wrote:
Greg Kroah-Hartman wrote...
On Thu, Mar 15, 2018 at 07:55:42PM +0100, Christoph Biedl wrote:
commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.
On big-endian systems, this change intruduces severe corruption, resulting in complete loss of the data on the used block device.
That sucks. Can you test Linus's tree to verify the problem is there? I'll gladly revert this if Linus's tree also gets the revert, I don't want you to hit this when you upgrade to a newer kernel.
Confirmed: The problem is, err ... was in Linus' tree as well. The rather recent commit 8f5fd927c3a7 reverted the change, after that everything is as expected again.
Thanks for checking.
Looking at the original commit, I don't have a clue why things go wrong so horribly
It's a half endianness conversion. The plain in-memory structures are in LE and has to be accessed via the helpers, super block copy and the root item. The commit adds only one half of the conversion, that naturally does not exhibit on LE, because the macros are no-op.
Originally, the items were stored from the on-disk type to on-disk type, regardless of the CPU:
super->chunk_root = root_item->bytenr;
The patch should have added conversion of both values, like
btrfs_set_super_chunk_root(super, btrfs_root_bytenr(root_item));
and not
btrfs_set_super_chunk_root(super, root_item->bytenr);
It's not very obvious from the context though, typically there's one structure that needs the accessors and is set from a value that's in CPU byteorder. I think that the exception to this pattern confused all involved developers.
The root_item members are annotated as __le64 that should be caught by sparse/smatch checker in the buggy case, but we don't run the checkers every the time.
Ah! the RC is at the other side of the equation. Makes sense to me. Thanks.
- otherwise don't be afraid of my data. I took this as a
chance to verify my data recovery procedure, with success.
Should that not be the case, the damaged items in superblock can be byteswapped back. That's 6 x u64 at most, I have a tool for that now.
Thanks again for the report and sorry for the trouble.
It's entirely my mistake. My apologies for the inconvenience.
Thanks, Anand
-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 03/16/2018 02:55 AM, Christoph Biedl wrote:
Greg Kroah-Hartman wrote...
4.14-stable review patch. If anyone has any objections, please let me know.
commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.
(...)
If the filesystem is always used on a same endian host, this will not be a problem.
From my observations I cannot quite subscribe to that.
On big-endian systems, this change intruduces severe corruption, resulting in complete loss of the data on the used block device.
Thanks for the report.
That's really bad, my mistake. I am digging to know how it happened. Our on-disk root bytenr are little-endian compatible. So using the cpu_to_le for write on a big-endian arch is a correct thing to do. If it fails, certainly there is something which I have overlooked. I am digging to know. Thanks for the report again.
Fsck won't be able to figure out the correct on-disk btyenr either.
If there isn't any backup we could try to find out the correct pointers manually. However, restore from the backup approach is much better.
-Anand
Steps to reproduce (tested on ppc/powerpc and parisc/hppa):
# mkfs.btrfs $DEV # mount $DEV /mnt/tmp/ # umount /mnt/tmp/
This simple umount corrupts the file system:
# mount $DEV /mnt/tmp/ mount: /mnt/tmp: wrong fs type, bad option, bad superblock on $DEV, missing codepage or helper program, or other error.
# dmesg: BTRFS critical (device <dev>): unable to find logical 4294967296 length 4096 BTRFS critical (device <dev>): unable to find logical 4294967296 length 4096 BTRFS critical (device <dev>): unable to find logical 18102363734671360 length 16384 BTRFS error (device <dev>): failed to read chunk root BTRFS error (device <dev>): open_ctree failed
Also fsck is of no help:
# btrfsck $DEV Couldn't map the block 18102363734671360 No mapping for 18102363734671360-18102363734687744 Couldn't map the block 18102363734671360 bytenr mismatch, want=18102363734671360, have=0 ERROR: cannot read chunk root ERROR: cannot open file system
Trying mount or fsck on a little-endian system does not help either. So I consider the data on that device lost - luckily I use btrfs only for files where a backup exists all the time.
Reverting that change restored the previous error-free behaviour. I didn't check HEAD, i.e. v4.16-rc5, since the upstream commt was the last that affected these files. Still I could give this a try if anybody wishes so.
Cheers,
Christoph
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiufei Xue jiufei.xue@linux.alibaba.com
commit 7c5a0dcf557c6511a61e092ba887de28882fe857 upstream.
The vm counters is counted in sectors, so we should do the conversation in submit_bio.
Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and partitions index") Cc: stable@vger.kernel.org Reviewed-by: Omar Sandoval osandov@fb.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Jiufei Xue jiufei.xue@linux.alibaba.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- block/blk-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/block/blk-core.c +++ b/block/blk-core.c @@ -2277,7 +2277,7 @@ blk_qc_t submit_bio(struct bio *bio) unsigned int count;
if (unlikely(bio_op(bio) == REQ_OP_WRITE_SAME)) - count = queue_logical_block_size(bio->bi_disk->queue); + count = queue_logical_block_size(bio->bi_disk->queue) >> 9; else count = bio_sectors(bio);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ming Lei ming.lei@redhat.com
commit ba989a01469d027861e55c8f1121edadef757797 upstream.
When requeuing request, the domain token should have been freed before re-inserting the request to io scheduler. Otherwise, the assigned domain token will be leaked, and IO hang can be caused.
Cc: Paolo Valente paolo.valente@linaro.org Cc: Omar Sandoval osandov@fb.com Cc: stable@vger.kernel.org Reviewed-by: Bart Van Assche bart.vanassche@wdc.com Signed-off-by: Ming Lei ming.lei@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- block/kyber-iosched.c | 1 + 1 file changed, 1 insertion(+)
--- a/block/kyber-iosched.c +++ b/block/kyber-iosched.c @@ -814,6 +814,7 @@ static struct elevator_type kyber_sched .limit_depth = kyber_limit_depth, .prepare_request = kyber_prepare_request, .finish_request = kyber_finish_request, + .requeue_request = kyber_finish_request, .completed_request = kyber_completed_request, .dispatch_request = kyber_dispatch_request, .has_work = kyber_has_work,
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Williams dan.j.williams@intel.com
commit 94db151dc89262bfa82922c44e8320cea2334667 upstream.
Filesystem-DAX is incompatible with 'longterm' page pinning. Without page cache indirection a DAX mapping maps filesystem blocks directly. This means that the filesystem must not modify a file's block map while any page in a mapping is pinned. In order to prevent the situation of userspace holding of filesystem operations indefinitely, disallow 'longterm' Filesystem-DAX mappings.
RDMA has the same conflict and the plan there is to add a 'with lease' mechanism to allow the kernel to notify userspace that the mapping is being torn down for block-map maintenance. Perhaps something similar can be put in place for vfio.
Note that xfs and ext4 still report:
"DAX enabled. Warning: EXPERIMENTAL, use at your own risk"
...at mount time, and resolving the dax-dma-vs-truncate problem is one of the last hurdles to remove that designation.
Acked-by: Alex Williamson alex.williamson@redhat.com Cc: Michal Hocko mhocko@suse.com Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Reported-by: Haozhong Zhang haozhong.zhang@intel.com Tested-by: Haozhong Zhang haozhong.zhang@intel.com Fixes: d475c6346a38 ("dax,ext2: replace XIP read and write with DAX I/O") Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/vfio/vfio_iommu_type1.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-)
--- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -338,11 +338,12 @@ static int vaddr_get_pfn(struct mm_struc { struct page *page[1]; struct vm_area_struct *vma; + struct vm_area_struct *vmas[1]; int ret;
if (mm == current->mm) { - ret = get_user_pages_fast(vaddr, 1, !!(prot & IOMMU_WRITE), - page); + ret = get_user_pages_longterm(vaddr, 1, !!(prot & IOMMU_WRITE), + page, vmas); } else { unsigned int flags = 0;
@@ -351,7 +352,18 @@ static int vaddr_get_pfn(struct mm_struc
down_read(&mm->mmap_sem); ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page, - NULL, NULL); + vmas, NULL); + /* + * The lifetime of a vaddr_get_pfn() page pin is + * userspace-controlled. In the fs-dax case this could + * lead to indefinite stalls in filesystem operations. + * Disallow attempts to pin fs-dax pages via this + * interface. + */ + if (ret > 0 && vma_is_fsdax(vmas[0])) { + ret = -EOPNOTSUPP; + put_page(page[0]); + } up_read(&mm->mmap_sem); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Viresh Kumar viresh.kumar@linaro.org
commit 0373ca74831b0f93cd4cdbf7ad3aec3c33a479a5 upstream.
commit a307a1e6bc0d "cpufreq: s3c: use cpufreq_generic_init()" accidentally broke cpufreq on s3c2410 and s3c2412.
These two platforms don't have a CPU frequency table and used to skip calling cpufreq_table_validate_and_show() for them. But with the above commit, we started calling it unconditionally and that will eventually fail as the frequency table pointer is NULL.
Fix this by calling cpufreq_table_validate_and_show() conditionally again.
Fixes: a307a1e6bc0d "cpufreq: s3c: use cpufreq_generic_init()" Cc: 3.13+ stable@vger.kernel.org # v3.13+ Signed-off-by: Viresh Kumar viresh.kumar@linaro.org Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/cpufreq/s3c24xx-cpufreq.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
--- a/drivers/cpufreq/s3c24xx-cpufreq.c +++ b/drivers/cpufreq/s3c24xx-cpufreq.c @@ -351,7 +351,13 @@ struct clk *s3c_cpufreq_clk_get(struct d static int s3c_cpufreq_init(struct cpufreq_policy *policy) { policy->clk = clk_arm; - return cpufreq_generic_init(policy, ftab, cpu_cur.info->latency); + + policy->cpuinfo.transition_latency = cpu_cur.info->latency; + + if (ftab) + return cpufreq_table_validate_and_show(policy, ftab); + + return 0; }
static int __init s3c_cpufreq_initclks(void)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Williams dan.j.williams@intel.com
commit 230f5a8969d8345fc9bbe3683f068246cf1be4b8 upstream.
Gerd reports that ->i_mode may contain other bits besides S_IFCHR. Use S_ISCHR() instead. Otherwise, get_user_pages_longterm() may fail on device-dax instances when those are meant to be explicitly allowed.
Fixes: 2bb6d2837083 ("mm: introduce get_user_pages_longterm") Cc: stable@vger.kernel.org Reported-by: Gerd Rausch gerd.rausch@oracle.com Acked-by: Jane Chu jane.chu@oracle.com Reported-by: Haozhong Zhang haozhong.zhang@intel.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3185,7 +3185,7 @@ static inline bool vma_is_fsdax(struct v if (!vma_is_dax(vma)) return false; inode = file_inode(vma->vm_file); - if (inode->i_mode == S_IFCHR) + if (S_ISCHR(inode->i_mode)) return false; /* device-dax */ return true; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jan Kara jack@suse.cz
commit d9c10e5b8863cfb6886d1640386455075c6e979d upstream.
Commit e864f39569f4 "fs: add RWF_DSYNC aand RWF_SYNC" added additional way for direct IO to become synchronous and thus trigger fsync from the IO completion handler. Then commit 9830f4be159b "fs: Use RWF_* flags for AIO operations" allowed these flags to be set for AIO as well. However that commit forgot to update the condition checking whether the IO completion handling should be defered to a workqueue and thus AIO DIO with RWF_[D]SYNC set will call fsync() from IRQ context resulting in sleep in atomic.
Fix the problem by checking directly iocb flags (the same way as it is done in dio_complete()) instead of checking all conditions that could lead to IO being synchronous.
CC: Christoph Hellwig hch@lst.de CC: Goldwyn Rodrigues rgoldwyn@suse.com CC: stable@vger.kernel.org Reported-by: Mark Rutland mark.rutland@arm.com Tested-by: Mark Rutland mark.rutland@arm.com Fixes: 9830f4be159b29399d107bffb99e0132bc5aedd4 Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/direct-io.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -1252,8 +1252,7 @@ do_blockdev_direct_IO(struct kiocb *iocb */ if (dio->is_async && iov_iter_rw(iter) == WRITE) { retval = 0; - if ((iocb->ki_filp->f_flags & O_DSYNC) || - IS_SYNC(iocb->ki_filp->f_mapping->host)) + if (iocb->ki_flags & IOCB_DSYNC) retval = dio_set_defer_completion(dio); else if (!dio->inode->i_sb->s_dio_done_wq) { /*
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Juergen Gross jgross@suse.com
commit 71c208dd54ab971036d83ff6d9837bae4976e623 upstream.
Older Xen versions (4.5 and before) might have problems migrating pv guests with MSR_IA32_SPEC_CTRL having a non-zero value. So before suspending zero that MSR and restore it after being resumed.
Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Jan Beulich jbeulich@suse.com Cc: stable@vger.kernel.org Cc: xen-devel@lists.xenproject.org Cc: boris.ostrovsky@oracle.com Link: https://lkml.kernel.org/r/20180226140818.4849-1-jgross@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/xen/suspend.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
--- a/arch/x86/xen/suspend.c +++ b/arch/x86/xen/suspend.c @@ -1,12 +1,15 @@ // SPDX-License-Identifier: GPL-2.0 #include <linux/types.h> #include <linux/tick.h> +#include <linux/percpu-defs.h>
#include <xen/xen.h> #include <xen/interface/xen.h> #include <xen/grant_table.h> #include <xen/events.h>
+#include <asm/cpufeatures.h> +#include <asm/msr-index.h> #include <asm/xen/hypercall.h> #include <asm/xen/page.h> #include <asm/fixmap.h> @@ -15,6 +18,8 @@ #include "mmu.h" #include "pmu.h"
+static DEFINE_PER_CPU(u64, spec_ctrl); + void xen_arch_pre_suspend(void) { if (xen_pv_domain()) @@ -31,6 +36,9 @@ void xen_arch_post_suspend(int cancelled
static void xen_vcpu_notify_restore(void *data) { + if (xen_pv_domain() && boot_cpu_has(X86_FEATURE_SPEC_CTRL)) + wrmsrl(MSR_IA32_SPEC_CTRL, this_cpu_read(spec_ctrl)); + /* Boot processor notified via generic timekeeping_resume() */ if (smp_processor_id() == 0) return; @@ -40,7 +48,15 @@ static void xen_vcpu_notify_restore(void
static void xen_vcpu_notify_suspend(void *data) { + u64 tmp; + tick_suspend_local(); + + if (xen_pv_domain() && boot_cpu_has(X86_FEATURE_SPEC_CTRL)) { + rdmsrl(MSR_IA32_SPEC_CTRL, tmp); + this_cpu_write(spec_ctrl, tmp); + wrmsrl(MSR_IA32_SPEC_CTRL, 0); + } }
void xen_arch_resume(void)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sebastian Panceac sebastian@resin.io
commit 028091f82eefd5e84f81cef81a7673016ecbe78b upstream.
When the Intel Edison module is powered with 3.3V, the reboot command makes the module stuck. If the module is powered at a greater voltage, like 4.4V (as the Edison Mini Breakout board does), reboot works OK.
The official Intel Edison BSP sends the IPCMSG_COLD_RESET message to the SCU by default. The IPCMSG_COLD_BOOT which is used by the upstream kernel is only sent when explicitely selected on the kernel command line.
Use IPCMSG_COLD_RESET unconditionally which makes reboot work independent of the power supply voltage.
[ tglx: Massaged changelog ]
Fixes: bda7b072de99 ("x86/platform/intel-mid: Implement power off sequence") Signed-off-by: Sebastian Panceac sebastian@resin.io Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Andy Shevchenko andy.shevchenko@gmail.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1519810849-15131-1-git-send-email-sebastian@resin.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/platform/intel-mid/intel-mid.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/platform/intel-mid/intel-mid.c +++ b/arch/x86/platform/intel-mid/intel-mid.c @@ -79,7 +79,7 @@ static void intel_mid_power_off(void)
static void intel_mid_reboot(void) { - intel_scu_ipc_simple_command(IPCMSG_COLD_BOOT, 0); + intel_scu_ipc_simple_command(IPCMSG_COLD_RESET, 0); }
static unsigned long __init intel_mid_calibrate_tsc(void)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit 945fd17ab6bab8a4d05da6c3170519fbcfe62ddb upstream.
The separation of the cpu_entry_area from the fixmap missed the fact that on 32bit non-PAE kernels the cpu_entry_area mapping might not be covered in initial_page_table by the previous synchronizations.
This results in suspend/resume failures because 32bit utilizes initial page table for resume. The absence of the cpu_entry_area mapping results in a triple fault, aka. insta reboot.
With PAE enabled this works by chance because the PGD entry which covers the fixmap and other parts incindentally provides the cpu_entry_area mapping as well.
Synchronize the initial page table after setting up the cpu entry area. Instead of adding yet another copy of the same code, move it to a function and invoke it from the various places.
It needs to be investigated if the existing calls in setup_arch() and setup_per_cpu_areas() can be replaced by the later invocation from setup_cpu_entry_areas(), but that's beyond the scope of this fix.
Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap") Reported-by: Woody Suwalski terraluna977@gmail.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Woody Suwalski terraluna977@gmail.com Cc: William Grant william.grant@canonical.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1802282137290.1392@nanos.tec.linut... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/pgtable_32.h | 1 + arch/x86/include/asm/pgtable_64.h | 1 + arch/x86/kernel/setup.c | 17 +++++------------ arch/x86/kernel/setup_percpu.c | 17 ++++------------- arch/x86/mm/cpu_entry_area.c | 6 ++++++ arch/x86/mm/init_32.c | 15 +++++++++++++++ 6 files changed, 32 insertions(+), 25 deletions(-)
--- a/arch/x86/include/asm/pgtable_32.h +++ b/arch/x86/include/asm/pgtable_32.h @@ -32,6 +32,7 @@ extern pmd_t initial_pg_pmd[]; static inline void pgtable_cache_init(void) { } static inline void check_pgt_cache(void) { } void paging_init(void); +void sync_initial_page_table(void);
/* * Define this if things work differently on an i386 and an i486: --- a/arch/x86/include/asm/pgtable_64.h +++ b/arch/x86/include/asm/pgtable_64.h @@ -28,6 +28,7 @@ extern pgd_t init_top_pgt[]; #define swapper_pg_dir init_top_pgt
extern void paging_init(void); +static inline void sync_initial_page_table(void) { }
#define pte_ERROR(e) \ pr_err("%s:%d: bad pte %p(%016lx)\n", \ --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -1238,20 +1238,13 @@ void __init setup_arch(char **cmdline_p)
kasan_init();
-#ifdef CONFIG_X86_32 - /* sync back kernel address range */ - clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY, - swapper_pg_dir + KERNEL_PGD_BOUNDARY, - KERNEL_PGD_PTRS); - /* - * sync back low identity map too. It is used for example - * in the 32-bit EFI stub. + * Sync back kernel address range. + * + * FIXME: Can the later sync in setup_cpu_entry_areas() replace + * this call? */ - clone_pgd_range(initial_page_table, - swapper_pg_dir + KERNEL_PGD_BOUNDARY, - min(KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY)); -#endif + sync_initial_page_table();
tboot_probe();
--- a/arch/x86/kernel/setup_percpu.c +++ b/arch/x86/kernel/setup_percpu.c @@ -287,24 +287,15 @@ void __init setup_per_cpu_areas(void) /* Setup cpu initialized, callin, callout masks */ setup_cpu_local_masks();
-#ifdef CONFIG_X86_32 /* * Sync back kernel address range again. We already did this in * setup_arch(), but percpu data also needs to be available in * the smpboot asm. We can't reliably pick up percpu mappings * using vmalloc_fault(), because exception dispatch needs * percpu data. + * + * FIXME: Can the later sync in setup_cpu_entry_areas() replace + * this call? */ - clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY, - swapper_pg_dir + KERNEL_PGD_BOUNDARY, - KERNEL_PGD_PTRS); - - /* - * sync back low identity map too. It is used for example - * in the 32-bit EFI stub. - */ - clone_pgd_range(initial_page_table, - swapper_pg_dir + KERNEL_PGD_BOUNDARY, - min(KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY)); -#endif + sync_initial_page_table(); } --- a/arch/x86/mm/cpu_entry_area.c +++ b/arch/x86/mm/cpu_entry_area.c @@ -163,4 +163,10 @@ void __init setup_cpu_entry_areas(void)
for_each_possible_cpu(cpu) setup_cpu_entry_area(cpu); + + /* + * This is the last essential update to swapper_pgdir which needs + * to be synchronized to initial_page_table on 32bit. + */ + sync_initial_page_table(); } --- a/arch/x86/mm/init_32.c +++ b/arch/x86/mm/init_32.c @@ -453,6 +453,21 @@ static inline void permanent_kmaps_init( } #endif /* CONFIG_HIGHMEM */
+void __init sync_initial_page_table(void) +{ + clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY, + swapper_pg_dir + KERNEL_PGD_BOUNDARY, + KERNEL_PGD_PTRS); + + /* + * sync back low identity map too. It is used for example + * in the 32-bit EFI stub. + */ + clone_pgd_range(initial_page_table, + swapper_pg_dir + KERNEL_PGD_BOUNDARY, + min(KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY)); +} + void __init native_pagetable_init(void) { unsigned long pfn, va;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 1b12580af1d0677c3c3a19e35bfe5d59b03f737f ]
Now br_sysfs_if file flush doesn't have attr show. To read it will cause kernel panic after users chmod u+r this file.
Xiong found this issue when running the commands:
ip link add br0 type bridge ip link add type veth ip link set veth0 master br0 chmod u+r /sys/devices/virtual/net/veth0/brport/flush timeout 3 cat /sys/devices/virtual/net/veth0/brport/flush
kernel crashed with NULL a pointer dereference call trace.
This patch is to fix it by return -EINVAL when brport_attr->show is null, just the same as the check for brport_attr->store in brport_store().
Fixes: 9cf637473c85 ("bridge: add sysfs hook to flush forwarding table") Reported-by: Xiong Zhou xzhou@redhat.com Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bridge/br_sysfs_if.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/net/bridge/br_sysfs_if.c +++ b/net/bridge/br_sysfs_if.c @@ -235,6 +235,9 @@ static ssize_t brport_show(struct kobjec struct brport_attribute *brport_attr = to_brport_attr(attr); struct net_bridge_port *p = to_brport(kobj);
+ if (!brport_attr->show) + return -EINVAL; + return brport_attr->show(p, buf); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefano Brivio sbrivio@redhat.com
[ Upstream commit a8c6db1dfd1b1d18359241372bb204054f2c3174 ]
In fib_nh_match(), if output interface or gateway are passed in the FIB configuration, we don't have to check next hops of multipath routes to conclude whether we have a match or not.
However, we might still have routes with different realms matching the same output interface and gateway configuration, and this needs to cause the match to fail. Otherwise the first route inserted in the FIB will match, regardless of the realms:
# ip route add 1.1.1.1 dev eth0 table 1234 realms 1/2 # ip route append 1.1.1.1 dev eth0 table 1234 realms 3/4 # ip route list table 1234 1.1.1.1 dev eth0 scope link realms 1/2 1.1.1.1 dev eth0 scope link realms 3/4 # ip route del 1.1.1.1 dev ens3 table 1234 realms 3/4 # ip route list table 1234 1.1.1.1 dev ens3 scope link realms 3/4
whereas route with realms 3/4 should have been deleted instead.
Explicitly check for fc_flow passed in the FIB configuration (this comes from RTA_FLOW extracted by rtm_to_fib_config()) and fail matching if it differs from nh_tclassid.
The handling of RTA_FLOW for multipath routes later in fib_nh_match() is still needed, as we can have multiple RTA_FLOW attributes that need to be matched against the tclassid of each next hop.
v2: Check that fc_flow is set before discarding the match, so that the user can still select the first matching rule by not specifying any realm, as suggested by David Ahern.
Reported-by: Jianlin Shi jishi@redhat.com Signed-off-by: Stefano Brivio sbrivio@redhat.com Acked-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/fib_semantics.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -654,6 +654,11 @@ int fib_nh_match(struct fib_config *cfg, fi->fib_nh, cfg, extack)) return 1; } +#ifdef CONFIG_IP_ROUTE_CLASSID + if (cfg->fc_flow && + cfg->fc_flow != fi->fib_nh->nh_tclassid) + return 1; +#endif if ((!cfg->fc_oif || cfg->fc_oif == fi->fib_nh->nh_oif) && (!cfg->fc_gw || cfg->fc_gw == fi->fib_nh->nh_gw)) return 0;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Denis Du dudenis2000@yahoo.ca
[ Upstream commit b6c3bad1ba83af1062a7ff6986d9edc4f3d7fc8e ]
Sometimes when physical lines have a just good noise to make the protocol handshaking fail, but the carrier detect still good. Then after remove of the noise, nobody will trigger this protocol to be start again to cause the link to never come back. The fix is when the carrier is still on, not terminate the protocol handshaking.
Signed-off-by: Denis Du dudenis2000@yahoo.ca Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wan/hdlc_ppp.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/net/wan/hdlc_ppp.c +++ b/drivers/net/wan/hdlc_ppp.c @@ -574,7 +574,10 @@ static void ppp_timer(unsigned long arg) ppp_cp_event(proto->dev, proto->pid, TO_GOOD, 0, 0, 0, NULL); proto->restart_counter--; - } else + } else if (netif_carrier_ok(proto->dev)) + ppp_cp_event(proto->dev, proto->pid, TO_GOOD, 0, 0, + 0, NULL); + else ppp_cp_event(proto->dev, proto->pid, TO_BAD, 0, 0, 0, NULL); break;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit ca79bec237f5809a7c3c59bd41cd0880aa889966 ]
gcc-8 has a new warning that detects overlapping input and output arguments in memcpy(). It triggers for sit_init_net() calling ipip6_tunnel_clone_6rd(), which is actually correct:
net/ipv6/sit.c: In function 'sit_init_net': net/ipv6/sit.c:192:3: error: 'memcpy' source argument is the same as destination [-Werror=restrict]
The problem here is that the logic detecting the memcpy() arguments finds them to be the same, but the conditional that tests for the input and output of ipip6_tunnel_clone_6rd() to be identical is not a compile-time constant.
We know that netdev_priv(t->dev) is the same as t for a tunnel device, and comparing "dev" directly here lets the compiler figure out as well that 'dev == sitn->fb_tunnel_dev' when called from sit_init_net(), so it no longer warns.
This code is old, so Cc stable to make sure that we don't get the warning for older kernels built with new gcc.
Cc: Martin Sebor msebor@gmail.com Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83456 Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/sit.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ipv6/sit.c +++ b/net/ipv6/sit.c @@ -176,7 +176,7 @@ static void ipip6_tunnel_clone_6rd(struc #ifdef CONFIG_IPV6_SIT_6RD struct ip_tunnel *t = netdev_priv(dev);
- if (t->dev == sitn->fb_tunnel_dev) { + if (dev == sitn->fb_tunnel_dev) { ipv6_addr_set(&t->ip6rd.prefix, htonl(0x20020000), 0, 0, 0); t->ip6rd.relay_prefix = 0; t->ip6rd.prefixlen = 16;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wolfram Sang wsa+renesas@sang-engineering.com
[ Upstream commit a3276892db7a588bedc33168e502572008f714a9 ]
Due to a typo, the mask was destroyed by a comparison instead of a bit shift.
Signed-off-by: Wolfram Sang wsa+renesas@sang-engineering.com Acked-by: Tom Lendacky thomas.lendacky@amd.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c @@ -595,7 +595,7 @@ isr_done:
reissue_mask = 1 << 0; if (!pdata->per_channel_irq) - reissue_mask |= 0xffff < 4; + reissue_mask |= 0xffff << 4;
XP_IOWRITE(pdata, XP_INT_REISSUE_EN, reissue_mask); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Grygorii Strashko grygorii.strashko@ti.com
[ Upstream commit 62f94c2101f35cd45775df00ba09bde77580e26a ]
It was discovered that simple program which indefinitely sends 200b UDP packets and runs on TI AM574x SoC (SMP) under RT Kernel triggers network watchdog timeout in TI CPSW driver (<6 hours run). The network watchdog timeout is triggered due to race between cpsw_ndo_start_xmit() and cpsw_tx_handler() [NAPI]
cpsw_ndo_start_xmit() if (unlikely(!cpdma_check_free_tx_desc(txch))) { txq = netdev_get_tx_queue(ndev, q_idx); netif_tx_stop_queue(txq);
^^ as per [1] barier has to be used after set_bit() otherwise new value might not be visible to other cpus }
cpsw_tx_handler() if (unlikely(netif_tx_queue_stopped(txq))) netif_tx_wake_queue(txq);
and when it happens ndev TX queue became disabled forever while driver's HW TX queue is empty.
Fix this, by adding smp_mb__after_atomic() after netif_tx_stop_queue() calls and double check for free TX descriptors after stopping ndev TX queue - if there are free TX descriptors wake up ndev TX queue.
[1] https://www.kernel.org/doc/html/latest/core-api/atomic_ops.html Signed-off-by: Grygorii Strashko grygorii.strashko@ti.com Reviewed-by: Ivan Khoronzhuk ivan.khoronzhuk@linaro.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/ti/cpsw.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -1618,6 +1618,7 @@ static netdev_tx_t cpsw_ndo_start_xmit(s q_idx = q_idx % cpsw->tx_ch_num;
txch = cpsw->txv[q_idx].ch; + txq = netdev_get_tx_queue(ndev, q_idx); ret = cpsw_tx_packet_submit(priv, skb, txch); if (unlikely(ret != 0)) { cpsw_err(priv, tx_err, "desc submit failed\n"); @@ -1628,15 +1629,26 @@ static netdev_tx_t cpsw_ndo_start_xmit(s * tell the kernel to stop sending us tx frames. */ if (unlikely(!cpdma_check_free_tx_desc(txch))) { - txq = netdev_get_tx_queue(ndev, q_idx); netif_tx_stop_queue(txq); + + /* Barrier, so that stop_queue visible to other cpus */ + smp_mb__after_atomic(); + + if (cpdma_check_free_tx_desc(txch)) + netif_tx_wake_queue(txq); }
return NETDEV_TX_OK; fail: ndev->stats.tx_dropped++; - txq = netdev_get_tx_queue(ndev, skb_get_queue_mapping(skb)); netif_tx_stop_queue(txq); + + /* Barrier, so that stop_queue visible to other cpus */ + smp_mb__after_atomic(); + + if (cpdma_check_free_tx_desc(txch)) + netif_tx_wake_queue(txq); + return NETDEV_TX_BUSY; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski jakub.kicinski@netronome.com
[ Upstream commit ac5b70198adc25c73fba28de4f78adcee8f6be0b ]
netif_set_real_num_tx_queues() can be called when netdev is up. That usually happens when user requests change of number of channels/rings with ethtool -L. The procedure for changing the number of queues involves resetting the qdiscs and setting dev->num_tx_queues to the new value. When the new value is lower than the old one, extra care has to be taken to ensure ordering of accesses to the number of queues vs qdisc reset.
Currently the queues are reset before new dev->num_tx_queues is assigned, leaving a window of time where packets can be enqueued onto the queues going down, leading to a likely crash in the drivers, since most drivers don't check if TX skbs are assigned to an active queue.
Fixes: e6484930d7c7 ("net: allocate tx queues in register_netdevice") Signed-off-by: Jakub Kicinski jakub.kicinski@netronome.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/dev.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
--- a/net/core/dev.c +++ b/net/core/dev.c @@ -2343,8 +2343,11 @@ EXPORT_SYMBOL(netdev_set_num_tc); */ int netif_set_real_num_tx_queues(struct net_device *dev, unsigned int txq) { + bool disabling; int rc;
+ disabling = txq < dev->real_num_tx_queues; + if (txq < 1 || txq > dev->num_tx_queues) return -EINVAL;
@@ -2360,15 +2363,19 @@ int netif_set_real_num_tx_queues(struct if (dev->num_tc) netif_setup_tc(dev, txq);
- if (txq < dev->real_num_tx_queues) { + dev->real_num_tx_queues = txq; + + if (disabling) { + synchronize_net(); qdisc_reset_all_tx_gt(dev, txq); #ifdef CONFIG_XPS netif_reset_xps_queues_gt(dev, txq); #endif } + } else { + dev->real_num_tx_queues = txq; }
- dev->real_num_tx_queues = txq; return 0; } EXPORT_SYMBOL(netif_set_real_num_tx_queues);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sabrina Dubroca sd@queasysnail.net
[ Upstream commit c7272c2f1229125f74f22dcdd59de9bbd804f1c8 ]
According to RFC 1191 sections 3 and 4, ICMP frag-needed messages indicating an MTU below 68 should be rejected:
A host MUST never reduce its estimate of the Path MTU below 68 octets.
and (talking about ICMP frag-needed's Next-Hop MTU field):
This field will never contain a value less than 68, since every router "must be able to forward a datagram of 68 octets without fragmentation".
Furthermore, by letting net.ipv4.route.min_pmtu be set to negative values, we can end up with a very large PMTU when (-1) is cast into u32.
Let's also make ip_rt_min_pmtu a u32, since it's only ever compared to unsigned ints.
Reported-by: Jianlin Shi jishi@redhat.com Signed-off-by: Sabrina Dubroca sd@queasysnail.net Reviewed-by: Stefano Brivio sbrivio@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/route.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -128,10 +128,13 @@ static int ip_rt_redirect_silence __read static int ip_rt_error_cost __read_mostly = HZ; static int ip_rt_error_burst __read_mostly = 5 * HZ; static int ip_rt_mtu_expires __read_mostly = 10 * 60 * HZ; -static int ip_rt_min_pmtu __read_mostly = 512 + 20 + 20; +static u32 ip_rt_min_pmtu __read_mostly = 512 + 20 + 20; static int ip_rt_min_advmss __read_mostly = 256;
static int ip_rt_gc_timeout __read_mostly = RT_GC_TIMEOUT; + +static int ip_min_valid_pmtu __read_mostly = IPV4_MIN_MTU; + /* * Interface to generic destination cache. */ @@ -2934,7 +2937,8 @@ static struct ctl_table ipv4_route_table .data = &ip_rt_min_pmtu, .maxlen = sizeof(int), .mode = 0644, - .proc_handler = proc_dointvec, + .proc_handler = proc_dointvec_minmax, + .extra1 = &ip_min_valid_pmtu, }, { .procname = "min_adv_mss",
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nicolas Dichtel nicolas.dichtel@6wind.com
[ Upstream commit cb9f7a9a5c96a773bbc9c70660dc600cfff82f82 ]
Nowadays, nlmsg_multicast() returns only 0 or -ESRCH but this was not the case when commit 134e63756d5f was pushed. However, there was no reason to stop the loop if a netns does not have listeners. Returns -ESRCH only if there was no listeners in all netns.
To avoid having the same problem in the future, I didn't take the assumption that nlmsg_multicast() returns only 0 or -ESRCH.
Fixes: 134e63756d5f ("genetlink: make netns aware") CC: Johannes Berg johannes.berg@intel.com Signed-off-by: Nicolas Dichtel nicolas.dichtel@6wind.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netlink/genetlink.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
--- a/net/netlink/genetlink.c +++ b/net/netlink/genetlink.c @@ -1081,6 +1081,7 @@ static int genlmsg_mcast(struct sk_buff { struct sk_buff *tmp; struct net *net, *prev = NULL; + bool delivered = false; int err;
for_each_net_rcu(net) { @@ -1092,14 +1093,21 @@ static int genlmsg_mcast(struct sk_buff } err = nlmsg_multicast(prev->genl_sock, tmp, portid, group, flags); - if (err) + if (!err) + delivered = true; + else if (err != -ESRCH) goto error; }
prev = net; }
- return nlmsg_multicast(prev->genl_sock, skb, portid, group, flags); + err = nlmsg_multicast(prev->genl_sock, skb, portid, group, flags); + if (!err) + delivered = true; + else if (err != -ESRCH) + goto error; + return delivered ? 0 : -ESRCH; error: kfree_skb(skb); return err;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Roman Kapl code@rkapl.cz
[ Upstream commit 5ae437ad5a2ed573b1ebb04e0afa70b8869f88dd ]
So far, if the filter was too large to fit in the allocated skb, the kernel did not return any error and stopped dumping. Modify the dumper so that it returns -EMSGSIZE when a filter fails to dump and it is the first filter in the skb. If we are not first, we will get a next chance with more room.
I understand this is pretty near to being an API change, but the original design (silent truncation) can be considered a bug.
Note: The error case can happen pretty easily if you create a filter with 32 actions and have 4kb pages. Also recent versions of iproute try to be clever with their buffer allocation size, which in turn leads to
Signed-off-by: Roman Kapl code@rkapl.cz Acked-by: Jiri Pirko jiri@mellanox.com Acked-by: Cong Wang xiyou.wangcong@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/cls_api.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -871,13 +871,18 @@ static int tc_dump_tfilter(struct sk_buf if (tca[TCA_CHAIN] && nla_get_u32(tca[TCA_CHAIN]) != chain->index) continue; - if (!tcf_chain_dump(chain, skb, cb, index_start, &index)) + if (!tcf_chain_dump(chain, skb, cb, index_start, &index)) { + err = -EMSGSIZE; break; + } }
cb->args[0] = index;
out: + /* If we did no progress, the error (EMSGSIZE) is real */ + if (skb->len == 0 && err) + return err; return skb->len; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Guillaume Nault g.nault@alphalink.fr
[ Upstream commit 77f840e3e5f09c6d7d727e85e6e08276dd813d11 ]
PPP units don't hold any reference on the channels connected to it. It is the channel's responsibility to ensure that it disconnects from its unit before being destroyed. In practice, this is ensured by ppp_unregister_channel() disconnecting the channel from the unit before dropping a reference on the channel.
However, it is possible for an unregistered channel to connect to a PPP unit: register a channel with ppp_register_net_channel(), attach a /dev/ppp file to it with ioctl(PPPIOCATTCHAN), unregister the channel with ppp_unregister_channel() and finally connect the /dev/ppp file to a PPP unit with ioctl(PPPIOCCONNECT).
Once in this situation, the channel is only held by the /dev/ppp file, which can be released at anytime and free the channel without letting the parent PPP unit know. Then the ppp structure ends up with dangling pointers in its ->channels list.
Prevent this scenario by forbidding unregistered channels from connecting to PPP units. This maintains the code logic by keeping ppp_unregister_channel() responsible from disconnecting the channel if necessary and avoids modification on the reference counting mechanism.
This issue seems to predate git history (successfully reproduced on Linux 2.6.26 and earlier PPP commits are unrelated).
Signed-off-by: Guillaume Nault g.nault@alphalink.fr Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ppp/ppp_generic.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/drivers/net/ppp/ppp_generic.c +++ b/drivers/net/ppp/ppp_generic.c @@ -3158,6 +3158,15 @@ ppp_connect_channel(struct channel *pch, goto outl;
ppp_lock(ppp); + spin_lock_bh(&pch->downl); + if (!pch->chan) { + /* Don't connect unregistered channels */ + spin_unlock_bh(&pch->downl); + ppp_unlock(ppp); + ret = -ENOTCONN; + goto outl; + } + spin_unlock_bh(&pch->downl); if (pch->file.hdrlen > ppp->file.hdrlen) ppp->file.hdrlen = pch->file.hdrlen; hdrlen = pch->file.hdrlen + 2; /* for protocol bytes */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexey Kodanev alexey.kodanev@oracle.com
[ Upstream commit 07f2c7ab6f8d0a7e7c5764c4e6cc9c52951b9d9c ]
When SCTP makes INIT or INIT_ACK packet the total chunk length can exceed SCTP_MAX_CHUNK_LEN which leads to kernel panic when transmitting these packets, e.g. the crash on sending INIT_ACK:
[ 597.804948] skbuff: skb_over_panic: text:00000000ffae06e4 len:120168 put:120156 head:000000007aa47635 data:00000000d991c2de tail:0x1d640 end:0xfec0 dev:<NULL> ... [ 597.976970] ------------[ cut here ]------------ [ 598.033408] kernel BUG at net/core/skbuff.c:104! [ 600.314841] Call Trace: [ 600.345829] <IRQ> [ 600.371639] ? sctp_packet_transmit+0x2095/0x26d0 [sctp] [ 600.436934] skb_put+0x16c/0x200 [ 600.477295] sctp_packet_transmit+0x2095/0x26d0 [sctp] [ 600.540630] ? sctp_packet_config+0x890/0x890 [sctp] [ 600.601781] ? __sctp_packet_append_chunk+0x3b4/0xd00 [sctp] [ 600.671356] ? sctp_cmp_addr_exact+0x3f/0x90 [sctp] [ 600.731482] sctp_outq_flush+0x663/0x30d0 [sctp] [ 600.788565] ? sctp_make_init+0xbf0/0xbf0 [sctp] [ 600.845555] ? sctp_check_transmitted+0x18f0/0x18f0 [sctp] [ 600.912945] ? sctp_outq_tail+0x631/0x9d0 [sctp] [ 600.969936] sctp_cmd_interpreter.isra.22+0x3be1/0x5cb0 [sctp] [ 601.041593] ? sctp_sf_do_5_1B_init+0x85f/0xc30 [sctp] [ 601.104837] ? sctp_generate_t1_cookie_event+0x20/0x20 [sctp] [ 601.175436] ? sctp_eat_data+0x1710/0x1710 [sctp] [ 601.233575] sctp_do_sm+0x182/0x560 [sctp] [ 601.284328] ? sctp_has_association+0x70/0x70 [sctp] [ 601.345586] ? sctp_rcv+0xef4/0x32f0 [sctp] [ 601.397478] ? sctp6_rcv+0xa/0x20 [sctp] ...
Here the chunk size for INIT_ACK packet becomes too big, mostly because of the state cookie (INIT packet has large size with many address parameters), plus additional server parameters.
Later this chunk causes the panic in skb_put_data():
skb_packet_transmit() sctp_packet_pack() skb_put_data(nskb, chunk->skb->data, chunk->skb->len);
'nskb' (head skb) was previously allocated with packet->size from u16 'chunk->chunk_hdr->length'.
As suggested by Marcelo we should check the chunk's length in _sctp_make_chunk() before trying to allocate skb for it and discard a chunk if its size bigger than SCTP_MAX_CHUNK_LEN.
Signed-off-by: Alexey Kodanev alexey.kodanev@oracle.com Acked-by: Marcelo Ricardo Leitner marcelo.leinter@gmail.com Acked-by: Neil Horman nhorman@tuxdriver.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sctp/sm_make_chunk.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/net/sctp/sm_make_chunk.c +++ b/net/sctp/sm_make_chunk.c @@ -1378,9 +1378,14 @@ static struct sctp_chunk *_sctp_make_chu struct sctp_chunk *retval; struct sk_buff *skb; struct sock *sk; + int chunklen; + + chunklen = SCTP_PAD4(sizeof(*chunk_hdr) + paylen); + if (chunklen > SCTP_MAX_CHUNK_LEN) + goto nodata;
/* No need to allocate LL here, as this is only a chunk. */ - skb = alloc_skb(SCTP_PAD4(sizeof(*chunk_hdr) + paylen), gfp); + skb = alloc_skb(chunklen, gfp); if (!skb) goto nodata;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexey Kodanev alexey.kodanev@oracle.com
[ Upstream commit 15f35d49c93f4fa9875235e7bf3e3783d2dd7a1b ]
Since UDP-Lite is always using checksum, the following path is triggered when calculating pseudo header for it:
udp4_csum_init() or udp6_csum_init() skb_checksum_init_zero_check() __skb_checksum_validate_complete()
The problem can appear if skb->len is less than CHECKSUM_BREAK. In this particular case __skb_checksum_validate_complete() also invokes __skb_checksum_complete(skb). If UDP-Lite is using partial checksum that covers only part of a packet, the function will return bad checksum and the packet will be dropped.
It can be fixed if we skip skb_checksum_init_zero_check() and only set the required pseudo header checksum for UDP-Lite with partial checksum before udp4_csum_init()/udp6_csum_init() functions return.
Fixes: ed70fcfcee95 ("net: Call skb_checksum_init in IPv4") Fixes: e4f45b7f40bd ("net: Call skb_checksum_init in IPv6") Signed-off-by: Alexey Kodanev alexey.kodanev@oracle.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/udplite.h | 1 + net/ipv4/udp.c | 5 +++++ net/ipv6/ip6_checksum.c | 5 +++++ 3 files changed, 11 insertions(+)
--- a/include/net/udplite.h +++ b/include/net/udplite.h @@ -64,6 +64,7 @@ static inline int udplite_checksum_init( UDP_SKB_CB(skb)->cscov = cscov; if (skb->ip_summed == CHECKSUM_COMPLETE) skb->ip_summed = CHECKSUM_NONE; + skb->csum_valid = 0; }
return 0; --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -2032,6 +2032,11 @@ static inline int udp4_csum_init(struct err = udplite_checksum_init(skb, uh); if (err) return err; + + if (UDP_SKB_CB(skb)->partial_cov) { + skb->csum = inet_compute_pseudo(skb, proto); + return 0; + } }
/* Note, we are only interested in != 0 or == 0, thus the --- a/net/ipv6/ip6_checksum.c +++ b/net/ipv6/ip6_checksum.c @@ -73,6 +73,11 @@ int udp6_csum_init(struct sk_buff *skb, err = udplite_checksum_init(skb, uh); if (err) return err; + + if (UDP_SKB_CB(skb)->partial_cov) { + skb->csum = ip6_compute_pseudo(skb, proto); + return 0; + } }
/* To support RFC 6936 (allow zero checksum in UDP/IPV6 for tunnels)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gal Pressman galp@mellanox.com
[ Upstream commit 8babd44d2079079f9d5a4aca7005aed80236efe0 ]
When receiving an LRO packet, the checksum field is set by the hardware to the checksum of the first coalesced packet. Obviously, this checksum is not valid for the merged LRO packet and should be fixed. We can use the CQE checksum which covers the checksum of the entire merged packet TCP payload to help us calculate the checksum incrementally.
Tested by sending IPv4/6 traffic with LRO enabled, RX checksum disabled and watching nstat checksum error counters (in addition to the obvious bandwidth drop caused by checksum errors).
This bug is usually "hidden" since LRO packets would go through the CHECKSUM_UNNECESSARY flow which does not validate the packet checksum.
It's important to note that previous to this patch, LRO packets provided with CHECKSUM_UNNECESSARY are indeed packets with a correct validated checksum (even though the checksum inside the TCP header is incorrect), since the hardware LRO aggregation is terminated upon receiving a packet with bad checksum.
Fixes: e586b3b0baee ("net/mlx5: Ethernet Datapath files") Signed-off-by: Gal Pressman galp@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 47 +++++++++++++++++------- 1 file changed, 34 insertions(+), 13 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c @@ -36,6 +36,7 @@ #include <linux/tcp.h> #include <linux/bpf_trace.h> #include <net/busy_poll.h> +#include <net/ip6_checksum.h> #include "en.h" #include "en_tc.h" #include "eswitch.h" @@ -546,20 +547,33 @@ bool mlx5e_post_rx_mpwqes(struct mlx5e_r return true; }
+static void mlx5e_lro_update_tcp_hdr(struct mlx5_cqe64 *cqe, struct tcphdr *tcp) +{ + u8 l4_hdr_type = get_cqe_l4_hdr_type(cqe); + u8 tcp_ack = (l4_hdr_type == CQE_L4_HDR_TYPE_TCP_ACK_NO_DATA) || + (l4_hdr_type == CQE_L4_HDR_TYPE_TCP_ACK_AND_DATA); + + tcp->check = 0; + tcp->psh = get_cqe_lro_tcppsh(cqe); + + if (tcp_ack) { + tcp->ack = 1; + tcp->ack_seq = cqe->lro_ack_seq_num; + tcp->window = cqe->lro_tcp_win; + } +} + static void mlx5e_lro_update_hdr(struct sk_buff *skb, struct mlx5_cqe64 *cqe, u32 cqe_bcnt) { struct ethhdr *eth = (struct ethhdr *)(skb->data); struct tcphdr *tcp; int network_depth = 0; + __wsum check; __be16 proto; u16 tot_len; void *ip_p;
- u8 l4_hdr_type = get_cqe_l4_hdr_type(cqe); - u8 tcp_ack = (l4_hdr_type == CQE_L4_HDR_TYPE_TCP_ACK_NO_DATA) || - (l4_hdr_type == CQE_L4_HDR_TYPE_TCP_ACK_AND_DATA); - skb->mac_len = ETH_HLEN; proto = __vlan_get_protocol(skb, eth->h_proto, &network_depth);
@@ -577,23 +591,30 @@ static void mlx5e_lro_update_hdr(struct ipv4->check = 0; ipv4->check = ip_fast_csum((unsigned char *)ipv4, ipv4->ihl); + + mlx5e_lro_update_tcp_hdr(cqe, tcp); + check = csum_partial(tcp, tcp->doff * 4, + csum_unfold((__force __sum16)cqe->check_sum)); + /* Almost done, don't forget the pseudo header */ + tcp->check = csum_tcpudp_magic(ipv4->saddr, ipv4->daddr, + tot_len - sizeof(struct iphdr), + IPPROTO_TCP, check); } else { + u16 payload_len = tot_len - sizeof(struct ipv6hdr); struct ipv6hdr *ipv6 = ip_p;
tcp = ip_p + sizeof(struct ipv6hdr); skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
ipv6->hop_limit = cqe->lro_min_ttl; - ipv6->payload_len = cpu_to_be16(tot_len - - sizeof(struct ipv6hdr)); - } + ipv6->payload_len = cpu_to_be16(payload_len);
- tcp->psh = get_cqe_lro_tcppsh(cqe); - - if (tcp_ack) { - tcp->ack = 1; - tcp->ack_seq = cqe->lro_ack_seq_num; - tcp->window = cqe->lro_tcp_win; + mlx5e_lro_update_tcp_hdr(cqe, tcp); + check = csum_partial(tcp, tcp->doff * 4, + csum_unfold((__force __sum16)cqe->check_sum)); + /* Almost done, don't forget the pseudo header */ + tcp->check = csum_ipv6_magic(&ipv6->saddr, &ipv6->daddr, payload_len, + IPPROTO_TCP, check); } }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tommi Rantala tommi.t.rantala@nokia.com
[ Upstream commit 4a31a6b19f9ddf498c81f5c9b089742b7472a6f8 ]
Fix dst reference count leak in sctp_v4_get_dst() introduced in commit 410f03831 ("sctp: add routing output fallback"):
When walking the address_list, successive ip_route_output_key() calls may return the same rt->dst with the reference incremented on each call.
The code would not decrement the dst refcount when the dst pointer was identical from the previous iteration, causing the dst refcnt leak.
Testcase: ip netns add TEST ip netns exec TEST ip link set lo up ip link add dummy0 type dummy ip link add dummy1 type dummy ip link add dummy2 type dummy ip link set dev dummy0 netns TEST ip link set dev dummy1 netns TEST ip link set dev dummy2 netns TEST ip netns exec TEST ip addr add 192.168.1.1/24 dev dummy0 ip netns exec TEST ip link set dummy0 up ip netns exec TEST ip addr add 192.168.1.2/24 dev dummy1 ip netns exec TEST ip link set dummy1 up ip netns exec TEST ip addr add 192.168.1.3/24 dev dummy2 ip netns exec TEST ip link set dummy2 up ip netns exec TEST sctp_test -H 192.168.1.2 -P 20002 -h 192.168.1.1 -p 20000 -s -B 192.168.1.3 ip netns del TEST
In 4.4 and 4.9 kernels this results to: [ 354.179591] unregister_netdevice: waiting for lo to become free. Usage count = 1 [ 364.419674] unregister_netdevice: waiting for lo to become free. Usage count = 1 [ 374.663664] unregister_netdevice: waiting for lo to become free. Usage count = 1 [ 384.903717] unregister_netdevice: waiting for lo to become free. Usage count = 1 [ 395.143724] unregister_netdevice: waiting for lo to become free. Usage count = 1 [ 405.383645] unregister_netdevice: waiting for lo to become free. Usage count = 1 ...
Fixes: 410f03831 ("sctp: add routing output fallback") Fixes: 0ca50d12f ("sctp: fix src address selection if using secondary addresses") Signed-off-by: Tommi Rantala tommi.t.rantala@nokia.com Acked-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Acked-by: Neil Horman nhorman@tuxdriver.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sctp/protocol.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-)
--- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -514,22 +514,20 @@ static void sctp_v4_get_dst(struct sctp_ if (IS_ERR(rt)) continue;
- if (!dst) - dst = &rt->dst; - /* Ensure the src address belongs to the output * interface. */ odev = __ip_dev_find(sock_net(sk), laddr->a.v4.sin_addr.s_addr, false); if (!odev || odev->ifindex != fl4->flowi4_oif) { - if (&rt->dst != dst) + if (!dst) + dst = &rt->dst; + else dst_release(&rt->dst); continue; }
- if (dst != &rt->dst) - dst_release(dst); + dst_release(dst); dst = &rt->dst; break; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shalom Toledo shalomt@mellanox.com
[ Upstream commit 0a8a1bf17e3af34f1f8d2368916a6327f8b3bfd5 ]
Until now, we assumed that in case of error when adding FDB entries, the write operation will fail, but this is not the case. Instead, we need to check that the number of entries reported in the response is equal to the number of entries specified in the request.
Fixes: 56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC") Reported-by: Ido Schimmel idosch@mellanox.com Signed-off-by: Shalom Toledo shalomt@mellanox.com Reviewed-by: Ido Schimmel idosch@mellanox.com Signed-off-by: Jiri Pirko jiri@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c | 29 +++++++++++++-- 1 file changed, 27 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c @@ -1098,6 +1098,7 @@ static int __mlxsw_sp_port_fdb_uc_op(str bool dynamic) { char *sfd_pl; + u8 num_rec; int err;
sfd_pl = kmalloc(MLXSW_REG_SFD_LEN, GFP_KERNEL); @@ -1107,9 +1108,16 @@ static int __mlxsw_sp_port_fdb_uc_op(str mlxsw_reg_sfd_pack(sfd_pl, mlxsw_sp_sfd_op(adding), 0); mlxsw_reg_sfd_uc_pack(sfd_pl, 0, mlxsw_sp_sfd_rec_policy(dynamic), mac, fid, action, local_port); + num_rec = mlxsw_reg_sfd_num_rec_get(sfd_pl); err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(sfd), sfd_pl); - kfree(sfd_pl); + if (err) + goto out; + + if (num_rec != mlxsw_reg_sfd_num_rec_get(sfd_pl)) + err = -EBUSY;
+out: + kfree(sfd_pl); return err; }
@@ -1134,6 +1142,7 @@ static int mlxsw_sp_port_fdb_uc_lag_op(s bool adding, bool dynamic) { char *sfd_pl; + u8 num_rec; int err;
sfd_pl = kmalloc(MLXSW_REG_SFD_LEN, GFP_KERNEL); @@ -1144,9 +1153,16 @@ static int mlxsw_sp_port_fdb_uc_lag_op(s mlxsw_reg_sfd_uc_lag_pack(sfd_pl, 0, mlxsw_sp_sfd_rec_policy(dynamic), mac, fid, MLXSW_REG_SFD_REC_ACTION_NOP, lag_vid, lag_id); + num_rec = mlxsw_reg_sfd_num_rec_get(sfd_pl); err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(sfd), sfd_pl); - kfree(sfd_pl); + if (err) + goto out;
+ if (num_rec != mlxsw_reg_sfd_num_rec_get(sfd_pl)) + err = -EBUSY; + +out: + kfree(sfd_pl); return err; }
@@ -1191,6 +1207,7 @@ static int mlxsw_sp_port_mdb_op(struct m u16 fid, u16 mid, bool adding) { char *sfd_pl; + u8 num_rec; int err;
sfd_pl = kmalloc(MLXSW_REG_SFD_LEN, GFP_KERNEL); @@ -1200,7 +1217,15 @@ static int mlxsw_sp_port_mdb_op(struct m mlxsw_reg_sfd_pack(sfd_pl, mlxsw_sp_sfd_op(adding), 0); mlxsw_reg_sfd_mc_pack(sfd_pl, 0, addr, fid, MLXSW_REG_SFD_REC_ACTION_NOP, mid); + num_rec = mlxsw_reg_sfd_num_rec_get(sfd_pl); err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(sfd), sfd_pl); + if (err) + goto out; + + if (num_rec != mlxsw_reg_sfd_num_rec_get(sfd_pl)) + err = -EBUSY; + +out: kfree(sfd_pl); return err; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gal Pressman galp@mellanox.com
[ Upstream commit 2f0db87901698cd73d828cc6fb1957b8916fc911 ]
When allocating a drop rq, no numa node is explicitly set which means allocations are done on node zero. This is not necessarily the nearest numa node to the HCA, and even worse, might even be a memoryless numa node.
Choose the numa_node given to us by the pci device in order to properly allocate the coherent dma memory instead of assuming zero is valid.
Fixes: 556dd1b9c313 ("net/mlx5e: Set drop RQ's necessary parameters only") Signed-off-by: Gal Pressman galp@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -1918,13 +1918,16 @@ static void mlx5e_build_rq_param(struct param->wq.linear = 1; }
-static void mlx5e_build_drop_rq_param(struct mlx5e_rq_param *param) +static void mlx5e_build_drop_rq_param(struct mlx5_core_dev *mdev, + struct mlx5e_rq_param *param) { void *rqc = param->rqc; void *wq = MLX5_ADDR_OF(rqc, rqc, wq);
MLX5_SET(wq, wq, wq_type, MLX5_WQ_TYPE_LINKED_LIST); MLX5_SET(wq, wq, log_wq_stride, ilog2(sizeof(struct mlx5e_rx_wqe))); + + param->wq.buf_numa_node = dev_to_node(&mdev->pdev->dev); }
static void mlx5e_build_sq_param_common(struct mlx5e_priv *priv, @@ -2778,6 +2781,9 @@ static int mlx5e_alloc_drop_cq(struct ml struct mlx5e_cq *cq, struct mlx5e_cq_param *param) { + param->wq.buf_numa_node = dev_to_node(&mdev->pdev->dev); + param->wq.db_numa_node = dev_to_node(&mdev->pdev->dev); + return mlx5e_alloc_cq_common(mdev, param, cq); }
@@ -2789,7 +2795,7 @@ static int mlx5e_open_drop_rq(struct mlx struct mlx5e_cq *cq = &drop_rq->cq; int err;
- mlx5e_build_drop_rq_param(&rq_param); + mlx5e_build_drop_rq_param(mdev, &rq_param);
err = mlx5e_alloc_drop_cq(mdev, cq, &cq_param); if (err)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Heiner Kallweit hkallweit1@gmail.com
[ Upstream commit 08f5138512180a479ce6b9d23b825c9f4cd3be77 ]
This condition wasn't adjusted when PHY_IGNORE_INTERRUPT (-2) was added long ago. In case of PHY_IGNORE_INTERRUPT the MAC interrupt indicates also PHY state changes and we should do what the symbol says.
Fixes: 84a527a41f38 ("net: phylib: fix interrupts re-enablement in phy_start") Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Reviewed-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/phy/phy.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/phy/phy.c +++ b/drivers/net/phy/phy.c @@ -842,7 +842,7 @@ void phy_start(struct phy_device *phydev break; case PHY_HALTED: /* make sure interrupts are re-enabled for the PHY */ - if (phydev->irq != PHY_POLL) { + if (phy_interrupt_is_valid(phydev)) { err = phy_enable_interrupts(phydev); if (err < 0) break;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ilya Lesokhin ilyal@mellanox.com
[ Upstream commit 808cf9e38cd7923036a99f459ccc8cf2955e47af ]
Avoid SKB coalescing if eor bit is set in one of the relevant SKBs.
Fixes: c134ecb87817 ("tcp: Make use of MSG_EOR in tcp_sendmsg") Signed-off-by: Ilya Lesokhin ilyal@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp_output.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+)
--- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1973,6 +1973,24 @@ static inline void tcp_mtu_check_reprobe } }
+static bool tcp_can_coalesce_send_queue_head(struct sock *sk, int len) +{ + struct sk_buff *skb, *next; + + skb = tcp_send_head(sk); + tcp_for_write_queue_from_safe(skb, next, sk) { + if (len <= skb->len) + break; + + if (unlikely(TCP_SKB_CB(skb)->eor)) + return false; + + len -= skb->len; + } + + return true; +} + /* Create a new MTU probe if we are ready. * MTU probe is regularly attempting to increase the path MTU by * deliberately sending larger packets. This discovers routing @@ -2045,6 +2063,9 @@ static int tcp_mtu_probe(struct sock *sk return 0; }
+ if (!tcp_can_coalesce_send_queue_head(sk, probe_size)) + return -1; + /* We're allowed to probe. Build it now. */ nskb = sk_stream_alloc_skb(sk, probe_size, GFP_ATOMIC, false); if (!nskb) @@ -2080,6 +2101,10 @@ static int tcp_mtu_probe(struct sock *sk /* We've eaten all the data from this skb. * Throw it away. */ TCP_SKB_CB(nskb)->tcp_flags |= TCP_SKB_CB(skb)->tcp_flags; + /* If this is the last SKB we copy and eor is set + * we need to propagate it to the new skb. + */ + TCP_SKB_CB(nskb)->eor = TCP_SKB_CB(skb)->eor; tcp_unlink_write_queue(skb, sk); sk_wmem_free_skb(sk, skb); } else {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Howells dhowells@redhat.com
[ Upstream commit 93c62c45ed5fad1b87e3a45835b251cd68de9c46 ]
All the kernel_sendmsg() calls in rxrpc_send_data_packet() need to send both parts of the iov[] buffer, but one of them does not. Fix it so that it does.
Without this, short IPv6 rxrpc DATA packets may be seen that have the rxrpc header included, but no payload.
Fixes: 5a924b8951f8 ("rxrpc: Don't store the rxrpc header in the Tx queue sk_buffs") Reported-by: Marc Dionne marc.dionne@auristor.com Signed-off-by: David Howells dhowells@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/rxrpc/output.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -395,7 +395,7 @@ send_fragmentable: (char *)&opt, sizeof(opt)); if (ret == 0) { ret = kernel_sendmsg(conn->params.local->socket, &msg, - iov, 1, iov[0].iov_len); + iov, 2, len);
opt = IPV6_PMTUDISC_DO; kernel_setsockopt(conn->params.local->socket,
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tonghao Zhang xiangxia.m.yue@gmail.com
[ Upstream commit a61a86f8db92923a2a4c857c49a795bcae754497 ]
The SK_MEM_QUANTUM was changed from PAGE_SIZE to 4096. And the tcp_wmem/tcp_rmem min default values are 4096.
Fixes: bd68a2a854ad ("net: set SK_MEM_QUANTUM to 4096") Cc: Eric Dumazet edumazet@google.com Signed-off-by: Tonghao Zhang xiangxia.m.yue@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/networking/ip-sysctl.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -508,7 +508,7 @@ tcp_rmem - vector of 3 INTEGERs: min, de min: Minimal size of receive buffer used by TCP sockets. It is guaranteed to each TCP socket, even under moderate memory pressure. - Default: 1 page + Default: 4K
default: initial size of receive buffer used by TCP sockets. This value overrides net.core.rmem_default used by other protocols. @@ -666,7 +666,7 @@ tcp_window_scaling - BOOLEAN tcp_wmem - vector of 3 INTEGERs: min, default, max min: Amount of memory reserved for send buffers for TCP sockets. Each TCP socket has rights to use it due to fact of its birth. - Default: 1 page + Default: 4K
default: initial size of send buffer used by TCP sockets. This value overrides net.core.wmem_default used by other protocols.
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Inbar Karmy inbark@mellanox.com
[ Upstream commit ef7a3518f7dd4f4cf5e5b5358c93d1eb78df28fb ]
When GRO is off, the transport header pointer in sk_buff is initialized to network's header.
To find the udp header, instead of using udp_hdr() which assumes skb_network_header was set, manually calculate the udp header offset.
Fixes: 0952da791c97 ("net/mlx5e: Add support for loopback selftest") Signed-off-by: Inbar Karmy inbark@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c @@ -216,7 +216,8 @@ mlx5e_test_loopback_validate(struct sk_b if (iph->protocol != IPPROTO_UDP) goto out;
- udph = udp_hdr(skb); + /* Don't assume skb_transport_header() was set */ + udph = (struct udphdr *)((u8 *)iph + 4 * iph->ihl); if (udph->dest != htons(9)) goto out;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit a5f7add332b4ea6d4b9480971b3b0f5e66466ae9 ]
pfifo_fast got percpu stats lately, uncovering a bug I introduced last year in linux-4.10.
I missed the fact that we have to clear our temporary storage before calling __gnet_stats_copy_basic() in the case of percpu stats.
Without this fix, rate estimators (tc qd replace dev xxx root est 1sec 4sec pfifo_fast) are utterly broken.
Fixes: 1c0d32fde5bd ("net_sched: gen_estimator: complete rewrite of rate estimators") Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/gen_estimator.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/core/gen_estimator.c +++ b/net/core/gen_estimator.c @@ -66,6 +66,7 @@ struct net_rate_estimator { static void est_fetch_counters(struct net_rate_estimator *e, struct gnet_stats_basic_packed *b) { + memset(b, 0, sizeof(*b)); if (e->stats_lock) spin_lock(e->stats_lock);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Vecera ivecera@redhat.com
[ Upstream commit eb53f7af6f15285e2f6ada97285395343ce9f433 ]
The following sequence is currently broken:
# tc qdisc add dev foo ingress # tc filter replace dev foo protocol all ingress \ u32 match u8 0 0 action mirred egress mirror dev bar1 # tc filter replace dev foo protocol all ingress \ handle 800::800 pref 49152 \ u32 match u8 0 0 action mirred egress mirror dev bar2 Error: cls_u32: Key node flags do not match passed flags. We have an error talking to the kernel, -1
The error comes from u32_change() when comparing new and existing flags. The existing ones always contains one of TCA_CLS_FLAGS_{,NOT}_IN_HW flag depending on offloading state. These flags cannot be passed from userspace so the condition (n->flags != flags) in u32_change() always fails.
Fix the condition so the flags TCA_CLS_FLAGS_NOT_IN_HW and TCA_CLS_FLAGS_IN_HW are not taken into account.
Fixes: 24d3dc6d27ea ("net/sched: cls_u32: Reflect HW offload status") Signed-off-by: Ivan Vecera ivecera@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/cls_u32.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -927,7 +927,8 @@ static int u32_change(struct net *net, s if (TC_U32_KEY(n->handle) == 0) return -EINVAL;
- if (n->flags != flags) + if ((n->flags ^ flags) & + ~(TCA_CLS_FLAGS_IN_HW | TCA_CLS_FLAGS_NOT_IN_HW)) return -EINVAL;
new = u32_init_knode(tp, n);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 27af86bb038d9c8b8066cd17854ddaf2ea92bce1 ]
The pr_err in sctp_hash_transport was supposed to report a sctp bug for using rhashtable/rhlist.
The err '-EEXIST' introduced in Commit cd2b70875058 ("sctp: check duplicate node before inserting a new transport") doesn't belong to that case.
So just return -EEXIST back without pr_err any kmsg.
Fixes: cd2b70875058 ("sctp: check duplicate node before inserting a new transport") Reported-by: Wei Chen weichen@redhat.com Signed-off-by: Xin Long lucien.xin@gmail.com Acked-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Acked-by: Neil Horman nhorman@tuxdriver.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sctp/input.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
--- a/net/sctp/input.c +++ b/net/sctp/input.c @@ -897,15 +897,12 @@ int sctp_hash_transport(struct sctp_tran rhl_for_each_entry_rcu(transport, tmp, list, node) if (transport->asoc->ep == t->asoc->ep) { rcu_read_unlock(); - err = -EEXIST; - goto out; + return -EEXIST; } rcu_read_unlock();
err = rhltable_insert_key(&sctp_transport_hashtable, &arg, &t->node, sctp_hash_params); - -out: if (err) pr_err_once("insert transport fail, errno %d\n", err);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yuchung Cheng ycheng@google.com
[ Upstream commit d4131f09770d9b7471c9da65e6ecd2477746ac5c ]
This reverts commit cc663f4d4c97b7297fb45135ab23cfd508b35a77. While fixing some broken middle-boxes that modifies receive window fields, it does not address middle-boxes that strip off SACK options. The best solution is to fully revert this patch and the root F-RTO enhancement.
Fixes: cc663f4d4c97 ("tcp: restrict F-RTO to work-around broken middle-boxes") Reported-by: Teodor Milkov tm@del.bg Signed-off-by: Yuchung Cheng ycheng@google.com Signed-off-by: Neal Cardwell ncardwell@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp_input.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-)
--- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1947,7 +1947,6 @@ void tcp_enter_loss(struct sock *sk) struct tcp_sock *tp = tcp_sk(sk); struct net *net = sock_net(sk); struct sk_buff *skb; - bool new_recovery = icsk->icsk_ca_state < TCP_CA_Recovery; bool is_reneg; /* is receiver reneging on SACKs? */ bool mark_lost;
@@ -2010,17 +2009,15 @@ void tcp_enter_loss(struct sock *sk) tp->high_seq = tp->snd_nxt; tcp_ecn_queue_cwr(tp);
- /* F-RTO RFC5682 sec 3.1 step 1: retransmit SND.UNA if no previous - * loss recovery is underway except recurring timeout(s) on - * the same SND.UNA (sec 3.2). Disable F-RTO on path MTU probing - * - * In theory F-RTO can be used repeatedly during loss recovery. - * In practice this interacts badly with broken middle-boxes that - * falsely raise the receive window, which results in repeated - * timeouts and stop-and-go behavior. + /* F-RTO RFC5682 sec 3.1 step 1 mandates to disable F-RTO + * if a previous recovery is underway, otherwise it may incorrectly + * call a timeout spurious if some previously retransmitted packets + * are s/acked (sec 3.2). We do not apply that retriction since + * retransmitted skbs are permanently tagged with TCPCB_EVER_RETRANS + * so FLAG_ORIG_SACK_ACKED is always correct. But we do disable F-RTO + * on PTMU discovery to avoid sending new data. */ tp->frto = sysctl_tcp_frto && - (new_recovery || icsk->icsk_retransmits) && !inet_csk(sk)->icsk_mtup.probe_size; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yuchung Cheng ycheng@google.com
[ Upstream commit fc68e171d376c322e6777a3d7ac2f0278b68b17f ]
This reverts commit 89fe18e44f7ee5ab1c90d0dff5835acee7751427.
While the patch could detect more spurious timeouts, it could cause poor TCP performance on broken middle-boxes that modifies TCP packets (e.g. receive window, SACK options). Since the performance gain is much smaller compared to the potential loss. The best solution is to fully revert the change.
Fixes: 89fe18e44f7e ("tcp: extend F-RTO to catch more spurious timeouts") Reported-by: Teodor Milkov tm@del.bg Signed-off-by: Yuchung Cheng ycheng@google.com Signed-off-by: Neal Cardwell ncardwell@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp_input.c | 30 ++++++++++++------------------ 1 file changed, 12 insertions(+), 18 deletions(-)
--- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1947,6 +1947,7 @@ void tcp_enter_loss(struct sock *sk) struct tcp_sock *tp = tcp_sk(sk); struct net *net = sock_net(sk); struct sk_buff *skb; + bool new_recovery = icsk->icsk_ca_state < TCP_CA_Recovery; bool is_reneg; /* is receiver reneging on SACKs? */ bool mark_lost;
@@ -2009,15 +2010,12 @@ void tcp_enter_loss(struct sock *sk) tp->high_seq = tp->snd_nxt; tcp_ecn_queue_cwr(tp);
- /* F-RTO RFC5682 sec 3.1 step 1 mandates to disable F-RTO - * if a previous recovery is underway, otherwise it may incorrectly - * call a timeout spurious if some previously retransmitted packets - * are s/acked (sec 3.2). We do not apply that retriction since - * retransmitted skbs are permanently tagged with TCPCB_EVER_RETRANS - * so FLAG_ORIG_SACK_ACKED is always correct. But we do disable F-RTO - * on PTMU discovery to avoid sending new data. + /* F-RTO RFC5682 sec 3.1 step 1: retransmit SND.UNA if no previous + * loss recovery is underway except recurring timeout(s) on + * the same SND.UNA (sec 3.2). Disable F-RTO on path MTU probing */ tp->frto = sysctl_tcp_frto && + (new_recovery || icsk->icsk_retransmits) && !inet_csk(sk)->icsk_mtup.probe_size; }
@@ -2696,18 +2694,14 @@ static void tcp_process_loss(struct sock tcp_try_undo_loss(sk, false)) return;
- /* The ACK (s)acks some never-retransmitted data meaning not all - * the data packets before the timeout were lost. Therefore we - * undo the congestion window and state. This is essentially - * the operation in F-RTO (RFC5682 section 3.1 step 3.b). Since - * a retransmitted skb is permantly marked, we can apply such an - * operation even if F-RTO was not used. - */ - if ((flag & FLAG_ORIG_SACK_ACKED) && - tcp_try_undo_loss(sk, tp->undo_marker)) - return; - if (tp->frto) { /* F-RTO RFC5682 sec 3.1 (sack enhanced version). */ + /* Step 3.b. A timeout is spurious if not all data are + * lost, i.e., never-retransmitted data are (s)acked. + */ + if ((flag & FLAG_ORIG_SACK_ACKED) && + tcp_try_undo_loss(sk, true)) + return; + if (after(tp->snd_nxt, tp->high_seq)) { if (flag & FLAG_DATA_SACKED || is_dupack) tp->frto = 0; /* Step 3.a. loss was real */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Pirko jiri@mellanox.com
[ Upstream commit 0f2d2b2736b08dafa3bde31d048750fbc8df3a31 ]
Since mlxsw_sp_fib_create() and mlxsw_sp_mr_table_create() use ERR_PTR macro to propagate int err through return of a pointer, the return value is not NULL in case of failure. So if one of the calls fails, one of vr->fib4, vr->fib6 or vr->mr4_table is not NULL and mlxsw_sp_vr_is_used wrongly assumes that vr is in use which leads to crash like following one:
[ 1293.949291] BUG: unable to handle kernel NULL pointer dereference at 00000000000006c9 [ 1293.952729] IP: mlxsw_sp_mr_table_flush+0x15/0x70 [mlxsw_spectrum]
Fix this by using local variables to hold the pointers and set vr->* only in case everything went fine.
Fixes: 76610ebbde18 ("mlxsw: spectrum_router: Refactor virtual router handling") Fixes: a3d9bc506d64 ("mlxsw: spectrum_router: Extend virtual routers with IPv6 support") Fixes: d42b0965b1d4 ("mlxsw: spectrum_router: Add multicast routes notification handling functionality") Signed-off-by: Jiri Pirko jiri@mellanox.com Reviewed-by: Ido Schimmel idosch@mellanox.com Signed-off-by: Jiri Pirko jiri@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 19 ++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c @@ -729,26 +729,29 @@ static struct mlxsw_sp_fib *mlxsw_sp_vr_ static struct mlxsw_sp_vr *mlxsw_sp_vr_create(struct mlxsw_sp *mlxsw_sp, u32 tb_id) { + struct mlxsw_sp_fib *fib4; + struct mlxsw_sp_fib *fib6; struct mlxsw_sp_vr *vr; int err;
vr = mlxsw_sp_vr_find_unused(mlxsw_sp); if (!vr) return ERR_PTR(-EBUSY); - vr->fib4 = mlxsw_sp_fib_create(vr, MLXSW_SP_L3_PROTO_IPV4); - if (IS_ERR(vr->fib4)) - return ERR_CAST(vr->fib4); - vr->fib6 = mlxsw_sp_fib_create(vr, MLXSW_SP_L3_PROTO_IPV6); - if (IS_ERR(vr->fib6)) { - err = PTR_ERR(vr->fib6); + fib4 = mlxsw_sp_fib_create(vr, MLXSW_SP_L3_PROTO_IPV4); + if (IS_ERR(fib4)) + return ERR_CAST(fib4); + fib6 = mlxsw_sp_fib_create(vr, MLXSW_SP_L3_PROTO_IPV6); + if (IS_ERR(fib6)) { + err = PTR_ERR(fib6); goto err_fib6_create; } + vr->fib4 = fib4; + vr->fib6 = fib6; vr->tb_id = tb_id; return vr;
err_fib6_create: - mlxsw_sp_fib_destroy(vr->fib4); - vr->fib4 = NULL; + mlxsw_sp_fib_destroy(fib4); return ERR_PTR(err); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Ahern dsahern@gmail.com
[ Upstream commit 1fe4b1184c2ae2bfbf9e8b14c9c0c1945c98f205 ]
The result of the skb flow dissect is copied from keys to hash_keys to ensure only the intended data is hashed. The original L4 hash patch overlooked setting the addr_type for this case; add it.
Fixes: bf4e0a3db97eb ("net: ipv4: add support for ECMP hash policy choice") Reported-by: Ido Schimmel idosch@idosch.org Signed-off-by: David Ahern dsahern@gmail.com Acked-by: Nikolay Aleksandrov nikolay@cumulusnetworks.com Reviewed-by: Ido Schimmel idosch@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/route.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1832,6 +1832,8 @@ int fib_multipath_hash(const struct fib_ return skb_get_hash_raw(skb) >> 1; memset(&hash_keys, 0, sizeof(hash_keys)); skb_flow_dissect_flow_keys(skb, &keys, flag); + + hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS; hash_keys.addrs.v4addrs.src = keys.addrs.v4addrs.src; hash_keys.addrs.v4addrs.dst = keys.addrs.v4addrs.dst; hash_keys.ports.src = keys.ports.src;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexey Kodanev alexey.kodanev@oracle.com
[ Upstream commit 957d761cf91cdbb175ad7d8f5472336a4d54dbf2 ]
When going through the bind address list in sctp_v6_get_dst() and the previously found address is better ('matchlen > bmatchlen'), the code continues to the next iteration without releasing currently held destination.
Fix it by releasing 'bdst' before continue to the next iteration, and instead of introducing one more '!IS_ERR(bdst)' check for dst_release(), move the already existed one right after ip6_dst_lookup_flow(), i.e. we shouldn't proceed further if we get an error for the route lookup.
Fixes: dbc2b5e9a09e ("sctp: fix src address selection if using secondary addresses for ipv6") Signed-off-by: Alexey Kodanev alexey.kodanev@oracle.com Acked-by: Neil Horman nhorman@tuxdriver.com Acked-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sctp/ipv6.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
--- a/net/sctp/ipv6.c +++ b/net/sctp/ipv6.c @@ -326,8 +326,10 @@ static void sctp_v6_get_dst(struct sctp_ final_p = fl6_update_dst(fl6, rcu_dereference(np->opt), &final); bdst = ip6_dst_lookup_flow(sk, fl6, final_p);
- if (!IS_ERR(bdst) && - ipv6_chk_addr(dev_net(bdst->dev), + if (IS_ERR(bdst)) + continue; + + if (ipv6_chk_addr(dev_net(bdst->dev), &laddr->a.v6.sin6_addr, bdst->dev, 1)) { if (!IS_ERR_OR_NULL(dst)) dst_release(dst); @@ -336,8 +338,10 @@ static void sctp_v6_get_dst(struct sctp_ }
bmatchlen = sctp_v6_addr_match_len(daddr, &laddr->a); - if (matchlen > bmatchlen) + if (matchlen > bmatchlen) { + dst_release(bdst); continue; + }
if (!IS_ERR_OR_NULL(dst)) dst_release(dst);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ido Schimmel idosch@mellanox.com
[ Upstream commit 0e5a82efda872c2469c210957d7d4161ef8f4391 ]
When a VLAN is added on a port, a reference is taken on the corresponding master VLAN entry. If it does not already exist, then it is created and a reference taken.
However, in the second case a reference is not really taken when CONFIG_REFCOUNT_FULL is enabled as refcount_inc() is replaced by refcount_inc_not_zero().
Fix this by using refcount_set() on a newly created master VLAN entry.
Fixes: 251277598596 ("net, bridge: convert net_bridge_vlan.refcnt from atomic_t to refcount_t") Signed-off-by: Ido Schimmel idosch@mellanox.com Acked-by: Nikolay Aleksandrov nikolay@cumulusnetworks.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bridge/br_vlan.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/bridge/br_vlan.c +++ b/net/bridge/br_vlan.c @@ -157,6 +157,8 @@ static struct net_bridge_vlan *br_vlan_g masterv = br_vlan_find(vg, vid); if (WARN_ON(!masterv)) return NULL; + refcount_set(&masterv->refcnt, 1); + return masterv; } refcount_inc(&masterv->refcnt);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eran Ben Elisha eranbe@mellanox.com
[ Upstream commit f600c6088018d1dbc5777d18daa83660f7ea4a64 ]
Driver tries to copy at least MLX5E_MIN_INLINE bytes into the control segment of the WQE. It assumes that the linear part contains at least MLX5E_MIN_INLINE bytes, which can be wrong.
Cited commit verified that driver will not copy more bytes into the inline header part that the actual size of the packet. Re-factor this check to make sure we do not exceed the linear part as well.
This fix is aligned with the current driver's assumption that the entire L2 will be present in the linear part of the SKB.
Fixes: 6aace17e64f4 ("net/mlx5e: Fix inline header size for small packets") Signed-off-by: Eran Ben Elisha eranbe@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c @@ -155,7 +155,7 @@ static inline u16 mlx5e_calc_min_inline( default: hlen = mlx5e_skb_l2_header_offset(skb); } - return min_t(u16, hlen, skb->len); + return min_t(u16, hlen, skb_headlen(skb)); }
static inline void mlx5e_tx_skb_pull_inline(unsigned char **skb_data,
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tom Lendacky thomas.lendacky@amd.com
[ Upstream commit cfd092f2db8b4b6727e1c03ef68a7842e1023573 ]
After resuming from suspend, the PCI device support must re-enable the interrupt setting so that interrupts are actually delivered.
Signed-off-by: Tom Lendacky thomas.lendacky@amd.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/amd/xgbe/xgbe-pci.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/net/ethernet/amd/xgbe/xgbe-pci.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-pci.c @@ -426,6 +426,8 @@ static int xgbe_pci_resume(struct pci_de struct net_device *netdev = pdata->netdev; int ret = 0;
+ XP_IOWRITE(pdata, XP_INT_EN, 0x1fffff); + pdata->lpm_ctrl &= ~MDIO_CTRL1_LPOWER; XMDIO_WRITE(pdata, MDIO_MMD_PCS, MDIO_CTRL1, pdata->lpm_ctrl);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit d7cdee5ea8d28ae1b6922deb0c1badaa3aa0ef8c ]
Li Shuang reported an Oops with cls_u32 due to an use-after-free in u32_destroy_key(). The use-after-free can be triggered with:
dev=lo tc qdisc add dev $dev root handle 1: htb default 10 tc filter add dev $dev parent 1: prio 5 handle 1: protocol ip u32 divisor 256 tc filter add dev $dev protocol ip parent 1: prio 5 u32 ht 800:: match ip dst\ 10.0.0.0/8 hashkey mask 0x0000ff00 at 16 link 1: tc qdisc del dev $dev root
Which causes the following kasan splat:
================================================================== BUG: KASAN: use-after-free in u32_destroy_key.constprop.21+0x117/0x140 [cls_u32] Read of size 4 at addr ffff881b83dae618 by task kworker/u48:5/571
CPU: 17 PID: 571 Comm: kworker/u48:5 Not tainted 4.15.0+ #87 Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016 Workqueue: tc_filter_workqueue u32_delete_key_freepf_work [cls_u32] Call Trace: dump_stack+0xd6/0x182 ? dma_virt_map_sg+0x22e/0x22e print_address_description+0x73/0x290 kasan_report+0x277/0x360 ? u32_destroy_key.constprop.21+0x117/0x140 [cls_u32] u32_destroy_key.constprop.21+0x117/0x140 [cls_u32] u32_delete_key_freepf_work+0x1c/0x30 [cls_u32] process_one_work+0xae0/0x1c80 ? sched_clock+0x5/0x10 ? pwq_dec_nr_in_flight+0x3c0/0x3c0 ? _raw_spin_unlock_irq+0x29/0x40 ? trace_hardirqs_on_caller+0x381/0x570 ? _raw_spin_unlock_irq+0x29/0x40 ? finish_task_switch+0x1e5/0x760 ? finish_task_switch+0x208/0x760 ? preempt_notifier_dec+0x20/0x20 ? __schedule+0x839/0x1ee0 ? check_noncircular+0x20/0x20 ? firmware_map_remove+0x73/0x73 ? find_held_lock+0x39/0x1c0 ? worker_thread+0x434/0x1820 ? lock_contended+0xee0/0xee0 ? lock_release+0x1100/0x1100 ? init_rescuer.part.16+0x150/0x150 ? retint_kernel+0x10/0x10 worker_thread+0x216/0x1820 ? process_one_work+0x1c80/0x1c80 ? lock_acquire+0x1a5/0x540 ? lock_downgrade+0x6b0/0x6b0 ? sched_clock+0x5/0x10 ? lock_release+0x1100/0x1100 ? compat_start_thread+0x80/0x80 ? do_raw_spin_trylock+0x190/0x190 ? _raw_spin_unlock_irq+0x29/0x40 ? trace_hardirqs_on_caller+0x381/0x570 ? _raw_spin_unlock_irq+0x29/0x40 ? finish_task_switch+0x1e5/0x760 ? finish_task_switch+0x208/0x760 ? preempt_notifier_dec+0x20/0x20 ? __schedule+0x839/0x1ee0 ? kmem_cache_alloc_trace+0x143/0x320 ? firmware_map_remove+0x73/0x73 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0x18/0x170 ? find_held_lock+0x39/0x1c0 ? schedule+0xf3/0x3b0 ? lock_downgrade+0x6b0/0x6b0 ? __schedule+0x1ee0/0x1ee0 ? do_wait_intr_irq+0x340/0x340 ? do_raw_spin_trylock+0x190/0x190 ? _raw_spin_unlock_irqrestore+0x32/0x60 ? process_one_work+0x1c80/0x1c80 ? process_one_work+0x1c80/0x1c80 kthread+0x312/0x3d0 ? kthread_create_worker_on_cpu+0xc0/0xc0 ret_from_fork+0x3a/0x50
Allocated by task 1688: kasan_kmalloc+0xa0/0xd0 __kmalloc+0x162/0x380 u32_change+0x1220/0x3c9e [cls_u32] tc_ctl_tfilter+0x1ba6/0x2f80 rtnetlink_rcv_msg+0x4f0/0x9d0 netlink_rcv_skb+0x124/0x320 netlink_unicast+0x430/0x600 netlink_sendmsg+0x8fa/0xd60 sock_sendmsg+0xb1/0xe0 ___sys_sendmsg+0x678/0x980 __sys_sendmsg+0xc4/0x210 do_syscall_64+0x232/0x7f0 return_from_SYSCALL_64+0x0/0x75
Freed by task 112: kasan_slab_free+0x71/0xc0 kfree+0x114/0x320 rcu_process_callbacks+0xc3f/0x1600 __do_softirq+0x2bf/0xc06
The buggy address belongs to the object at ffff881b83dae600 which belongs to the cache kmalloc-4096 of size 4096 The buggy address is located 24 bytes inside of 4096-byte region [ffff881b83dae600, ffff881b83daf600) The buggy address belongs to the page: page:ffffea006e0f6a00 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0 flags: 0x17ffffc0008100(slab|head) raw: 0017ffffc0008100 0000000000000000 0000000000000000 0000000100070007 raw: dead000000000100 dead000000000200 ffff880187c0e600 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff881b83dae500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff881b83dae580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff881b83dae600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff881b83dae680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff881b83dae700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ==================================================================
The problem is that the htnode is freed before the linked knodes and the latter will try to access the first at u32_destroy_key() time. This change addresses the issue using the htnode refcnt to guarantee the correct free order. While at it also add a RCU annotation, to keep sparse happy.
v1 -> v2: use rtnl_derefence() instead of RCU read locks v2 -> v3: - don't check refcnt in u32_destroy_hnode() - cleaned-up u32_destroy() implementation - cleaned-up code comment v3 -> v4: - dropped unneeded comment
Reported-by: Li Shuang shuali@redhat.com Fixes: c0d378ef1266 ("net_sched: use tcf_queue_work() in u32 filter") Signed-off-by: Paolo Abeni pabeni@redhat.com Acked-by: Cong Wang xiyou.wangcong@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/cls_u32.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-)
--- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -398,10 +398,12 @@ static int u32_init(struct tcf_proto *tp static int u32_destroy_key(struct tcf_proto *tp, struct tc_u_knode *n, bool free_pf) { + struct tc_u_hnode *ht = rtnl_dereference(n->ht_down); + tcf_exts_destroy(&n->exts); tcf_exts_put_net(&n->exts); - if (n->ht_down) - n->ht_down->refcnt--; + if (ht && --ht->refcnt == 0) + kfree(ht); #ifdef CONFIG_CLS_U32_PERF if (free_pf) free_percpu(n->pf); @@ -649,16 +651,15 @@ static void u32_destroy(struct tcf_proto
hlist_del(&tp_c->hnode);
- for (ht = rtnl_dereference(tp_c->hlist); - ht; - ht = rtnl_dereference(ht->next)) { - ht->refcnt--; - u32_clear_hnode(tp, ht); - } - while ((ht = rtnl_dereference(tp_c->hlist)) != NULL) { + u32_clear_hnode(tp, ht); RCU_INIT_POINTER(tp_c->hlist, ht->next); - kfree_rcu(ht, rcu); + + /* u32_destroy_key() will later free ht for us, if it's + * still referenced by some knode + */ + if (--ht->refcnt == 0) + kfree_rcu(ht, rcu); }
kfree(tp_c);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ido Schimmel idosch@mellanox.com
[ Upstream commit d1c95af366961101819f07e3c64d44f3be7f0367 ]
When mlxsw replaces (or deletes) a route it removes the offload indication from the replaced route. This is problematic for IPv4 routes, as the offload indication is stored in the fib_info which is usually shared between multiple routes.
Instead of unconditionally clearing the offload indication, only clear it if no other route is using the fib_info.
Fixes: 3984d1a89fe7 ("mlxsw: spectrum_router: Provide offload indication using nexthop flags") Signed-off-by: Ido Schimmel idosch@mellanox.com Reported-by: Alexander Petrovskiy alexpe@mellanox.com Tested-by: Alexander Petrovskiy alexpe@mellanox.com Signed-off-by: Jiri Pirko jiri@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c @@ -3032,6 +3032,9 @@ mlxsw_sp_fib4_entry_offload_unset(struct struct mlxsw_sp_nexthop_group *nh_grp = fib_entry->nh_group; int i;
+ if (!list_is_singular(&nh_grp->fib_list)) + return; + for (i = 0; i < nh_grp->count; i++) { struct mlxsw_sp_nexthop *nh = &nh_grp->nexthops[i];
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Jason A. Donenfeld" Jason@zx2c4.com
[ Upstream commit b87b6194be631c94785fe93398651e804ed43e28 ]
Before, if cb->start() failed, the module reference would never be put, because cb->cb_running is intentionally false at this point. Users are generally annoyed by this because they can no longer unload modules that leak references. Also, it may be possible to tediously wrap a reference counter back to zero, especially since module.c still uses atomic_inc instead of refcount_inc.
This patch expands the error path to simply call module_put if cb->start() fails.
Fixes: 41c87425a1ac ("netlink: do not set cb_running if dump's start() errs") Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netlink/af_netlink.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -2276,7 +2276,7 @@ int __netlink_dump_start(struct sock *ss if (cb->start) { ret = cb->start(cb); if (ret) - goto error_unlock; + goto error_put; }
nlk->cb_running = true; @@ -2296,6 +2296,8 @@ int __netlink_dump_start(struct sock *ss */ return -EINTR;
+error_put: + module_put(control->module); error_unlock: sock_put(sk); mutex_unlock(nlk->cb_mutex);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Soheil Hassas Yeganeh soheil@google.com
[ Upstream commit a27fd7a8ed3856faaf5a2ff1c8c5f00c0667aaa0 ]
When the connection is reset, there is no point in keeping the packets on the write queue until the connection is closed.
RFC 793 (page 70) and RFC 793-bis (page 64) both suggest purging the write queue upon RST: https://tools.ietf.org/html/draft-ietf-tcpm-rfc793bis-07
Moreover, this is essential for a correct MSG_ZEROCOPY implementation, because userspace cannot call close(fd) before receiving zerocopy signals even when the connection is reset.
Fixes: f214f915e7db ("tcp: enable MSG_ZEROCOPY") Signed-off-by: Soheil Hassas Yeganeh soheil@google.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: Yuchung Cheng ycheng@google.com Signed-off-by: Neal Cardwell ncardwell@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp_input.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4011,6 +4011,7 @@ void tcp_reset(struct sock *sk) /* This barrier is coupled with smp_rmb() in tcp_poll() */ smp_wmb();
+ tcp_write_queue_purge(sk); tcp_done(sk);
if (!sock_flag(sk, SOCK_DEAD))
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Wang jasowang@redhat.com
[ Upstream commit 1bb4f2e868a2891ab8bc668b8173d6ccb8c4ce6f ]
We don't flush batched XDP packets through xdp_do_flush_map(), this will cause packets stall at TX queue. Consider we don't do XDP on NAPI poll(), the only possible fix is to call xdp_do_flush_map() immediately after xdp_do_redirect().
Note, this in fact won't try to batch packets through devmap, we could address in the future.
Reported-by: Christoffer Dall christoffer.dall@linaro.org Fixes: 761876c857cb ("tap: XDP support") Signed-off-by: Jason Wang jasowang@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/tun.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1333,6 +1333,7 @@ static struct sk_buff *tun_build_skb(str get_page(alloc_frag->page); alloc_frag->offset += buflen; err = xdp_do_redirect(tun->dev, &xdp, xdp_prog); + xdp_do_flush_map(); if (err) goto err_redirect; rcu_read_unlock();
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Wang jasowang@redhat.com
[ Upstream commit 23e43f07f896f8578318cfcc9466f1e8b8ab21b6 ]
Except for tuntap, all other drivers' XDP was implemented at NAPI poll() routine in a bh. This guarantees all XDP operation were done at the same CPU which is required by e.g BFP_MAP_TYPE_PERCPU_ARRAY. But for tuntap, we do it in process context and we try to protect XDP processing by RCU reader lock. This is insufficient since CONFIG_PREEMPT_RCU can preempt the RCU reader critical section which breaks the assumption that all XDP were processed in the same CPU.
Fixing this by simply disabling preemption during XDP processing.
Fixes: 761876c857cb ("tap: XDP support") Signed-off-by: Jason Wang jasowang@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/tun.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1315,6 +1315,7 @@ static struct sk_buff *tun_build_skb(str else *skb_xdp = 0;
+ preempt_disable(); rcu_read_lock(); xdp_prog = rcu_dereference(tun->xdp_prog); if (xdp_prog && !*skb_xdp) { @@ -1337,6 +1338,7 @@ static struct sk_buff *tun_build_skb(str if (err) goto err_redirect; rcu_read_unlock(); + preempt_enable(); return NULL; case XDP_TX: xdp_xmit = true; @@ -1358,6 +1360,7 @@ static struct sk_buff *tun_build_skb(str skb = build_skb(buf, buflen); if (!skb) { rcu_read_unlock(); + preempt_enable(); return ERR_PTR(-ENOMEM); }
@@ -1370,10 +1373,12 @@ static struct sk_buff *tun_build_skb(str skb->dev = tun->dev; generic_xdp_tx(skb, xdp_prog); rcu_read_unlock(); + preempt_enable(); return NULL; }
rcu_read_unlock(); + preempt_enable();
return skb;
@@ -1381,6 +1386,7 @@ err_redirect: put_page(alloc_frag->page); err_xdp: rcu_read_unlock(); + preempt_enable(); this_cpu_inc(tun->pcpu_stats->rx_dropped); return NULL; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Wang jasowang@redhat.com
[ Upstream commit 4e09ff5362843dff3accfa84c805c7f3a99de9cd ]
We try to disable NAPI to prevent a single XDP TX queue being used by multiple cpus. But we don't check if device is up (NAPI is enabled), this could result stall because of infinite wait in napi_disable(). Fixing this by checking device state through netif_running() before.
Fixes: 4941d472bf95b ("virtio-net: do not reset during XDP set") Signed-off-by: Jason Wang jasowang@redhat.com Acked-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/virtio_net.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1995,8 +1995,9 @@ static int virtnet_xdp_set(struct net_de }
/* Make sure NAPI is not using any XDP TX queues for RX. */ - for (i = 0; i < vi->max_queue_pairs; i++) - napi_disable(&vi->rq[i].napi); + if (netif_running(dev)) + for (i = 0; i < vi->max_queue_pairs; i++) + napi_disable(&vi->rq[i].napi);
netif_set_real_num_rx_queues(dev, curr_qp + xdp_qp); err = _virtnet_set_queues(vi, curr_qp + xdp_qp); @@ -2015,7 +2016,8 @@ static int virtnet_xdp_set(struct net_de } if (old_prog) bpf_prog_put(old_prog); - virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi); + if (netif_running(dev)) + virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi); }
return 0;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ursula Braun ubraun@linux.vnet.ibm.com
[ Upstream commit 89271c65edd599207dd982007900506283c90ae3 ]
For a memory range/skb where the last byte falls onto a page boundary (ie. 'end' is of the form xxx...xxx001), the PFN_UP() part of the calculation currently doesn't round up to the next PFN due to an off-by-one error. Thus qeth believes that the skb occupies one page less than it actually does, and may select a IO buffer that doesn't have enough spare buffer elements to fit all of the skb's data. HW detects this as a malformed buffer descriptor, and raises an exception which then triggers device recovery.
Fixes: 2863c61334aa ("qeth: refactor calculation of SBALE count") Signed-off-by: Ursula Braun ubraun@linux.vnet.ibm.com Signed-off-by: Julian Wiedmann jwi@linux.vnet.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/net/qeth_core.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/s390/net/qeth_core.h +++ b/drivers/s390/net/qeth_core.h @@ -834,7 +834,7 @@ struct qeth_trap_id { */ static inline int qeth_get_elements_for_range(addr_t start, addr_t end) { - return PFN_UP(end - 1) - PFN_DOWN(start); + return PFN_UP(end) - PFN_DOWN(start); }
static inline int qeth_get_micros(void)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Julian Wiedmann jwi@linux.vnet.ibm.com
[ Upstream commit 1c5b2216fbb973a9410e0b06389740b5c1289171 ]
send_control_data() applies some special handling to SETIP v4 IPA commands. But current code parses *all* command types for the SETIP command code. Limit the command code check to IPA commands.
Fixes: 5b54e16f1a54 ("qeth: do not spin for SETIP ip assist command") Signed-off-by: Julian Wiedmann jwi@linux.vnet.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/net/qeth_core.h | 5 +++++ drivers/s390/net/qeth_core_main.c | 14 ++++++++------ 2 files changed, 13 insertions(+), 6 deletions(-)
--- a/drivers/s390/net/qeth_core.h +++ b/drivers/s390/net/qeth_core.h @@ -580,6 +580,11 @@ struct qeth_cmd_buffer { void (*callback) (struct qeth_channel *, struct qeth_cmd_buffer *); };
+static inline struct qeth_ipa_cmd *__ipa_cmd(struct qeth_cmd_buffer *iob) +{ + return (struct qeth_ipa_cmd *)(iob->data + IPA_PDU_HEADER_SIZE); +} + /** * definition of a qeth channel, used for read and write */ --- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -2073,7 +2073,7 @@ int qeth_send_control_data(struct qeth_c unsigned long flags; struct qeth_reply *reply = NULL; unsigned long timeout, event_timeout; - struct qeth_ipa_cmd *cmd; + struct qeth_ipa_cmd *cmd = NULL;
QETH_CARD_TEXT(card, 2, "sendctl");
@@ -2100,10 +2100,13 @@ int qeth_send_control_data(struct qeth_c while (atomic_cmpxchg(&card->write.irq_pending, 0, 1)) ; qeth_prepare_control_data(card, len, iob);
- if (IS_IPA(iob->data)) + if (IS_IPA(iob->data)) { + cmd = __ipa_cmd(iob); event_timeout = QETH_IPA_TIMEOUT; - else + } else { event_timeout = QETH_TIMEOUT; + } + timeout = jiffies + event_timeout;
QETH_CARD_TEXT(card, 6, "noirqpnd"); @@ -2128,9 +2131,8 @@ int qeth_send_control_data(struct qeth_c
/* we have only one long running ipassist, since we can ensure process context of this command we can sleep */ - cmd = (struct qeth_ipa_cmd *)(iob->data+IPA_PDU_HEADER_SIZE); - if ((cmd->hdr.command == IPA_CMD_SETIP) && - (cmd->hdr.prot_version == QETH_PROT_IPV4)) { + if (cmd && cmd->hdr.command == IPA_CMD_SETIP && + cmd->hdr.prot_version == QETH_PROT_IPV4) { if (!wait_event_timeout(reply->wait_q, atomic_read(&reply->received), event_timeout)) goto time_err;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Julian Wiedmann jwi@linux.vnet.ibm.com
[ Upstream commit 12472af89632beb1ed8dea29d4efe208ca05b06a ]
qeth_get_elements_for_range() doesn't know how to handle a 0-length range (ie. start == end), and returns 1 when it should return 0. Such ranges occur on TSO skbs, where the L2/L3/L4 headers (and thus all of the skb's linear data) are skipped when mapping the skb into regular buffer elements.
This overestimation may cause several performance-related issues: 1. sub-optimal IO buffer selection, where the next buffer gets selected even though the skb would actually still fit into the current buffer. 2. forced linearization, if the element count for a non-linear skb exceeds QETH_MAX_BUFFER_ELEMENTS.
Rather than modifying qeth_get_elements_for_range() and adding overhead to every caller, fix up those callers that are in risk of passing a 0-length range.
Fixes: 2863c61334aa ("qeth: refactor calculation of SBALE count") Signed-off-by: Julian Wiedmann jwi@linux.vnet.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/net/qeth_core_main.c | 10 ++++++---- drivers/s390/net/qeth_l3_main.c | 11 ++++++----- 2 files changed, 12 insertions(+), 9 deletions(-)
--- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -3861,10 +3861,12 @@ EXPORT_SYMBOL_GPL(qeth_get_elements_for_ int qeth_get_elements_no(struct qeth_card *card, struct sk_buff *skb, int extra_elems, int data_offset) { - int elements = qeth_get_elements_for_range( - (addr_t)skb->data + data_offset, - (addr_t)skb->data + skb_headlen(skb)) + - qeth_get_elements_for_frags(skb); + addr_t end = (addr_t)skb->data + skb_headlen(skb); + int elements = qeth_get_elements_for_frags(skb); + addr_t start = (addr_t)skb->data + data_offset; + + if (start != end) + elements += qeth_get_elements_for_range(start, end);
if ((elements + extra_elems) > QETH_MAX_BUFFER_ELEMENTS(card)) { QETH_DBF_MESSAGE(2, "Invalid size of IP packet " --- a/drivers/s390/net/qeth_l3_main.c +++ b/drivers/s390/net/qeth_l3_main.c @@ -2633,11 +2633,12 @@ static void qeth_tso_fill_header(struct static int qeth_l3_get_elements_no_tso(struct qeth_card *card, struct sk_buff *skb, int extra_elems) { - addr_t tcpdptr = (addr_t)tcp_hdr(skb) + tcp_hdrlen(skb); - int elements = qeth_get_elements_for_range( - tcpdptr, - (addr_t)skb->data + skb_headlen(skb)) + - qeth_get_elements_for_frags(skb); + addr_t start = (addr_t)tcp_hdr(skb) + tcp_hdrlen(skb); + addr_t end = (addr_t)skb->data + skb_headlen(skb); + int elements = qeth_get_elements_for_frags(skb); + + if (start != end) + elements += qeth_get_elements_for_range(start, end);
if ((elements + extra_elems) > QETH_MAX_BUFFER_ELEMENTS(card)) { QETH_DBF_MESSAGE(2,
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Julian Wiedmann jwi@linux.vnet.ibm.com
[ Upstream commit 98d823ab1fbdcb13abc25b420f9bb71bade42056 ]
If the HW is not reachable, then none of the IPs in qeth's internal table has been registered with the HW yet. So when deleting such an IP, there's no need to stage it for deregistration - just drop it from the table.
This fixes the "add-delete-add" scenario on an offline card, where the the second "add" merely increments the IP's use count. But as the IP is still set to DISP_ADDR_DELETE from the previous "delete" step, l3_recover_ip() won't register it with the HW when the card goes online.
Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") Signed-off-by: Julian Wiedmann jwi@linux.vnet.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/net/qeth_l3_main.c | 14 +++----------- 1 file changed, 3 insertions(+), 11 deletions(-)
--- a/drivers/s390/net/qeth_l3_main.c +++ b/drivers/s390/net/qeth_l3_main.c @@ -255,12 +255,8 @@ int qeth_l3_delete_ip(struct qeth_card * if (addr->in_progress) return -EINPROGRESS;
- if (!qeth_card_hw_is_reachable(card)) { - addr->disp_flag = QETH_DISP_ADDR_DELETE; - return 0; - } - - rc = qeth_l3_deregister_addr_entry(card, addr); + if (qeth_card_hw_is_reachable(card)) + rc = qeth_l3_deregister_addr_entry(card, addr);
hash_del(&addr->hnode); kfree(addr); @@ -403,11 +399,7 @@ static void qeth_l3_recover_ip(struct qe spin_lock_bh(&card->ip_lock);
hash_for_each_safe(card->ip_htable, i, tmp, addr, hnode) { - if (addr->disp_flag == QETH_DISP_ADDR_DELETE) { - qeth_l3_deregister_addr_entry(card, addr); - hash_del(&addr->hnode); - kfree(addr); - } else if (addr->disp_flag == QETH_DISP_ADDR_ADD) { + if (addr->disp_flag == QETH_DISP_ADDR_ADD) { if (addr->proto == QETH_PROT_IPV4) { addr->in_progress = 1; spin_unlock_bh(&card->ip_lock);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Julian Wiedmann jwi@linux.vnet.ibm.com
[ Upstream commit 14d066c3531a87f727968cacd85bd95c75f59843 ]
Registering an IPv4 address with the HW takes quite a while, so we temporarily drop the ip_htable lock. Any concurrent add/remove of the same IP adjusts the IP's use count, and (on remove) is then blocked by addr->in_progress. After the register call has completed, we check the use count for concurrently attempted add/remove calls - and possibly straight-away deregister the IP again. This happens via l3_delete_ip(), which 1) looks up the queried IP in the htable (getting a reference to the *same* queried object), 2) deregisters the IP from the HW, and 3) frees the IP object.
The caller in l3_add_ip() then does a second free on the same object.
For this case, skip all the extra checks and lookups in l3_delete_ip() and just deregister & free the IP object ourselves.
Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") Signed-off-by: Julian Wiedmann jwi@linux.vnet.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/net/qeth_l3_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/s390/net/qeth_l3_main.c +++ b/drivers/s390/net/qeth_l3_main.c @@ -319,7 +319,8 @@ int qeth_l3_add_ip(struct qeth_card *car (rc == IPA_RC_LAN_OFFLINE)) { addr->disp_flag = QETH_DISP_ADDR_DO_NOTHING; if (addr->ref_counter < 1) { - qeth_l3_delete_ip(card, addr); + qeth_l3_deregister_addr_entry(card, addr); + hash_del(&addr->hnode); kfree(addr); } } else {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Julian Wiedmann jwi@linux.vnet.ibm.com
[ Upstream commit 4964c66fd49b2e2342da35358f2ff74614bcbaee ]
This reverts commit cb816192d986f7596009dedcf2201fe2e5bc2aa7.
The issue this attempted to fix never actually occurs. l3_add_rxip() checks (via l3_ip_from_hash()) if the requested address was previously added to the card. If so, it returns -EEXIST and doesn't call l3_add_ip(). As a result, the "address exists" path in l3_add_ip() is never taken for rxip addresses, and this patch had no effect.
Fixes: cb816192d986 ("s390/qeth: fix using of ref counter for rxip addresses") Signed-off-by: Julian Wiedmann jwi@linux.vnet.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/net/qeth_l3_main.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)
--- a/drivers/s390/net/qeth_l3_main.c +++ b/drivers/s390/net/qeth_l3_main.c @@ -249,8 +249,7 @@ int qeth_l3_delete_ip(struct qeth_card * return -ENOENT;
addr->ref_counter--; - if (addr->ref_counter > 0 && (addr->type == QETH_IP_TYPE_NORMAL || - addr->type == QETH_IP_TYPE_RXIP)) + if (addr->type == QETH_IP_TYPE_NORMAL && addr->ref_counter > 0) return rc; if (addr->in_progress) return -EINPROGRESS; @@ -328,9 +327,8 @@ int qeth_l3_add_ip(struct qeth_card *car kfree(addr); } } else { - if (addr->type == QETH_IP_TYPE_NORMAL || - addr->type == QETH_IP_TYPE_RXIP) - addr->ref_counter++; + if (addr->type == QETH_IP_TYPE_NORMAL) + addr->ref_counter++; }
return rc;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Julian Wiedmann jwi@linux.vnet.ibm.com
[ Upstream commit c5c48c58b259bb8f0482398370ee539d7a12df3e ]
Current code ("qeth_l3_ip_from_hash()") matches a queried address object against objects in the IP table by IP address, Mask/Prefix Length and MAC address ("qeth_l3_ipaddrs_is_equal()"). But what callers actually require is either a) "is this IP address registered" (ie. match by IP address only), before adding a new address. b) or "is this address object registered" (ie. match all relevant attributes), before deleting an address.
Right now 1. the ADD path is too strict in its lookup, and eg. doesn't detect conflicts between an existing NORMAL address and a new VIPA address (because the NORMAL address will have mask != 0, while VIPA has a mask == 0), 2. the DELETE path is not strict enough, and eg. allows del_rxip() to delete a VIPA address as long as the IP address matches.
Fix all this by adding helpers (_addr_match_ip() and _addr_match_all()) that do the appropriate checking.
Note that the ADD path for NORMAL addresses is special, as qeth keeps track of how many times such an address is in use (and there is no immediate way of returning errors to the caller). So when a requested NORMAL address _fully_ matches an existing one, it's not considered a conflict and we merely increment the refcount.
Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") Signed-off-by: Julian Wiedmann jwi@linux.vnet.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/net/qeth_l3.h | 34 ++++++++++++++ drivers/s390/net/qeth_l3_main.c | 91 ++++++++++++++++++---------------------- 2 files changed, 74 insertions(+), 51 deletions(-)
--- a/drivers/s390/net/qeth_l3.h +++ b/drivers/s390/net/qeth_l3.h @@ -40,8 +40,40 @@ struct qeth_ipaddr { unsigned int pfxlen; } a6; } u; - }; + +static inline bool qeth_l3_addr_match_ip(struct qeth_ipaddr *a1, + struct qeth_ipaddr *a2) +{ + if (a1->proto != a2->proto) + return false; + if (a1->proto == QETH_PROT_IPV6) + return ipv6_addr_equal(&a1->u.a6.addr, &a2->u.a6.addr); + return a1->u.a4.addr == a2->u.a4.addr; +} + +static inline bool qeth_l3_addr_match_all(struct qeth_ipaddr *a1, + struct qeth_ipaddr *a2) +{ + /* Assumes that the pair was obtained via qeth_l3_addr_find_by_ip(), + * so 'proto' and 'addr' match for sure. + * + * For ucast: + * - 'mac' is always 0. + * - 'mask'/'pfxlen' for RXIP/VIPA is always 0. For NORMAL, matching + * values are required to avoid mixups in takeover eligibility. + * + * For mcast, + * - 'mac' is mapped from the IP, and thus always matches. + * - 'mask'/'pfxlen' is always 0. + */ + if (a1->type != a2->type) + return false; + if (a1->proto == QETH_PROT_IPV6) + return a1->u.a6.pfxlen == a2->u.a6.pfxlen; + return a1->u.a4.mask == a2->u.a4.mask; +} + static inline u64 qeth_l3_ipaddr_hash(struct qeth_ipaddr *addr) { u64 ret = 0; --- a/drivers/s390/net/qeth_l3_main.c +++ b/drivers/s390/net/qeth_l3_main.c @@ -149,6 +149,24 @@ int qeth_l3_string_to_ipaddr(const char return -EINVAL; }
+static struct qeth_ipaddr *qeth_l3_find_addr_by_ip(struct qeth_card *card, + struct qeth_ipaddr *query) +{ + u64 key = qeth_l3_ipaddr_hash(query); + struct qeth_ipaddr *addr; + + if (query->is_multicast) { + hash_for_each_possible(card->ip_mc_htable, addr, hnode, key) + if (qeth_l3_addr_match_ip(addr, query)) + return addr; + } else { + hash_for_each_possible(card->ip_htable, addr, hnode, key) + if (qeth_l3_addr_match_ip(addr, query)) + return addr; + } + return NULL; +} + static void qeth_l3_convert_addr_to_bits(u8 *addr, u8 *bits, int len) { int i, j; @@ -202,34 +220,6 @@ static bool qeth_l3_is_addr_covered_by_i return rc; }
-inline int -qeth_l3_ipaddrs_is_equal(struct qeth_ipaddr *addr1, struct qeth_ipaddr *addr2) -{ - return addr1->proto == addr2->proto && - !memcmp(&addr1->u, &addr2->u, sizeof(addr1->u)) && - !memcmp(&addr1->mac, &addr2->mac, sizeof(addr1->mac)); -} - -static struct qeth_ipaddr * -qeth_l3_ip_from_hash(struct qeth_card *card, struct qeth_ipaddr *tmp_addr) -{ - struct qeth_ipaddr *addr; - - if (tmp_addr->is_multicast) { - hash_for_each_possible(card->ip_mc_htable, addr, - hnode, qeth_l3_ipaddr_hash(tmp_addr)) - if (qeth_l3_ipaddrs_is_equal(tmp_addr, addr)) - return addr; - } else { - hash_for_each_possible(card->ip_htable, addr, - hnode, qeth_l3_ipaddr_hash(tmp_addr)) - if (qeth_l3_ipaddrs_is_equal(tmp_addr, addr)) - return addr; - } - - return NULL; -} - int qeth_l3_delete_ip(struct qeth_card *card, struct qeth_ipaddr *tmp_addr) { int rc = 0; @@ -244,8 +234,8 @@ int qeth_l3_delete_ip(struct qeth_card * QETH_CARD_HEX(card, 4, ((char *)&tmp_addr->u.a6.addr) + 8, 8); }
- addr = qeth_l3_ip_from_hash(card, tmp_addr); - if (!addr) + addr = qeth_l3_find_addr_by_ip(card, tmp_addr); + if (!addr || !qeth_l3_addr_match_all(addr, tmp_addr)) return -ENOENT;
addr->ref_counter--; @@ -267,6 +257,7 @@ int qeth_l3_add_ip(struct qeth_card *car { int rc = 0; struct qeth_ipaddr *addr; + char buf[40];
QETH_CARD_TEXT(card, 4, "addip");
@@ -277,8 +268,20 @@ int qeth_l3_add_ip(struct qeth_card *car QETH_CARD_HEX(card, 4, ((char *)&tmp_addr->u.a6.addr) + 8, 8); }
- addr = qeth_l3_ip_from_hash(card, tmp_addr); - if (!addr) { + addr = qeth_l3_find_addr_by_ip(card, tmp_addr); + if (addr) { + if (tmp_addr->type != QETH_IP_TYPE_NORMAL) + return -EADDRINUSE; + if (qeth_l3_addr_match_all(addr, tmp_addr)) { + addr->ref_counter++; + return 0; + } + qeth_l3_ipaddr_to_string(tmp_addr->proto, (u8 *)&tmp_addr->u, + buf); + dev_warn(&card->gdev->dev, + "Registering IP address %s failed\n", buf); + return -EADDRINUSE; + } else { addr = qeth_l3_get_addr_buffer(tmp_addr->proto); if (!addr) return -ENOMEM; @@ -326,11 +329,7 @@ int qeth_l3_add_ip(struct qeth_card *car hash_del(&addr->hnode); kfree(addr); } - } else { - if (addr->type == QETH_IP_TYPE_NORMAL) - addr->ref_counter++; } - return rc; }
@@ -714,12 +713,7 @@ int qeth_l3_add_vipa(struct qeth_card *c return -ENOMEM;
spin_lock_bh(&card->ip_lock); - - if (qeth_l3_ip_from_hash(card, ipaddr)) - rc = -EEXIST; - else - qeth_l3_add_ip(card, ipaddr); - + rc = qeth_l3_add_ip(card, ipaddr); spin_unlock_bh(&card->ip_lock);
kfree(ipaddr); @@ -782,12 +776,7 @@ int qeth_l3_add_rxip(struct qeth_card *c return -ENOMEM;
spin_lock_bh(&card->ip_lock); - - if (qeth_l3_ip_from_hash(card, ipaddr)) - rc = -EEXIST; - else - qeth_l3_add_ip(card, ipaddr); - + rc = qeth_l3_add_ip(card, ipaddr); spin_unlock_bh(&card->ip_lock);
kfree(ipaddr); @@ -1395,8 +1384,9 @@ qeth_l3_add_mc_to_hash(struct qeth_card memcpy(tmp->mac, buf, sizeof(tmp->mac)); tmp->is_multicast = 1;
- ipm = qeth_l3_ip_from_hash(card, tmp); + ipm = qeth_l3_find_addr_by_ip(card, tmp); if (ipm) { + /* for mcast, by-IP match means full match */ ipm->disp_flag = QETH_DISP_ADDR_DO_NOTHING; } else { ipm = qeth_l3_get_addr_buffer(QETH_PROT_IPV4); @@ -1479,8 +1469,9 @@ qeth_l3_add_mc6_to_hash(struct qeth_card sizeof(struct in6_addr)); tmp->is_multicast = 1;
- ipm = qeth_l3_ip_from_hash(card, tmp); + ipm = qeth_l3_find_addr_by_ip(card, tmp); if (ipm) { + /* for mcast, by-IP match means full match */ ipm->disp_flag = QETH_DISP_ADDR_DO_NOTHING; continue; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Julian Wiedmann jwi@linux.vnet.ibm.com
[ Upstream commit d22ffb5a712f9211ffd104c38fc17cbfb1b5e2b0 ]
If multiple IPA commands are build & sent out concurrently, fill_ipacmd_header() may assign a seqno value to a command that's different from what send_control_data() later assigns to this command's reply. This is due to other commands passing through send_control_data(), and incrementing card->seqno.ipa along the way.
So one IPA command has no reply that's waiting for its seqno, while some other IPA command has multiple reply objects waiting for it. Only one of those waiting replies wins, and the other(s) times out and triggers a recovery via send_ipa_cmd().
Fix this by making sure that the same seqno value is assigned to a command and its reply object. Do so immediately before submitting the command & while holding the irq_pending "lock", to produce nicely ascending seqnos.
As a side effect, *all* IPA commands now use a reply object that's waiting for its actual seqno. Previously, early IPA commands that were submitted while the card was still DOWN used the "catch-all" IDX seqno.
Signed-off-by: Julian Wiedmann jwi@linux.vnet.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/net/qeth_core_main.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-)
--- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -2087,25 +2087,26 @@ int qeth_send_control_data(struct qeth_c } reply->callback = reply_cb; reply->param = reply_param; - if (card->state == CARD_STATE_DOWN) - reply->seqno = QETH_IDX_COMMAND_SEQNO; - else - reply->seqno = card->seqno.ipa++; + init_waitqueue_head(&reply->wait_q); - spin_lock_irqsave(&card->lock, flags); - list_add_tail(&reply->list, &card->cmd_waiter_list); - spin_unlock_irqrestore(&card->lock, flags); QETH_DBF_HEX(CTRL, 2, iob->data, QETH_DBF_CTRL_LEN);
while (atomic_cmpxchg(&card->write.irq_pending, 0, 1)) ; - qeth_prepare_control_data(card, len, iob);
if (IS_IPA(iob->data)) { cmd = __ipa_cmd(iob); + cmd->hdr.seqno = card->seqno.ipa++; + reply->seqno = cmd->hdr.seqno; event_timeout = QETH_IPA_TIMEOUT; } else { + reply->seqno = QETH_IDX_COMMAND_SEQNO; event_timeout = QETH_TIMEOUT; } + qeth_prepare_control_data(card, len, iob); + + spin_lock_irqsave(&card->lock, flags); + list_add_tail(&reply->list, &card->cmd_waiter_list); + spin_unlock_irqrestore(&card->lock, flags);
timeout = jiffies + event_timeout;
@@ -2896,7 +2897,7 @@ static void qeth_fill_ipacmd_header(stru memset(cmd, 0, sizeof(struct qeth_ipa_cmd)); cmd->hdr.command = command; cmd->hdr.initiator = IPA_CMD_INITIATOR_HOST; - cmd->hdr.seqno = card->seqno.ipa; + /* cmd->hdr.seqno is set by qeth_send_control_data() */ cmd->hdr.adapter_type = qeth_get_ipa_adp_type(card->info.link_type); cmd->hdr.rel_adapter_no = (__u8) card->info.portno; if (card->options.layer2)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ming Lei ming.lei@redhat.com
commit 105976f517791aed3b11f8f53b308a2069d42055 upstream.
__blk_mq_requeue_request() covers two cases:
- one is that the requeued request is added to hctx->dispatch, such as blk_mq_dispatch_rq_list()
- another case is that the request is requeued to io scheduler, such as blk_mq_requeue_request().
We should call io sched's .requeue_request callback only for the 2nd case.
Cc: Paolo Valente paolo.valente@linaro.org Cc: Omar Sandoval osandov@fb.com Fixes: bd166ef183c2 ("blk-mq-sched: add framework for MQ capable IO schedulers") Cc: stable@vger.kernel.org Reviewed-by: Bart Van Assche bart.vanassche@wdc.com Acked-by: Paolo Valente paolo.valente@linaro.org Signed-off-by: Ming Lei ming.lei@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- block/blk-mq.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -638,7 +638,6 @@ static void __blk_mq_requeue_request(str
trace_block_rq_requeue(q, rq); wbt_requeue(q->rq_wb, &rq->issue_stat); - blk_mq_sched_requeue_request(rq);
if (test_and_clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags)) { if (q->dma_drain_size && blk_rq_bytes(rq)) @@ -650,6 +649,9 @@ void blk_mq_requeue_request(struct reque { __blk_mq_requeue_request(rq);
+ /* this request will be re-inserted to io scheduler queue */ + blk_mq_sched_requeue_request(rq); + BUG_ON(blk_queued_rq(rq)); blk_mq_add_to_requeue_list(rq, true, kick_requeue_list); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mauro Carvalho Chehab mchehab@s-opensource.com
commit b9c97c67fd19262c002d94ced2bfb513083e161e upstream.
If m88d3103 chip ID is not recognized, the device is not initialized.
However, it returns from probe without any error, causing this OOPS:
[ 7.689289] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 7.689297] pgd = 7b0bd7a7 [ 7.689302] [00000000] *pgd=00000000 [ 7.689318] Internal error: Oops: 80000005 [#1] SMP ARM [ 7.689322] Modules linked in: dvb_usb_dvbsky(+) m88ds3103 dvb_usb_v2 dvb_core videobuf2_vmalloc videobuf2_memops videobuf2_core crc32_arm_ce videodev media [ 7.689358] CPU: 3 PID: 197 Comm: systemd-udevd Not tainted 4.15.0-mcc+ #23 [ 7.689361] Hardware name: BCM2835 [ 7.689367] PC is at 0x0 [ 7.689382] LR is at m88ds3103_attach+0x194/0x1d0 [m88ds3103] [ 7.689386] pc : [<00000000>] lr : [<bf0ae1ec>] psr: 60000013 [ 7.689391] sp : ed8e5c20 ip : ed8c1e00 fp : ed8945c0 [ 7.689395] r10: ed894000 r9 : ed894378 r8 : eda736c0 [ 7.689400] r7 : ed894070 r6 : ed8e5c44 r5 : bf0bb040 r4 : eda77600 [ 7.689405] r3 : 00000000 r2 : 00000000 r1 : 00000000 r0 : eda77600 [ 7.689412] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 7.689417] Control: 10c5383d Table: 2d8e806a DAC: 00000051 [ 7.689423] Process systemd-udevd (pid: 197, stack limit = 0xe9dbfb63) [ 7.689428] Stack: (0xed8e5c20 to 0xed8e6000) [ 7.689439] 5c20: ed853a80 eda73640 ed894000 ed8942c0 ed853a80 bf0b9e98 ed894070 bf0b9f10 [ 7.689449] 5c40: 00000000 00000000 bf08c17c c08dfc50 00000000 00000000 00000000 00000000 [ 7.689459] 5c60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 7.689468] 5c80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 7.689479] 5ca0: 00000000 00000000 ed8945c0 ed8942c0 ed894000 ed894830 bf0b9e98 00000000 [ 7.689490] 5cc0: ed894378 bf0a3cb4 bf0bc3b0 0000533b ed920540 00000000 00000034 bf0a6434 [ 7.689500] 5ce0: ee952070 ed826600 bf0a7038 bf0a2dd8 00000001 bf0a6768 bf0a2f90 ed8943c0 [ 7.689511] 5d00: 00000000 c08eca68 ed826620 ed826620 00000000 ee952070 bf0bc034 ee952000 [ 7.689521] 5d20: ed826600 bf0bb080 ffffffed c0aa9e9c c0aa9dac ed826620 c16edf6c c168c2c8 [ 7.689531] 5d40: c16edf70 00000000 bf0bc034 0000000d 00000000 c08e268c bf0bb080 ed826600 [ 7.689541] 5d60: bf0bc034 ed826654 ed826620 bf0bc034 c164c8bc 00000000 00000001 00000000 [ 7.689553] 5d80: 00000028 c08e2948 00000000 bf0bc034 c08e2848 c08e0778 ee9f0a58 ed88bab4 [ 7.689563] 5da0: bf0bc034 ed90ba80 c168c1f0 c08e1934 bf0bb3bc c17045ac bf0bc034 c164c8bc [ 7.689574] 5dc0: bf0bc034 bf0bb3bc ed91f564 c08e34ec bf0bc000 c164c8bc bf0bc034 c0aa8dc4 [ 7.689584] 5de0: ffffe000 00000000 bf0bf000 ed91f600 ed91f564 c03021e4 00000001 00000000 [ 7.689595] 5e00: c166e040 8040003f ed853a80 bf0bc448 00000000 c1678174 ed853a80 f0f22000 [ 7.689605] 5e20: f0f21fff 8040003f 014000c0 ed91e700 ed91e700 c16d8e68 00000001 ed91e6c0 [ 7.689615] 5e40: bf0bc400 00000001 bf0bc400 ed91f564 00000001 00000000 00000028 c03c9a24 [ 7.689625] 5e60: 00000001 c03c8c94 ed8e5f50 ed8e5f50 00000001 bf0bc400 ed91f540 c03c8cb0 [ 7.689637] 5e80: bf0bc40c 00007fff bf0bc400 c03c60b0 00000000 bf0bc448 00000028 c0e09684 [ 7.689647] 5ea0: 00000002 bf0bc530 c1234bf8 bf0bc5dc bf0bc514 c10ebbe8 ffffe000 bf000000 [ 7.689657] 5ec0: 00011538 00000000 ed8e5f48 00000000 00000000 00000000 00000000 00000000 [ 7.689666] 5ee0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 7.689676] 5f00: 00000000 00000000 7fffffff 00000000 00000013 b6e55a18 0000017b c0309104 [ 7.689686] 5f20: ed8e4000 00000000 00510af0 c03c9430 7fffffff 00000000 00000003 00000000 [ 7.689697] 5f40: 00000000 f0f0f000 00011538 00000000 f0f107b0 f0f0f000 00011538 f0f1fdb8 [ 7.689707] 5f60: f0f1fbe8 f0f1b974 00004000 000041e0 bf0bc3d0 00000001 00000000 000024c4 [ 7.689717] 5f80: 0000002d 0000002e 00000019 00000000 00000010 00000000 16894000 00000000 [ 7.689727] 5fa0: 00000000 c0308f20 16894000 00000000 00000013 b6e55a18 00000000 b6e5652c [ 7.689737] 5fc0: 16894000 00000000 00000000 0000017b 00020000 00508110 00000000 00510af0 [ 7.689748] 5fe0: bef68948 bef68938 b6e4d3d0 b6d32590 60000010 00000013 00000000 00000000 [ 7.689790] [<bf0ae1ec>] (m88ds3103_attach [m88ds3103]) from [<bf0b9f10>] (dvbsky_s960c_attach+0x78/0x280 [dvb_usb_dvbsky]) [ 7.689821] [<bf0b9f10>] (dvbsky_s960c_attach [dvb_usb_dvbsky]) from [<bf0a3cb4>] (dvb_usbv2_probe+0xa3c/0x1024 [dvb_usb_v2]) [ 7.689849] [<bf0a3cb4>] (dvb_usbv2_probe [dvb_usb_v2]) from [<c0aa9e9c>] (usb_probe_interface+0xf0/0x2a8) [ 7.689869] [<c0aa9e9c>] (usb_probe_interface) from [<c08e268c>] (driver_probe_device+0x2f8/0x4b4) [ 7.689881] [<c08e268c>] (driver_probe_device) from [<c08e2948>] (__driver_attach+0x100/0x11c) [ 7.689895] [<c08e2948>] (__driver_attach) from [<c08e0778>] (bus_for_each_dev+0x4c/0x9c) [ 7.689909] [<c08e0778>] (bus_for_each_dev) from [<c08e1934>] (bus_add_driver+0x1c0/0x264) [ 7.689919] [<c08e1934>] (bus_add_driver) from [<c08e34ec>] (driver_register+0x78/0xf4) [ 7.689931] [<c08e34ec>] (driver_register) from [<c0aa8dc4>] (usb_register_driver+0x70/0x134) [ 7.689946] [<c0aa8dc4>] (usb_register_driver) from [<c03021e4>] (do_one_initcall+0x44/0x168) [ 7.689963] [<c03021e4>] (do_one_initcall) from [<c03c9a24>] (do_init_module+0x64/0x1f4) [ 7.689979] [<c03c9a24>] (do_init_module) from [<c03c8cb0>] (load_module+0x20a0/0x25c8) [ 7.689993] [<c03c8cb0>] (load_module) from [<c03c9430>] (SyS_finit_module+0xb4/0xec) [ 7.690007] [<c03c9430>] (SyS_finit_module) from [<c0308f20>] (ret_fast_syscall+0x0/0x54) [ 7.690018] Code: bad PC value
This may happen on normal circumstances, if, for some reason, the demod hangs and start returning an invalid chip ID:
[ 10.394395] m88ds3103 3-0068: Unknown device. Chip_id=00
So, change the logic to cause probe to fail with -ENODEV, preventing the OOPS.
Detected while testing DVB MMAP patches on Raspberry Pi 3 with DVBSky S960CI.
Cc: stable@vger.kernel.org Signed-off-by: Mauro Carvalho Chehab mchehab@s-opensource.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/media/dvb-frontends/m88ds3103.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/drivers/media/dvb-frontends/m88ds3103.c +++ b/drivers/media/dvb-frontends/m88ds3103.c @@ -1262,11 +1262,12 @@ static int m88ds3103_select(struct i2c_m * New users must use I2C client binding directly! */ struct dvb_frontend *m88ds3103_attach(const struct m88ds3103_config *cfg, - struct i2c_adapter *i2c, struct i2c_adapter **tuner_i2c_adapter) + struct i2c_adapter *i2c, + struct i2c_adapter **tuner_i2c_adapter) { struct i2c_client *client; struct i2c_board_info board_info; - struct m88ds3103_platform_data pdata; + struct m88ds3103_platform_data pdata = {};
pdata.clk = cfg->clock; pdata.i2c_wr_max = cfg->i2c_wr_max; @@ -1409,6 +1410,8 @@ static int m88ds3103_probe(struct i2c_cl case M88DS3103_CHIP_ID: break; default: + ret = -ENODEV; + dev_err(&client->dev, "Unknown device. Chip_id=%02x\n", dev->chip_id); goto err_kfree; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anna Karbownik anna.karbownik@intel.com
commit bf8486709ac7fad99e4040dea73fe466c57a4ae1 upstream.
Commit
3286d3eb906c ("EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4")
decreased NUM_CHANNELS from 8 to 4, but this is not enough for Knights Landing which supports up to 6 channels.
This caused out-of-bounds writes to pvt->mirror_mode and pvt->tolm variables which don't pay critical role on KNL code path, so the memory corruption wasn't causing any visible driver failures.
The easiest way of fixing it is to change NUM_CHANNELS to 6. Do that.
An alternative solution would be to restructure the KNL part of the driver to 2MC/3channel representation.
Reported-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Anna Karbownik anna.karbownik@intel.com Cc: Mauro Carvalho Chehab mchehab@kernel.org Cc: Tony Luck tony.luck@intel.com Cc: jim.m.snow@intel.com Cc: krzysztof.paliswiat@intel.com Cc: lukasz.odzioba@intel.com Cc: qiuxu.zhuo@intel.com Cc: linux-edac linux-edac@vger.kernel.org Cc: stable@vger.kernel.org Fixes: 3286d3eb906c ("EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4") Link: http://lkml.kernel.org/r/1519312693-4789-1-git-send-email-anna.karbownik@int... [ Massage commit message. ] Signed-off-by: Borislav Petkov bp@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/edac/sb_edac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/edac/sb_edac.c +++ b/drivers/edac/sb_edac.c @@ -279,7 +279,7 @@ static const u32 correrrthrsld[] = { * sbridge structs */
-#define NUM_CHANNELS 4 /* Max channels per MC */ +#define NUM_CHANNELS 6 /* Max channels per MC */ #define MAX_DIMMS 3 /* Max DIMMS per channel */ #define KNL_MAX_CHAS 38 /* KNL max num. of Cache Home Agents */ #define KNL_MAX_CHANNELS 6 /* KNL max num. of PCI channels */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Hildenbrand david@redhat.com
commit 5fe01793dd953ab947fababe8abaf5ed5258c8df upstream.
Missed when enabling the Multiple-epoch facility. If the facility is installed and the control is set, a sign based comaprison has to be performed.
Right now we would inject wrong interrupts and ignore interrupt conditions. Also the sleep time is calculated in a wrong way.
Signed-off-by: David Hildenbrand david@redhat.com Message-Id: 20180207114647.6220-2-david@redhat.com Fixes: 8fa1696ea781 ("KVM: s390: Multiple Epoch Facility support") Cc: stable@vger.kernel.org Reviewed-by: Christian Borntraeger borntraeger@de.ibm.com Signed-off-by: Christian Borntraeger borntraeger@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/s390/kvm/interrupt.c | 25 +++++++++++++++++++------ 1 file changed, 19 insertions(+), 6 deletions(-)
--- a/arch/s390/kvm/interrupt.c +++ b/arch/s390/kvm/interrupt.c @@ -173,8 +173,15 @@ static int ckc_interrupts_enabled(struct
static int ckc_irq_pending(struct kvm_vcpu *vcpu) { - if (vcpu->arch.sie_block->ckc >= kvm_s390_get_tod_clock_fast(vcpu->kvm)) + const u64 now = kvm_s390_get_tod_clock_fast(vcpu->kvm); + const u64 ckc = vcpu->arch.sie_block->ckc; + + if (vcpu->arch.sie_block->gcr[0] & 0x0020000000000000ul) { + if ((s64)ckc >= (s64)now) + return 0; + } else if (ckc >= now) { return 0; + } return ckc_interrupts_enabled(vcpu); }
@@ -1004,13 +1011,19 @@ int kvm_cpu_has_pending_timer(struct kvm
static u64 __calculate_sltime(struct kvm_vcpu *vcpu) { - u64 now, cputm, sltime = 0; + const u64 now = kvm_s390_get_tod_clock_fast(vcpu->kvm); + const u64 ckc = vcpu->arch.sie_block->ckc; + u64 cputm, sltime = 0;
if (ckc_interrupts_enabled(vcpu)) { - now = kvm_s390_get_tod_clock_fast(vcpu->kvm); - sltime = tod_to_ns(vcpu->arch.sie_block->ckc - now); - /* already expired or overflow? */ - if (!sltime || vcpu->arch.sie_block->ckc <= now) + if (vcpu->arch.sie_block->gcr[0] & 0x0020000000000000ul) { + if ((s64)now < (s64)ckc) + sltime = tod_to_ns((s64)ckc - (s64)now); + } else if (now < ckc) { + sltime = tod_to_ns(ckc - now); + } + /* already expired */ + if (!sltime) return 0; if (cpu_timer_interrupts_enabled(vcpu)) { cputm = kvm_s390_get_cpu_timer(vcpu);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Hildenbrand david@redhat.com
commit 0e7def5fb0dc53ddbb9f62a497d15f1e11ccdc36 upstream.
Right now, SET CLOCK called in the guest does not properly take care of the epoch index, as the call goes via the old kvm_s390_set_tod_clock() interface. So the epoch index is neither reset to 0, if required, nor properly set to e.g. 0xff on negative values.
Fix this by providing a single kvm_s390_set_tod_clock() function. Move Multiple-epoch facility handling into it.
Signed-off-by: David Hildenbrand david@redhat.com Message-Id: 20180207114647.6220-3-david@redhat.com Reviewed-by: Christian Borntraeger borntraeger@de.ibm.com Fixes: 8fa1696ea781 ("KVM: s390: Multiple Epoch Facility support") Cc: stable@vger.kernel.org Signed-off-by: Christian Borntraeger borntraeger@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/s390/kvm/kvm-s390.c | 46 +++++++++++++++------------------------------- arch/s390/kvm/kvm-s390.h | 5 ++--- arch/s390/kvm/priv.c | 9 +++++---- 3 files changed, 22 insertions(+), 38 deletions(-)
--- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -888,12 +888,9 @@ static int kvm_s390_set_tod_ext(struct k if (copy_from_user(>od, (void __user *)attr->addr, sizeof(gtod))) return -EFAULT;
- if (test_kvm_facility(kvm, 139)) - kvm_s390_set_tod_clock_ext(kvm, >od); - else if (gtod.epoch_idx == 0) - kvm_s390_set_tod_clock(kvm, gtod.tod); - else + if (!test_kvm_facility(kvm, 139) && gtod.epoch_idx) return -EINVAL; + kvm_s390_set_tod_clock(kvm, >od);
VM_EVENT(kvm, 3, "SET: TOD extension: 0x%x, TOD base: 0x%llx", gtod.epoch_idx, gtod.tod); @@ -918,13 +915,14 @@ static int kvm_s390_set_tod_high(struct
static int kvm_s390_set_tod_low(struct kvm *kvm, struct kvm_device_attr *attr) { - u64 gtod; + struct kvm_s390_vm_tod_clock gtod = { 0 };
- if (copy_from_user(>od, (void __user *)attr->addr, sizeof(gtod))) + if (copy_from_user(>od.tod, (void __user *)attr->addr, + sizeof(gtod.tod))) return -EFAULT;
- kvm_s390_set_tod_clock(kvm, gtod); - VM_EVENT(kvm, 3, "SET: TOD base: 0x%llx", gtod); + kvm_s390_set_tod_clock(kvm, >od); + VM_EVENT(kvm, 3, "SET: TOD base: 0x%llx", gtod.tod); return 0; }
@@ -2945,8 +2943,8 @@ retry: return 0; }
-void kvm_s390_set_tod_clock_ext(struct kvm *kvm, - const struct kvm_s390_vm_tod_clock *gtod) +void kvm_s390_set_tod_clock(struct kvm *kvm, + const struct kvm_s390_vm_tod_clock *gtod) { struct kvm_vcpu *vcpu; struct kvm_s390_tod_clock_ext htod; @@ -2958,10 +2956,12 @@ void kvm_s390_set_tod_clock_ext(struct k get_tod_clock_ext((char *)&htod);
kvm->arch.epoch = gtod->tod - htod.tod; - kvm->arch.epdx = gtod->epoch_idx - htod.epoch_idx; - - if (kvm->arch.epoch > gtod->tod) - kvm->arch.epdx -= 1; + kvm->arch.epdx = 0; + if (test_kvm_facility(kvm, 139)) { + kvm->arch.epdx = gtod->epoch_idx - htod.epoch_idx; + if (kvm->arch.epoch > gtod->tod) + kvm->arch.epdx -= 1; + }
kvm_s390_vcpu_block_all(kvm); kvm_for_each_vcpu(i, vcpu, kvm) { @@ -2972,22 +2972,6 @@ void kvm_s390_set_tod_clock_ext(struct k kvm_s390_vcpu_unblock_all(kvm); preempt_enable(); mutex_unlock(&kvm->lock); -} - -void kvm_s390_set_tod_clock(struct kvm *kvm, u64 tod) -{ - struct kvm_vcpu *vcpu; - int i; - - mutex_lock(&kvm->lock); - preempt_disable(); - kvm->arch.epoch = tod - get_tod_clock(); - kvm_s390_vcpu_block_all(kvm); - kvm_for_each_vcpu(i, vcpu, kvm) - vcpu->arch.sie_block->epoch = kvm->arch.epoch; - kvm_s390_vcpu_unblock_all(kvm); - preempt_enable(); - mutex_unlock(&kvm->lock); }
/** --- a/arch/s390/kvm/kvm-s390.h +++ b/arch/s390/kvm/kvm-s390.h @@ -272,9 +272,8 @@ int kvm_s390_handle_sigp_pei(struct kvm_ int handle_sthyi(struct kvm_vcpu *vcpu);
/* implemented in kvm-s390.c */ -void kvm_s390_set_tod_clock_ext(struct kvm *kvm, - const struct kvm_s390_vm_tod_clock *gtod); -void kvm_s390_set_tod_clock(struct kvm *kvm, u64 tod); +void kvm_s390_set_tod_clock(struct kvm *kvm, + const struct kvm_s390_vm_tod_clock *gtod); long kvm_arch_fault_in_page(struct kvm_vcpu *vcpu, gpa_t gpa, int writable); int kvm_s390_store_status_unloaded(struct kvm_vcpu *vcpu, unsigned long addr); int kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu, unsigned long addr); --- a/arch/s390/kvm/priv.c +++ b/arch/s390/kvm/priv.c @@ -84,9 +84,10 @@ int kvm_s390_handle_e3(struct kvm_vcpu * /* Handle SCK (SET CLOCK) interception */ static int handle_set_clock(struct kvm_vcpu *vcpu) { + struct kvm_s390_vm_tod_clock gtod = { 0 }; int rc; u8 ar; - u64 op2, val; + u64 op2;
if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE) return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP); @@ -94,12 +95,12 @@ static int handle_set_clock(struct kvm_v op2 = kvm_s390_get_base_disp_s(vcpu, &ar); if (op2 & 7) /* Operand must be on a doubleword boundary */ return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION); - rc = read_guest(vcpu, op2, ar, &val, sizeof(val)); + rc = read_guest(vcpu, op2, ar, >od.tod, sizeof(gtod.tod)); if (rc) return kvm_s390_inject_prog_cond(vcpu, rc);
- VCPU_EVENT(vcpu, 3, "SCK: setting guest TOD to 0x%llx", val); - kvm_s390_set_tod_clock(vcpu->kvm, val); + VCPU_EVENT(vcpu, 3, "SCK: setting guest TOD to 0x%llx", gtod.tod); + kvm_s390_set_tod_clock(vcpu->kvm, >od);
kvm_s390_set_psw_cc(vcpu, 0); return 0;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Hildenbrand david@redhat.com
commit d16b52cb9cdb6f06dea8ab2f0a428e7d7f0b0a81 upstream.
We must copy both, the epoch and the epoch_idx.
Signed-off-by: David Hildenbrand david@redhat.com Message-Id: 20180207114647.6220-4-david@redhat.com Fixes: 8fa1696ea781 ("KVM: s390: Multiple Epoch Facility support") Reviewed-by: Cornelia Huck cohuck@redhat.com Reviewed-by: Christian Borntraeger borntraeger@de.ibm.com Fixes: 8fa1696ea781 ("KVM: s390: Multiple Epoch Facility support") Cc: stable@vger.kernel.org Signed-off-by: Christian Borntraeger borntraeger@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/s390/kvm/kvm-s390.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -2357,6 +2357,7 @@ void kvm_arch_vcpu_postcreate(struct kvm mutex_lock(&vcpu->kvm->lock); preempt_disable(); vcpu->arch.sie_block->epoch = vcpu->kvm->arch.epoch; + vcpu->arch.sie_block->epdx = vcpu->kvm->arch.epdx; preempt_enable(); mutex_unlock(&vcpu->kvm->lock); if (!kvm_is_ucontrol(vcpu->kvm)) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Hildenbrand david@redhat.com
commit 1575767ef3cf5326701d2ae3075b7732cbc855e4 upstream.
For now, we don't take care of over/underflows. Especially underflows are critical:
Assume the epoch is currently 0 and we get a sync request for delta=1, meaning the TOD is moved forward by 1 and we have to fix it up by subtracting 1 from the epoch. Right now, this will leave the epoch index untouched, resulting in epoch=-1, epoch_idx=0, which is wrong.
We have to take care of over and underflows, also for the VSIE case. So let's factor out calculation into a separate function.
Signed-off-by: David Hildenbrand david@redhat.com Message-Id: 20180207114647.6220-5-david@redhat.com Reviewed-by: Christian Borntraeger borntraeger@de.ibm.com Fixes: 8fa1696ea781 ("KVM: s390: Multiple Epoch Facility support") Cc: stable@vger.kernel.org Signed-off-by: Christian Borntraeger borntraeger@de.ibm.com [use u8 for idx] Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/s390/kvm/kvm-s390.c | 32 +++++++++++++++++++++++++++++--- 1 file changed, 29 insertions(+), 3 deletions(-)
--- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -169,6 +169,28 @@ int kvm_arch_hardware_enable(void) static void kvm_gmap_notifier(struct gmap *gmap, unsigned long start, unsigned long end);
+static void kvm_clock_sync_scb(struct kvm_s390_sie_block *scb, u64 delta) +{ + u8 delta_idx = 0; + + /* + * The TOD jumps by delta, we have to compensate this by adding + * -delta to the epoch. + */ + delta = -delta; + + /* sign-extension - we're adding to signed values below */ + if ((s64)delta < 0) + delta_idx = -1; + + scb->epoch += delta; + if (scb->ecd & ECD_MEF) { + scb->epdx += delta_idx; + if (scb->epoch < delta) + scb->epdx += 1; + } +} + /* * This callback is executed during stop_machine(). All CPUs are therefore * temporarily stopped. In order not to change guest behavior, we have to @@ -184,13 +206,17 @@ static int kvm_clock_sync(struct notifie unsigned long long *delta = v;
list_for_each_entry(kvm, &vm_list, vm_list) { - kvm->arch.epoch -= *delta; kvm_for_each_vcpu(i, vcpu, kvm) { - vcpu->arch.sie_block->epoch -= *delta; + kvm_clock_sync_scb(vcpu->arch.sie_block, *delta); + if (i == 0) { + kvm->arch.epoch = vcpu->arch.sie_block->epoch; + kvm->arch.epdx = vcpu->arch.sie_block->epdx; + } if (vcpu->arch.cputm_enabled) vcpu->arch.cputm_start += *delta; if (vcpu->arch.vsie_block) - vcpu->arch.vsie_block->epoch -= *delta; + kvm_clock_sync_scb(vcpu->arch.vsie_block, + *delta); } } return NOTIFY_OK;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rasmus Villemoes linux@rasmusvillemoes.dk
commit b98c6a160a057d5686a8c54c79cc6c8c94a7d0c8 upstream.
The last expression in a statement expression need not be a bare variable, quoting gcc docs
The last thing in the compound statement should be an expression followed by a semicolon; the value of this subexpression serves as the value of the entire construct.
and we already use that in e.g. the min/max macros which end with a ternary expression.
This way, we can allow index to have const-qualified type, which will in some cases avoid the need for introducing a local copy of index of non-const qualified type. That, in turn, can prevent readers not familiar with the internals of array_index_nospec from wondering about the seemingly redundant extra variable, and I think that's worthwhile considering how confusing the whole _nospec business is.
The expression _i&_mask has type unsigned long (since that is the type of _mask, and the BUILD_BUG_ONs guarantee that _i will get promoted to that), so in order not to change the type of the whole expression, add a cast back to typeof(_i).
Signed-off-by: Rasmus Villemoes linux@rasmusvillemoes.dk Signed-off-by: Dan Williams dan.j.williams@intel.com Acked-by: Linus Torvalds torvalds@linux-foundation.org Cc: Andy Lutomirski luto@kernel.org Cc: Arjan van de Ven arjan@linux.intel.com Cc: Borislav Petkov bp@alien8.de Cc: Dave Hansen dave.hansen@linux.intel.com Cc: David Woodhouse dwmw2@infradead.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: Will Deacon will.deacon@arm.com Cc: linux-arch@vger.kernel.org Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/151881604837.17395.10812767547837568328.stgit@dwill... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/nospec.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/include/linux/nospec.h +++ b/include/linux/nospec.h @@ -72,7 +72,6 @@ static inline unsigned long array_index_ BUILD_BUG_ON(sizeof(_i) > sizeof(long)); \ BUILD_BUG_ON(sizeof(_s) > sizeof(long)); \ \ - _i &= _mask; \ - _i; \ + (typeof(_i)) (_i & _mask); \ }) #endif /* _LINUX_NOSPEC_H */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jan Beulich JBeulich@suse.com
commit 842cef9113c2120f74f645111ded1e020193d84c upstream.
Just like pte_{set,clear}_flags() their PMD and PUD counterparts should not do any address translation. This was outright wrong under Xen (causing a dead boot with no useful output on "suitable" systems), and produced needlessly more complicated code (even if just slightly) when paravirt was enabled.
Signed-off-by: Jan Beulich jbeulich@suse.com Reviewed-by: Juergen Gross jgross@suse.com Acked-by: Thomas Gleixner tglx@linutronix.de Cc: Andy Lutomirski luto@kernel.org Cc: Boris Ostrovsky boris.ostrovsky@oracle.com Cc: Borislav Petkov bp@alien8.de Cc: Brian Gerst brgerst@gmail.com Cc: Denys Vlasenko dvlasenk@redhat.com Cc: H. Peter Anvin hpa@zytor.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/5A8AF1BB02000078001A91C3@prv-mh.provo.novell.com Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/pgtable.h | 8 ++++---- arch/x86/include/asm/pgtable_types.h | 10 ++++++++++ 2 files changed, 14 insertions(+), 4 deletions(-)
--- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -350,14 +350,14 @@ static inline pmd_t pmd_set_flags(pmd_t { pmdval_t v = native_pmd_val(pmd);
- return __pmd(v | set); + return native_make_pmd(v | set); }
static inline pmd_t pmd_clear_flags(pmd_t pmd, pmdval_t clear) { pmdval_t v = native_pmd_val(pmd);
- return __pmd(v & ~clear); + return native_make_pmd(v & ~clear); }
static inline pmd_t pmd_mkold(pmd_t pmd) @@ -409,14 +409,14 @@ static inline pud_t pud_set_flags(pud_t { pudval_t v = native_pud_val(pud);
- return __pud(v | set); + return native_make_pud(v | set); }
static inline pud_t pud_clear_flags(pud_t pud, pudval_t clear) { pudval_t v = native_pud_val(pud);
- return __pud(v & ~clear); + return native_make_pud(v & ~clear); }
static inline pud_t pud_mkold(pud_t pud) --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -323,6 +323,11 @@ static inline pudval_t native_pud_val(pu #else #include <asm-generic/pgtable-nopud.h>
+static inline pud_t native_make_pud(pudval_t val) +{ + return (pud_t) { .p4d.pgd = native_make_pgd(val) }; +} + static inline pudval_t native_pud_val(pud_t pud) { return native_pgd_val(pud.p4d.pgd); @@ -344,6 +349,11 @@ static inline pmdval_t native_pmd_val(pm #else #include <asm-generic/pgtable-nopmd.h>
+static inline pmd_t native_make_pmd(pmdval_t val) +{ + return (pmd_t) { .pud.p4d.pgd = native_make_pgd(val) }; +} + static inline pmdval_t native_pmd_val(pmd_t pmd) { return native_pgd_val(pmd.pud.p4d.pgd);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
commit 8337d083507b9827dfb36d545538b7789df834fd upstream.
A section type mismatch warning shows up when building with LTO, since orion_ge00_mvmdio_bus_name was put in __initconst but not marked const itself:
include/linux/of.h: In function 'spear_setup_of_timer': arch/arm/mach-spear/time.c:207:34: error: 'timer_of_match' causes a section type conflict with 'orion_ge00_mvmdio_bus_name' static const struct of_device_id timer_of_match[] __initconst = { ^ arch/arm/plat-orion/common.c:475:32: note: 'orion_ge00_mvmdio_bus_name' was declared here static __initconst const char *orion_ge00_mvmdio_bus_name = "orion-mii"; ^
As pointed out by Andrew Lunn, it should in fact be 'const' but not '__initconst' because the string is never copied but may be accessed after the init sections are freed. To fix that, I get rid of the extra symbol and rewrite the initialization in a simpler way that assigns both the bus_id and modalias statically.
I spotted another theoretical bug in the same place, where d->netdev[i] may be an out of bounds access, this can be fixed by moving the device assignment into the loop.
Cc: stable@vger.kernel.org Reviewed-by: Andrew Lunn andrew@lunn.ch Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/plat-orion/common.c | 23 +++++++++++------------ 1 file changed, 11 insertions(+), 12 deletions(-)
--- a/arch/arm/plat-orion/common.c +++ b/arch/arm/plat-orion/common.c @@ -472,28 +472,27 @@ void __init orion_ge11_init(struct mv643 /***************************************************************************** * Ethernet switch ****************************************************************************/ -static __initconst const char *orion_ge00_mvmdio_bus_name = "orion-mii"; -static __initdata struct mdio_board_info - orion_ge00_switch_board_info; +static __initdata struct mdio_board_info orion_ge00_switch_board_info = { + .bus_id = "orion-mii", + .modalias = "mv88e6085", +};
void __init orion_ge00_switch_init(struct dsa_chip_data *d) { - struct mdio_board_info *bd; unsigned int i;
if (!IS_BUILTIN(CONFIG_PHYLIB)) return;
- for (i = 0; i < ARRAY_SIZE(d->port_names); i++) - if (!strcmp(d->port_names[i], "cpu")) + for (i = 0; i < ARRAY_SIZE(d->port_names); i++) { + if (!strcmp(d->port_names[i], "cpu")) { + d->netdev[i] = &orion_ge00.dev; break; + } + }
- bd = &orion_ge00_switch_board_info; - bd->bus_id = orion_ge00_mvmdio_bus_name; - bd->mdio_addr = d->sw_addr; - d->netdev[i] = &orion_ge00.dev; - strcpy(bd->modalias, "mv88e6085"); - bd->platform_data = d; + orion_ge00_switch_board_info.mdio_addr = d->sw_addr; + orion_ge00_switch_board_info.platform_data = d;
mdiobus_register_board_info(&orion_ge00_switch_board_info, 1); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Schultz d.schultz@phytec.de
commit 5ce0bad4ccd04c8a989e94d3c89e4e796ac22e48 upstream.
Rockchip recommends to run the CPU cores only with operations points of 1.6 GHz or lower.
Removed the cpu0 node with too high operation points and use the default values instead.
Fixes: 903d31e34628 ("ARM: dts: rockchip: Add support for phyCORE-RK3288 SoM") Cc: stable@vger.kernel.org Signed-off-by: Daniel Schultz d.schultz@phytec.de Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/boot/dts/rk3288-phycore-som.dtsi | 20 -------------------- 1 file changed, 20 deletions(-)
--- a/arch/arm/boot/dts/rk3288-phycore-som.dtsi +++ b/arch/arm/boot/dts/rk3288-phycore-som.dtsi @@ -110,26 +110,6 @@ }; };
-&cpu0 { - cpu0-supply = <&vdd_cpu>; - operating-points = < - /* KHz uV */ - 1800000 1400000 - 1608000 1350000 - 1512000 1300000 - 1416000 1200000 - 1200000 1100000 - 1008000 1050000 - 816000 1000000 - 696000 950000 - 600000 900000 - 408000 900000 - 312000 900000 - 216000 900000 - 126000 900000 - >; -}; - &emmc { status = "okay"; bus-width = <8>;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ulf Magnusson ulfalizer@gmail.com
commit 8aa36a8dcde3183d84db7b0d622ffddcebb61077 upstream.
The MACH_ARMADA_375 and MACH_ARMADA_38X boards select ARM_ERRATA_753970, but it was renamed to PL310_ERRATA_753970 by commit fa0ce4035d48 ("ARM: 7162/1: errata: tidy up Kconfig options for PL310 errata workarounds").
Fix the selects to use the new name.
Discovered with the https://github.com/ulfalizer/Kconfiglib/blob/master/examples/list_undefined.... script. Fixes: fa0ce4035d48 ("ARM: 7162/1: errata: tidy up Kconfig options for PL310 errata workarounds" cc: stable@vger.kernel.org Signed-off-by: Ulf Magnusson ulfalizer@gmail.com Signed-off-by: Gregory CLEMENT gregory.clement@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/mach-mvebu/Kconfig | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm/mach-mvebu/Kconfig +++ b/arch/arm/mach-mvebu/Kconfig @@ -42,7 +42,7 @@ config MACH_ARMADA_375 depends on ARCH_MULTI_V7 select ARMADA_370_XP_IRQ select ARM_ERRATA_720789 - select ARM_ERRATA_753970 + select PL310_ERRATA_753970 select ARM_GIC select ARMADA_375_CLK select HAVE_ARM_SCU @@ -58,7 +58,7 @@ config MACH_ARMADA_38X bool "Marvell Armada 380/385 boards" depends on ARCH_MULTI_V7 select ARM_ERRATA_720789 - select ARM_ERRATA_753970 + select PL310_ERRATA_753970 select ARM_GIC select ARM_GLOBAL_TIMER select CLKSRC_ARM_GLOBAL_TIMER_SCHED_CLOCK
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
commit 67870eb1204223598ea6d8a4467b482e9f5875b5 upstream.
In banked-sr.c, we use a top-level '__asm__(".arch_extension virt")' statement to allow compilation of a multi-CPU kernel for ARMv6 and older ARMv7-A that don't normally support access to the banked registers.
This is considered to be a programming error by the gcc developers and will no longer work in gcc-8, where we now get a build error:
/tmp/cc4Qy7GR.s:34: Error: Banked registers are not available with this architecture. -- `mrs r3,SP_usr' /tmp/cc4Qy7GR.s:41: Error: Banked registers are not available with this architecture. -- `mrs r3,ELR_hyp' /tmp/cc4Qy7GR.s:55: Error: Banked registers are not available with this architecture. -- `mrs r3,SP_svc' /tmp/cc4Qy7GR.s:62: Error: Banked registers are not available with this architecture. -- `mrs r3,LR_svc' /tmp/cc4Qy7GR.s:69: Error: Banked registers are not available with this architecture. -- `mrs r3,SPSR_svc' /tmp/cc4Qy7GR.s:76: Error: Banked registers are not available with this architecture. -- `mrs r3,SP_abt'
Passign the '-march-armv7ve' flag to gcc works, and is ok here, because we know the functions won't ever be called on pre-ARMv7VE machines. Unfortunately, older compiler versions (4.8 and earlier) do not understand that flag, so we still need to keep the asm around.
Backporting to stable kernels (4.6+) is needed to allow those to be built with future compilers as well.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84129 Fixes: 33280b4cd1dc ("ARM: KVM: Add banked registers save/restore") Cc: stable@vger.kernel.org Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Christoffer Dall christoffer.dall@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/kvm/hyp/Makefile | 5 +++++ arch/arm/kvm/hyp/banked-sr.c | 4 ++++ 2 files changed, 9 insertions(+)
--- a/arch/arm/kvm/hyp/Makefile +++ b/arch/arm/kvm/hyp/Makefile @@ -7,6 +7,8 @@ ccflags-y += -fno-stack-protector -DDISA
KVM=../../../../virt/kvm
+CFLAGS_ARMV7VE :=$(call cc-option, -march=armv7ve) + obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v2-sr.o obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v3-sr.o obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/timer-sr.o @@ -15,7 +17,10 @@ obj-$(CONFIG_KVM_ARM_HOST) += tlb.o obj-$(CONFIG_KVM_ARM_HOST) += cp15-sr.o obj-$(CONFIG_KVM_ARM_HOST) += vfp.o obj-$(CONFIG_KVM_ARM_HOST) += banked-sr.o +CFLAGS_banked-sr.o += $(CFLAGS_ARMV7VE) + obj-$(CONFIG_KVM_ARM_HOST) += entry.o obj-$(CONFIG_KVM_ARM_HOST) += hyp-entry.o obj-$(CONFIG_KVM_ARM_HOST) += switch.o +CFLAGS_switch.o += $(CFLAGS_ARMV7VE) obj-$(CONFIG_KVM_ARM_HOST) += s2-setup.o --- a/arch/arm/kvm/hyp/banked-sr.c +++ b/arch/arm/kvm/hyp/banked-sr.c @@ -20,6 +20,10 @@
#include <asm/kvm_hyp.h>
+/* + * gcc before 4.9 doesn't understand -march=armv7ve, so we have to + * trick the assembler. + */ __asm__(".arch_extension virt");
void __hyp_text __banked_save_state(struct kvm_cpu_context *ctxt)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Bonzini pbonzini@redhat.com
commit 9b8ebbdb74b5ad76b9dfd8b101af17839174b126 upstream.
The x86 MMU if full of code that returns 0 and 1 for retry/emulate. Use the existing RET_MMIO_PF_RETRY/RET_MMIO_PF_EMULATE enum, renaming it to drop the MMIO part.
Signed-off-by: Paolo Bonzini pbonzini@redhat.com Cc: Thomas Backlund tmb@mageia.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/mmu.c | 95 +++++++++++++++++++++------------------------ arch/x86/kvm/paging_tmpl.h | 18 ++++---- 2 files changed, 55 insertions(+), 58 deletions(-)
--- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -150,6 +150,20 @@ module_param(dbg, bool, 0644); /* make pte_list_desc fit well in cache line */ #define PTE_LIST_EXT 3
+/* + * Return values of handle_mmio_page_fault and mmu.page_fault: + * RET_PF_RETRY: let CPU fault again on the address. + * RET_PF_EMULATE: mmio page fault, emulate the instruction directly. + * + * For handle_mmio_page_fault only: + * RET_PF_INVALID: the spte is invalid, let the real page fault path update it. + */ +enum { + RET_PF_RETRY = 0, + RET_PF_EMULATE = 1, + RET_PF_INVALID = 2, +}; + struct pte_list_desc { u64 *sptes[PTE_LIST_EXT]; struct pte_list_desc *more; @@ -2794,13 +2808,13 @@ done: return ret; }
-static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access, - int write_fault, int level, gfn_t gfn, kvm_pfn_t pfn, - bool speculative, bool host_writable) +static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access, + int write_fault, int level, gfn_t gfn, kvm_pfn_t pfn, + bool speculative, bool host_writable) { int was_rmapped = 0; int rmap_count; - bool emulate = false; + int ret = RET_PF_RETRY;
pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__, *sptep, write_fault, gfn); @@ -2830,12 +2844,12 @@ static bool mmu_set_spte(struct kvm_vcpu if (set_spte(vcpu, sptep, pte_access, level, gfn, pfn, speculative, true, host_writable)) { if (write_fault) - emulate = true; + ret = RET_PF_EMULATE; kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu); }
if (unlikely(is_mmio_spte(*sptep))) - emulate = true; + ret = RET_PF_EMULATE;
pgprintk("%s: setting spte %llx\n", __func__, *sptep); pgprintk("instantiating %s PTE (%s) at %llx (%llx) addr %p\n", @@ -2855,7 +2869,7 @@ static bool mmu_set_spte(struct kvm_vcpu
kvm_release_pfn_clean(pfn);
- return emulate; + return ret; }
static kvm_pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, @@ -2994,14 +3008,13 @@ static int kvm_handle_bad_page(struct kv * Do not cache the mmio info caused by writing the readonly gfn * into the spte otherwise read access on readonly gfn also can * caused mmio page fault and treat it as mmio access. - * Return 1 to tell kvm to emulate it. */ if (pfn == KVM_PFN_ERR_RO_FAULT) - return 1; + return RET_PF_EMULATE;
if (pfn == KVM_PFN_ERR_HWPOISON) { kvm_send_hwpoison_signal(kvm_vcpu_gfn_to_hva(vcpu, gfn), current); - return 0; + return RET_PF_RETRY; }
return -EFAULT; @@ -3286,13 +3299,13 @@ static int nonpaging_map(struct kvm_vcpu }
if (fast_page_fault(vcpu, v, level, error_code)) - return 0; + return RET_PF_RETRY;
mmu_seq = vcpu->kvm->mmu_notifier_seq; smp_rmb();
if (try_async_pf(vcpu, prefault, gfn, v, &pfn, write, &map_writable)) - return 0; + return RET_PF_RETRY;
if (handle_abnormal_pfn(vcpu, v, gfn, pfn, ACC_ALL, &r)) return r; @@ -3312,7 +3325,7 @@ static int nonpaging_map(struct kvm_vcpu out_unlock: spin_unlock(&vcpu->kvm->mmu_lock); kvm_release_pfn_clean(pfn); - return 0; + return RET_PF_RETRY; }
@@ -3659,54 +3672,38 @@ exit: return reserved; }
-/* - * Return values of handle_mmio_page_fault: - * RET_MMIO_PF_EMULATE: it is a real mmio page fault, emulate the instruction - * directly. - * RET_MMIO_PF_INVALID: invalid spte is detected then let the real page - * fault path update the mmio spte. - * RET_MMIO_PF_RETRY: let CPU fault again on the address. - * RET_MMIO_PF_BUG: a bug was detected (and a WARN was printed). - */ -enum { - RET_MMIO_PF_EMULATE = 1, - RET_MMIO_PF_INVALID = 2, - RET_MMIO_PF_RETRY = 0, - RET_MMIO_PF_BUG = -1 -}; - static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct) { u64 spte; bool reserved;
if (mmio_info_in_cache(vcpu, addr, direct)) - return RET_MMIO_PF_EMULATE; + return RET_PF_EMULATE;
reserved = walk_shadow_page_get_mmio_spte(vcpu, addr, &spte); if (WARN_ON(reserved)) - return RET_MMIO_PF_BUG; + return -EINVAL;
if (is_mmio_spte(spte)) { gfn_t gfn = get_mmio_spte_gfn(spte); unsigned access = get_mmio_spte_access(spte);
if (!check_mmio_spte(vcpu, spte)) - return RET_MMIO_PF_INVALID; + return RET_PF_INVALID;
if (direct) addr = 0;
trace_handle_mmio_page_fault(addr, gfn, access); vcpu_cache_mmio_info(vcpu, addr, gfn, access); - return RET_MMIO_PF_EMULATE; + return RET_PF_EMULATE; }
/* * If the page table is zapped by other cpus, let CPU fault again on * the address. */ - return RET_MMIO_PF_RETRY; + return RET_PF_RETRY; } EXPORT_SYMBOL_GPL(handle_mmio_page_fault);
@@ -3756,7 +3753,7 @@ static int nonpaging_page_fault(struct k pgprintk("%s: gva %lx error %x\n", __func__, gva, error_code);
if (page_fault_handle_page_track(vcpu, error_code, gfn)) - return 1; + return RET_PF_EMULATE;
r = mmu_topup_memory_caches(vcpu); if (r) @@ -3877,7 +3874,7 @@ static int tdp_page_fault(struct kvm_vcp MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
if (page_fault_handle_page_track(vcpu, error_code, gfn)) - return 1; + return RET_PF_EMULATE;
r = mmu_topup_memory_caches(vcpu); if (r) @@ -3894,13 +3891,13 @@ static int tdp_page_fault(struct kvm_vcp }
if (fast_page_fault(vcpu, gpa, level, error_code)) - return 0; + return RET_PF_RETRY;
mmu_seq = vcpu->kvm->mmu_notifier_seq; smp_rmb();
if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, write, &map_writable)) - return 0; + return RET_PF_RETRY;
if (handle_abnormal_pfn(vcpu, 0, gfn, pfn, ACC_ALL, &r)) return r; @@ -3920,7 +3917,7 @@ static int tdp_page_fault(struct kvm_vcp out_unlock: spin_unlock(&vcpu->kvm->mmu_lock); kvm_release_pfn_clean(pfn); - return 0; + return RET_PF_RETRY; }
static void nonpaging_init_context(struct kvm_vcpu *vcpu, @@ -4919,25 +4916,25 @@ int kvm_mmu_page_fault(struct kvm_vcpu * vcpu->arch.gpa_val = cr2; }
+ r = RET_PF_INVALID; if (unlikely(error_code & PFERR_RSVD_MASK)) { r = handle_mmio_page_fault(vcpu, cr2, direct); - if (r == RET_MMIO_PF_EMULATE) { + if (r == RET_PF_EMULATE) { emulation_type = 0; goto emulate; } - if (r == RET_MMIO_PF_RETRY) - return 1; - if (r < 0) - return r; - /* Must be RET_MMIO_PF_INVALID. */ }
- r = vcpu->arch.mmu.page_fault(vcpu, cr2, lower_32_bits(error_code), - false); + if (r == RET_PF_INVALID) { + r = vcpu->arch.mmu.page_fault(vcpu, cr2, lower_32_bits(error_code), + false); + WARN_ON(r == RET_PF_INVALID); + } + + if (r == RET_PF_RETRY) + return 1; if (r < 0) return r; - if (!r) - return 1;
/* * Before emulating the instruction, check if the error code --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -593,7 +593,7 @@ static int FNAME(fetch)(struct kvm_vcpu struct kvm_mmu_page *sp = NULL; struct kvm_shadow_walk_iterator it; unsigned direct_access, access = gw->pt_access; - int top_level, emulate; + int top_level, ret;
direct_access = gw->pte_access;
@@ -659,15 +659,15 @@ static int FNAME(fetch)(struct kvm_vcpu }
clear_sp_write_flooding_count(it.sptep); - emulate = mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault, - it.level, gw->gfn, pfn, prefault, map_writable); + ret = mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault, + it.level, gw->gfn, pfn, prefault, map_writable); FNAME(pte_prefetch)(vcpu, gw, it.sptep);
- return emulate; + return ret;
out_gpte_changed: kvm_release_pfn_clean(pfn); - return 0; + return RET_PF_RETRY; }
/* @@ -762,12 +762,12 @@ static int FNAME(page_fault)(struct kvm_ if (!prefault) inject_page_fault(vcpu, &walker.fault);
- return 0; + return RET_PF_RETRY; }
if (page_fault_handle_page_track(vcpu, error_code, walker.gfn)) { shadow_page_table_clear_flood(vcpu, addr); - return 1; + return RET_PF_EMULATE; }
vcpu->arch.write_fault_to_shadow_pgtable = false; @@ -789,7 +789,7 @@ static int FNAME(page_fault)(struct kvm_
if (try_async_pf(vcpu, prefault, walker.gfn, addr, &pfn, write_fault, &map_writable)) - return 0; + return RET_PF_RETRY;
if (handle_abnormal_pfn(vcpu, addr, walker.gfn, pfn, walker.pte_access, &r)) return r; @@ -834,7 +834,7 @@ static int FNAME(page_fault)(struct kvm_ out_unlock: spin_unlock(&vcpu->kvm->mmu_lock); kvm_release_pfn_clean(pfn); - return 0; + return RET_PF_RETRY; }
static gpa_t FNAME(get_level1_sp_gpa)(struct kvm_mmu_page *sp)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Bonzini pbonzini@redhat.com
commit 0b2e9904c15963e715d33e5f3f1387f17d19333a upstream.
The initial reset of the local APIC is performed before the VMCS has been created, but it tries to do a vmwrite:
vmwrite error: reg 810 value 4a00 (err 18944) CPU: 54 PID: 38652 Comm: qemu-kvm Tainted: G W I 4.16.0-0.rc2.git0.1.fc28.x86_64 #1 Hardware name: Intel Corporation S2600CW/S2600CW, BIOS SE5C610.86B.01.01.0003.090520141303 09/05/2014 Call Trace: vmx_set_rvi [kvm_intel] vmx_hwapic_irr_update [kvm_intel] kvm_lapic_reset [kvm] kvm_create_lapic [kvm] kvm_arch_vcpu_init [kvm] kvm_vcpu_init [kvm] vmx_create_vcpu [kvm_intel] kvm_vm_ioctl [kvm]
Move it later, after the VMCS has been created.
Fixes: 4191db26b714 ("KVM: x86: Update APICv on APIC reset") Cc: stable@vger.kernel.org Cc: Liran Alon liran.alon@oracle.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/lapic.c | 1 - arch/x86/kvm/x86.c | 1 + 2 files changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2107,7 +2107,6 @@ int kvm_create_lapic(struct kvm_vcpu *vc */ vcpu->arch.apic_base = MSR_IA32_APICBASE_ENABLE; static_key_slow_inc(&apic_sw_disabled.key); /* sw disabled at reset */ - kvm_lapic_reset(vcpu, false); kvm_iodevice_init(&apic->dev, &apic_mmio_ops);
return 0; --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -7779,6 +7779,7 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu if (r) return r; kvm_vcpu_reset(vcpu, false); + kvm_lapic_reset(vcpu, false); kvm_mmu_setup(vcpu); vcpu_put(vcpu); return r;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ard Biesheuvel ard.biesheuvel@linaro.org
commit ee8bdfb6568d86bb93f55f8d99c4c643e77304ee upstream.
Even though it is unconventional, some PCIe host implementations omit the root ports entirely, and simply consist of a host bridge (which is not modeled as a device in the PCI hierarchy) and a link.
When the downstream device is an endpoint, our current code does not seem to mind this unusual configuration. However, when PCIe switches are involved, the ASPM code assumes that any downstream switch port has a parent, and blindly dereferences the bus->parent->self field of the pci_dev struct to chain the downstream link state to the link state of the root port. Given that the root port is missing, the link is not modeled at all, and nor is the link state, and attempting to access it results in a NULL pointer dereference and a crash.
Avoid this by allowing the link state chain to terminate at the downstream port if no root port exists.
Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Bjorn Helgaas bhelgaas@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pci/pcie/aspm.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/drivers/pci/pcie/aspm.c +++ b/drivers/pci/pcie/aspm.c @@ -803,10 +803,14 @@ static struct pcie_link_state *alloc_pci
/* * Root Ports and PCI/PCI-X to PCIe Bridges are roots of PCIe - * hierarchies. + * hierarchies. Note that some PCIe host implementations omit + * the root ports entirely, in which case a downstream port on + * a switch may become the root of the link state chain for all + * its subordinate endpoints. */ if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT || - pci_pcie_type(pdev) == PCI_EXP_TYPE_PCIE_BRIDGE) { + pci_pcie_type(pdev) == PCI_EXP_TYPE_PCIE_BRIDGE || + !pdev->bus->parent->self) { link->root = link; } else { struct pcie_link_state *parent;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kai Heng Feng kai.heng.feng@canonical.com
commit 36904703aeeeb6cd31993f1353c8325006229f9a upstream.
The i2c touchpad on Dell XPS 9570 and Precision M5530 doesn't work out of box.
The touchpad relies on its _INI method to update its _HID value from XXXX0000 to SYNA2393.
Also, the _STA relies on value of I2CN to report correct status.
Set acpi_gbl_parse_table_as_term_list so the value of I2CN can be correctly set up, and _INI can get run. The ACPI table in this machine is designed to get parsed this way.
Also, change the quirk table to a more generic name.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=198515 Signed-off-by: Kai-Heng Feng kai.heng.feng@canonical.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/acpi/bus.c | 38 +++++++++++++++++++++++++++++++------- 1 file changed, 31 insertions(+), 7 deletions(-)
--- a/drivers/acpi/bus.c +++ b/drivers/acpi/bus.c @@ -66,10 +66,37 @@ static int set_copy_dsdt(const struct dm return 0; } #endif +static int set_gbl_term_list(const struct dmi_system_id *id) +{ + acpi_gbl_parse_table_as_term_list = 1; + return 0; +}
-static const struct dmi_system_id dsdt_dmi_table[] __initconst = { +static const struct dmi_system_id acpi_quirks_dmi_table[] __initconst = { + /* + * Touchpad on Dell XPS 9570/Precision M5530 doesn't work under I2C + * mode. + * https://bugzilla.kernel.org/show_bug.cgi?id=198515 + */ + { + .callback = set_gbl_term_list, + .ident = "Dell Precision M5530", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "Precision M5530"), + }, + }, + { + .callback = set_gbl_term_list, + .ident = "Dell XPS 15 9570", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "XPS 15 9570"), + }, + }, /* * Invoke DSDT corruption work-around on all Toshiba Satellite. + * DSDT will be copied to memory. * https://bugzilla.kernel.org/show_bug.cgi?id=14679 */ { @@ -83,7 +110,7 @@ static const struct dmi_system_id dsdt_d {} }; #else -static const struct dmi_system_id dsdt_dmi_table[] __initconst = { +static const struct dmi_system_id acpi_quirks_dmi_table[] __initconst = { {} }; #endif @@ -1001,11 +1028,8 @@ void __init acpi_early_init(void)
acpi_permanent_mmap = true;
- /* - * If the machine falls into the DMI check table, - * DSDT will be copied to memory - */ - dmi_check_system(dsdt_dmi_table); + /* Check machine-specific quirks */ + dmi_check_system(acpi_quirks_dmi_table);
status = acpi_reallocate_root_table(); if (ACPI_FAILURE(status)) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Adam Ford aford173@gmail.com
commit 84c7efd607e7fb6933920322086db64654f669b2 upstream.
The pinmuxing was missing for I2C1 which was causing intermittent issues with the PMIC which is connected to I2C1. The bootloader did not quite configure the I2C1 either, so when running at 2.6MHz, it was generating errors at times.
This correctly sets the I2C1 pinmuxing so it can operate at 2.6MHz
Fixes: ab8dd3aed011 ("ARM: DTS: Add minimal Support for Logic PD DM3730 SOM-LV")
Signed-off-by: Adam Ford aford173@gmail.com Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/boot/dts/logicpd-som-lv.dtsi | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/arch/arm/boot/dts/logicpd-som-lv.dtsi +++ b/arch/arm/boot/dts/logicpd-som-lv.dtsi @@ -97,6 +97,8 @@ };
&i2c1 { + pinctrl-names = "default"; + pinctrl-0 = <&i2c1_pins>; clock-frequency = <2600000>;
twl: twl@48 { @@ -215,7 +217,12 @@ >; };
- + i2c1_pins: pinmux_i2c1_pins { + pinctrl-single,pins = < + OMAP3_CORE1_IOPAD(0x21ba, PIN_INPUT | MUX_MODE0) /* i2c1_scl.i2c1_scl */ + OMAP3_CORE1_IOPAD(0x21bc, PIN_INPUT | MUX_MODE0) /* i2c1_sda.i2c1_sda */ + >; + }; };
&omap3_pmx_wkup {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Adam Ford aford173@gmail.com
commit 74402055a2d3ec998a1ded599e86185a27d9bbf4 upstream.
The pinmuxing was missing for I2C1 which was causing intermittent issues with the PMIC which is connected to I2C1. The bootloader did not quite configure the I2C1 either, so when running at 2.6MHz, it was generating errors at time.
This correctly sets the I2C1 pinmuxing so it can operate at 2.6MHz
Fixes: 687c27676151 ("ARM: dts: Add minimal support for LogicPD Torpedo DM3730 devkit")
Signed-off-by: Adam Ford aford173@gmail.com Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/boot/dts/logicpd-torpedo-som.dtsi | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/arch/arm/boot/dts/logicpd-torpedo-som.dtsi +++ b/arch/arm/boot/dts/logicpd-torpedo-som.dtsi @@ -104,6 +104,8 @@ };
&i2c1 { + pinctrl-names = "default"; + pinctrl-0 = <&i2c1_pins>; clock-frequency = <2600000>;
twl: twl@48 { @@ -211,6 +213,12 @@ OMAP3_CORE1_IOPAD(0x21b8, PIN_INPUT | MUX_MODE0) /* hsusb0_data7.hsusb0_data7 */ >; }; + i2c1_pins: pinmux_i2c1_pins { + pinctrl-single,pins = < + OMAP3_CORE1_IOPAD(0x21ba, PIN_INPUT | MUX_MODE0) /* i2c1_scl.i2c1_scl */ + OMAP3_CORE1_IOPAD(0x21bc, PIN_INPUT | MUX_MODE0) /* i2c1_sda.i2c1_sda */ + >; + }; };
&uart2 {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.com
commit 39772f0a7be3b3dc26c74ea13fe7847fd1522c8b upstream.
The locking protocols in md assume that a device will never be removed from an array during resync/recovery/reshape. When that isn't happening, rcu or reconfig_mutex is needed to protect an rdev pointer while taking a refcount. When it is happening, that protection isn't needed.
Unfortunately there are cases were remove_and_add_spares() is called when recovery might be happening: is state_store(), slot_store() and hot_remove_disk(). In each case, this is just an optimization, to try to expedite removal from the personality so the device can be removed from the array. If resync etc is happening, we just have to wait for md_check_recover to find a suitable time to call remove_and_add_spares().
This optimization and not essential so it doesn't matter if it fails. So change remove_and_add_spares() to abort early if resync/recovery/reshape is happening, unless it is called from md_check_recovery() as part of a newly started recovery. The parameter "this" is only NULL when called from md_check_recovery() so when it is NULL, there is no need to abort.
As this can result in a NULL dereference, the fix is suitable for -stable.
cc: yuyufen yuyufen@huawei.com Cc: Tomasz Majchrzak tomasz.majchrzak@intel.com Fixes: 8430e7e0af9a ("md: disconnect device from personality before trying to remove it.") Cc: stable@ver.kernel.org (v4.8+) Signed-off-by: NeilBrown neilb@suse.com Signed-off-by: Shaohua Li sh.li@alibaba-inc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/md.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -8522,6 +8522,10 @@ static int remove_and_add_spares(struct int removed = 0; bool remove_some = false;
+ if (this && test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) + /* Mustn't remove devices when resync thread is running */ + return 0; + rdev_for_each(rdev, mddev) { if ((this == NULL || rdev == this) && rdev->raid_disk >= 0 &&
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sagi Grimberg sagi@grimberg.me
commit b4b591c87f2b0f4ebaf3a68d4f13873b241aa584 upstream.
The entire completions suppress mechanism is currently broken because the HCA might retry a send operation (due to dropped ack) after the nvme transaction has completed.
In order to handle this, we signal all send completions and introduce a separate done handler for async events as they will be handled differently (as they don't include in-capsule data by definition).
Signed-off-by: Sagi Grimberg sagi@grimberg.me Reviewed-by: Max Gurtovoy maxg@mellanox.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/nvme/host/rdma.c | 54 ++++++++++++----------------------------------- 1 file changed, 14 insertions(+), 40 deletions(-)
--- a/drivers/nvme/host/rdma.c +++ b/drivers/nvme/host/rdma.c @@ -88,7 +88,6 @@ enum nvme_rdma_queue_flags {
struct nvme_rdma_queue { struct nvme_rdma_qe *rsp_ring; - atomic_t sig_count; int queue_size; size_t cmnd_capsule_len; struct nvme_rdma_ctrl *ctrl; @@ -521,7 +520,6 @@ static int nvme_rdma_alloc_queue(struct queue->cmnd_capsule_len = sizeof(struct nvme_command);
queue->queue_size = queue_size; - atomic_set(&queue->sig_count, 0);
queue->cm_id = rdma_create_id(&init_net, nvme_rdma_cm_handler, queue, RDMA_PS_TCP, IB_QPT_RC); @@ -1232,21 +1230,9 @@ static void nvme_rdma_send_done(struct i nvme_end_request(rq, req->status, req->result); }
-/* - * We want to signal completion at least every queue depth/2. This returns the - * largest power of two that is not above half of (queue size + 1) to optimize - * (avoid divisions). - */ -static inline bool nvme_rdma_queue_sig_limit(struct nvme_rdma_queue *queue) -{ - int limit = 1 << ilog2((queue->queue_size + 1) / 2); - - return (atomic_inc_return(&queue->sig_count) & (limit - 1)) == 0; -} - static int nvme_rdma_post_send(struct nvme_rdma_queue *queue, struct nvme_rdma_qe *qe, struct ib_sge *sge, u32 num_sge, - struct ib_send_wr *first, bool flush) + struct ib_send_wr *first) { struct ib_send_wr wr, *bad_wr; int ret; @@ -1255,31 +1241,12 @@ static int nvme_rdma_post_send(struct nv sge->length = sizeof(struct nvme_command), sge->lkey = queue->device->pd->local_dma_lkey;
- qe->cqe.done = nvme_rdma_send_done; - wr.next = NULL; wr.wr_cqe = &qe->cqe; wr.sg_list = sge; wr.num_sge = num_sge; wr.opcode = IB_WR_SEND; - wr.send_flags = 0; - - /* - * Unsignalled send completions are another giant desaster in the - * IB Verbs spec: If we don't regularly post signalled sends - * the send queue will fill up and only a QP reset will rescue us. - * Would have been way to obvious to handle this in hardware or - * at least the RDMA stack.. - * - * Always signal the flushes. The magic request used for the flush - * sequencer is not allocated in our driver's tagset and it's - * triggered to be freed by blk_cleanup_queue(). So we need to - * always mark it as signaled to ensure that the "wr_cqe", which is - * embedded in request's payload, is not freed when __ib_process_cq() - * calls wr_cqe->done(). - */ - if (nvme_rdma_queue_sig_limit(queue) || flush) - wr.send_flags |= IB_SEND_SIGNALED; + wr.send_flags = IB_SEND_SIGNALED;
if (first) first->next = ≀ @@ -1329,6 +1296,12 @@ static struct blk_mq_tags *nvme_rdma_tag return queue->ctrl->tag_set.tags[queue_idx - 1]; }
+static void nvme_rdma_async_done(struct ib_cq *cq, struct ib_wc *wc) +{ + if (unlikely(wc->status != IB_WC_SUCCESS)) + nvme_rdma_wr_error(cq, wc, "ASYNC"); +} + static void nvme_rdma_submit_async_event(struct nvme_ctrl *arg, int aer_idx) { struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(arg); @@ -1350,10 +1323,12 @@ static void nvme_rdma_submit_async_event cmd->common.flags |= NVME_CMD_SGL_METABUF; nvme_rdma_set_sg_null(cmd);
+ sqe->cqe.done = nvme_rdma_async_done; + ib_dma_sync_single_for_device(dev, sqe->dma, sizeof(*cmd), DMA_TO_DEVICE);
- ret = nvme_rdma_post_send(queue, sqe, &sge, 1, NULL, false); + ret = nvme_rdma_post_send(queue, sqe, &sge, 1, NULL); WARN_ON_ONCE(ret); }
@@ -1639,7 +1614,6 @@ static blk_status_t nvme_rdma_queue_rq(s struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq); struct nvme_rdma_qe *sqe = &req->sqe; struct nvme_command *c = sqe->data; - bool flush = false; struct ib_device *dev; blk_status_t ret; int err; @@ -1668,13 +1642,13 @@ static blk_status_t nvme_rdma_queue_rq(s goto err; }
+ sqe->cqe.done = nvme_rdma_send_done; + ib_dma_sync_single_for_device(dev, sqe->dma, sizeof(struct nvme_command), DMA_TO_DEVICE);
- if (req_op(rq) == REQ_OP_FLUSH) - flush = true; err = nvme_rdma_post_send(queue, sqe, req->sge, req->num_sge, - req->mr->need_inval ? &req->reg_wr.wr : NULL, flush); + req->mr->need_inval ? &req->reg_wr.wr : NULL); if (unlikely(err)) { nvme_rdma_unmap_data(queue, rq); goto err;
On 03/07/2018 12:37 PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.25 release. There are 110 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri Mar 9 19:10:03 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.25-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks, -- Shuah
On 8 March 2018 at 01:07, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.14.25 release. There are 110 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri Mar 9 19:10:03 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.25-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64 and arm.
x86_64 results not available due to infrastructure issues and we will share x86_64 results when available.
Summary ------------------------------------------------------------------------
kernel: 4.14.25-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.14.y git commit: 66060ac1dfa02f02646a55f6ed888c0f2001623e git describe: v4.14.24-111-g66060ac1dfa0 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.24-11...
No regressions (compared to build v4.14.24)
Boards, architectures and test suites: -------------------------------------
hi6220-hikey - arm64 * boot - pass: 20, * kselftest - pass: 48, skip: 17 * libhugetlbfs - pass: 90, skip: 1 * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 64, skip: 17 * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 61, skip: 2 * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 21, skip: 1 * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 10, skip: 4 * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 999, skip: 151 * ltp-timers-tests - pass: 12, skip: 1
juno-r2 - arm64 * boot - pass: 20, * kselftest - pass: 48, skip: 17 * libhugetlbfs - pass: 90, skip: 1 * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 64, skip: 17 * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 61, skip: 2 * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 22, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 10, skip: 4 * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 1001, skip: 149 * ltp-timers-tests - pass: 12, skip: 1
x15 - arm * boot - pass: 20, * kselftest - pass: 43, fail: 2, skip: 19 * libhugetlbfs - pass: 87, skip: 1 * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 62, fail: 2, skip: 17 * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 61, skip: 2 * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 20, skip: 2 * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 13, skip: 1 * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 1053, skip: 97 * ltp-timers-tests - pass: 12, skip: 1
-- Linaro QA (beta) https://qa-reports.linaro.org
On 8 March 2018 at 14:06, Naresh Kamboju naresh.kamboju@linaro.org wrote:
On 8 March 2018 at 01:07, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.14.25 release. There are 110 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri Mar 9 19:10:03 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.25-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64 and arm.
x86_64 results not available due to infrastructure issues and we will share x86_64 results when available.
No regressions on arm64, arm and x86_64. x86_64 and qemu_x86_64 results summary published in the bottom of this email.
Summary
kernel: 4.14.25-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.14.y git commit: 66060ac1dfa02f02646a55f6ed888c0f2001623e git describe: v4.14.24-111-g66060ac1dfa0 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.24-11...
No regressions (compared to build v4.14.24)
Boards, architectures and test suites:
hi6220-hikey - arm64
- boot - pass: 20,
- kselftest - pass: 48, skip: 17
- libhugetlbfs - pass: 90, skip: 1
- ltp-cap_bounds-tests - pass: 2,
- ltp-containers-tests - pass: 64, skip: 17
- ltp-fcntl-locktests-tests - pass: 2,
- ltp-filecaps-tests - pass: 2,
- ltp-fs-tests - pass: 61, skip: 2
- ltp-fs_bind-tests - pass: 2,
- ltp-fs_perms_simple-tests - pass: 19,
- ltp-fsx-tests - pass: 2,
- ltp-hugetlb-tests - pass: 21, skip: 1
- ltp-io-tests - pass: 3,
- ltp-ipc-tests - pass: 9,
- ltp-math-tests - pass: 11,
- ltp-nptl-tests - pass: 2,
- ltp-pty-tests - pass: 4,
- ltp-sched-tests - pass: 10, skip: 4
- ltp-securebits-tests - pass: 4,
- ltp-syscalls-tests - pass: 999, skip: 151
- ltp-timers-tests - pass: 12, skip: 1
juno-r2 - arm64
- boot - pass: 20,
- kselftest - pass: 48, skip: 17
- libhugetlbfs - pass: 90, skip: 1
- ltp-cap_bounds-tests - pass: 2,
- ltp-containers-tests - pass: 64, skip: 17
- ltp-fcntl-locktests-tests - pass: 2,
- ltp-filecaps-tests - pass: 2,
- ltp-fs-tests - pass: 61, skip: 2
- ltp-fs_bind-tests - pass: 2,
- ltp-fs_perms_simple-tests - pass: 19,
- ltp-fsx-tests - pass: 2,
- ltp-hugetlb-tests - pass: 22,
- ltp-io-tests - pass: 3,
- ltp-ipc-tests - pass: 9,
- ltp-math-tests - pass: 11,
- ltp-nptl-tests - pass: 2,
- ltp-pty-tests - pass: 4,
- ltp-sched-tests - pass: 10, skip: 4
- ltp-securebits-tests - pass: 4,
- ltp-syscalls-tests - pass: 1001, skip: 149
- ltp-timers-tests - pass: 12, skip: 1
x15 - arm
- boot - pass: 20,
- kselftest - pass: 43, fail: 2, skip: 19
- libhugetlbfs - pass: 87, skip: 1
- ltp-cap_bounds-tests - pass: 2,
- ltp-containers-tests - pass: 62, fail: 2, skip: 17
- ltp-fcntl-locktests-tests - pass: 2,
- ltp-filecaps-tests - pass: 2,
- ltp-fs-tests - pass: 61, skip: 2
- ltp-fs_bind-tests - pass: 2,
- ltp-fs_perms_simple-tests - pass: 19,
- ltp-fsx-tests - pass: 2,
- ltp-hugetlb-tests - pass: 20, skip: 2
- ltp-io-tests - pass: 3,
- ltp-ipc-tests - pass: 9,
- ltp-math-tests - pass: 11,
- ltp-nptl-tests - pass: 2,
- ltp-pty-tests - pass: 4,
- ltp-sched-tests - pass: 13, skip: 1
- ltp-securebits-tests - pass: 4,
- ltp-syscalls-tests - pass: 1053, skip: 97
- ltp-timers-tests - pass: 12, skip: 1
qemu_x86_64 * boot - pass: 21, * kselftest - pass: 68, skip: 14 * libhugetlbfs - pass: 90, skip: 1 * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 64, skip: 17 * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 57, skip: 6 * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 22, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 13, skip: 1 * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 1002, skip: 148 * ltp-timers-tests - pass: 12, skip: 1
x86_64 * boot - pass: 20, * kselftest - pass: 61, skip: 19 * libhugetlbfs - pass: 90, skip: 1 * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 64, skip: 17 * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 62, skip: 1 * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 22, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 9, skip: 5 * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 1031, skip: 119 * ltp-timers-tests - pass: 12, skip: 1
-- Linaro QA (beta) https://qa-reports.linaro.org
On 03/07/2018 11:37 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.25 release. There are 110 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri Mar 9 19:10:03 UTC 2018. Anything received after that time might be too late.
Build results: total: 145 pass: 145 fail: 0 Qemu test results: total: 141 pass: 141 fail: 0
Details are available at http://kerneltests.org/builders.
Guenter
linux-stable-mirror@lists.linaro.org