This is the start of the stable review cycle for the 4.14.38 release. There are 80 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Apr 29 13:57:13 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.38-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.14.38-rc1
Hans de Goede hdegoede@redhat.com ACPI / video: Only default only_lcd to true on Win8-ready _desktops_
Heiko Carstens heiko.carstens@de.ibm.com s390/uprobes: implement arch_uretprobe_is_alive()
Stefan Haberland sth@linux.vnet.ibm.com s390/dasd: fix IO error for newly defined devices
Sebastian Ott sebott@linux.ibm.com s390/cio: update chpid descriptor after resource accessibility event
Peter Xu peterx@redhat.com tracing: Fix missing tab for hwlat_detector print format
Finn Thain fthain@telegraphics.com.au block/swim: Fix IO error at end of medium
Finn Thain fthain@telegraphics.com.au block/swim: Fix array bounds check
Finn Thain fthain@telegraphics.com.au block/swim: Select appropriate drive on device open
Finn Thain fthain@telegraphics.com.au block/swim: Rename macros to avoid inconsistent inverted logic
Finn Thain fthain@telegraphics.com.au block/swim: Remove extra put_disk() call from error path
Finn Thain fthain@telegraphics.com.au block/swim: Don't log an error message for an invalid ioctl
Finn Thain fthain@telegraphics.com.au block/swim: Check drive type
Finn Thain fthain@telegraphics.com.au m68k/mac: Don't remap SWIM MMIO region
Robert Kolchmeyer rkolchmeyer@google.com fsnotify: Fix fsnotify_mark_connector race
Dan Carpenter dan.carpenter@oracle.com cdrom: information leak in cdrom_ioctl_media_changed()
Martin K. Petersen martin.petersen@oracle.com scsi: mptsas: Disable WRITE SAME
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp commoncap: Handle memory allocation failure.
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "mm/hmm: fix header file if/else/endif maze"
Klaus Goger klaus.goger@theobroma-systems.com arm64: dts: rockchip: remove vdd_log from rk3399-puma
Michal Simek michal.simek@xilinx.com microblaze: Setup dependencies for ASM optimized lib functions
Martin Schwidefsky schwidefsky@de.ibm.com s390: correct module section names for expoline code revert
Martin Schwidefsky schwidefsky@de.ibm.com s390: correct nospec auto detection init order
Martin Schwidefsky schwidefsky@de.ibm.com s390: add sysfs attributes for spectre
Martin Schwidefsky schwidefsky@de.ibm.com s390: report spectre mitigation via syslog
Martin Schwidefsky schwidefsky@de.ibm.com s390: add automatic detection of the spectre defense
Martin Schwidefsky schwidefsky@de.ibm.com s390: move nobp parameter functions to nospec-branch.c
Martin Schwidefsky schwidefsky@de.ibm.com s390/entry.S: fix spurious zeroing of r0
Martin Schwidefsky schwidefsky@de.ibm.com s390: do not bypass BPENTER for interrupt system calls
Martin Schwidefsky schwidefsky@de.ibm.com s390: Replace IS_ENABLED(EXPOLINE_*) with IS_ENABLED(CONFIG_EXPOLINE_*)
Martin Schwidefsky schwidefsky@de.ibm.com KVM: s390: force bp isolation for VSIE
Martin Schwidefsky schwidefsky@de.ibm.com s390: introduce execute-trampolines for branches
Martin Schwidefsky schwidefsky@de.ibm.com s390: run user space and KVM guests with modified branch prediction
Martin Schwidefsky schwidefsky@de.ibm.com s390: add options to change branch prediction behaviour for the kernel
Martin Schwidefsky schwidefsky@de.ibm.com s390/alternative: use a copy of the facility bit mask
Martin Schwidefsky schwidefsky@de.ibm.com s390: add optimized array_index_mask_nospec
Martin Schwidefsky schwidefsky@de.ibm.com s390: scrub registers on kernel entry and KVM exit
Martin Schwidefsky schwidefsky@de.ibm.com KVM: s390: wire up bpb feature
Martin Schwidefsky schwidefsky@de.ibm.com s390: enable CPU alternatives unconditionally
Martin Schwidefsky schwidefsky@de.ibm.com s390: introduce CPU alternatives
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "microblaze: fix endian handling"
Michael S. Tsirkin mst@redhat.com virtio_net: fix adding vids on big-endian
Michael S. Tsirkin mst@redhat.com virtio_net: split out ctrl buffer
Ivan Khoronzhuk ivan.khoronzhuk@linaro.org net: ethernet: ti: cpsw: fix tx vlan priority mapping
Cong Wang xiyou.wangcong@gmail.com llc: fix NULL pointer deref for SOCK_ZAPPED
Cong Wang xiyou.wangcong@gmail.com llc: hold llc_sap before release_sock()
Alexander Aring aring@mojatatu.com net: sched: ife: check on metadata length
Alexander Aring aring@mojatatu.com net: sched: ife: handle malformed tlv length
Soheil Hassas Yeganeh soheil@google.com tcp: clear tp->packets_out when purging write queue
Alexander Aring aring@mojatatu.com net: sched: ife: signal not finding metaid
Doron Roberts-Kedes doronrk@fb.com strparser: Fix incorrect strp->need_bytes value.
Tom Lendacky thomas.lendacky@amd.com amd-xgbe: Only use the SFP supported transceiver signals
Doron Roberts-Kedes doronrk@fb.com strparser: Do not call mod_delayed_work with a timeout of LONG_MAX
Tom Lendacky thomas.lendacky@amd.com amd-xgbe: Improve KR auto-negotiation and training
Xin Long lucien.xin@gmail.com sctp: do not check port in sctp_inet6_cmp_addr
Tom Lendacky thomas.lendacky@amd.com amd-xgbe: Add pre/post auto-negotiation phy hooks
Toshiaki Makita makita.toshiaki@lab.ntt.co.jp vlan: Fix reading memory beyond skb->tail in skb_vlan_tagged_multi
Guillaume Nault g.nault@alphalink.fr pppoe: check sockaddr length in pppoe_connect()
Eric Dumazet edumazet@google.com tipc: add policy for TIPC_NLA_NET_ADDR
Willem de Bruijn willemb@google.com packet: fix bitfield update race
Xin Long lucien.xin@gmail.com team: fix netconsole setup over team
Ursula Braun ubraun@linux.vnet.ibm.com net/smc: fix shutdown in state SMC_LISTEN
Paolo Abeni pabeni@redhat.com team: avoid adding twice the same option to the event list
Wolfgang Bumiller w.bumiller@proxmox.com net: fix deadlock while clearing neighbor proxy table
Eric Dumazet edumazet@google.com tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets
Eric Dumazet edumazet@google.com net: af_packet: fix race in PACKET_{R|T}X_RING
Jann Horn jannh@google.com tcp: don't read out-of-bounds opsize
Cong Wang xiyou.wangcong@gmail.com llc: delete timers synchronously in llc_sk_free()
Eric Dumazet edumazet@google.com net: validate attribute sizes in neigh_dump_table()
Guillaume Nault g.nault@alphalink.fr l2tp: check sockaddr length in pppol2tp_connect()
Eric Biggers ebiggers@google.com KEYS: DNS: limit the length of option strings
Ahmed Abdelsalam amsalam20@gmail.com ipv6: sr: fix NULL pointer dereference in seg6_do_srh_encap()- v4 pkts
Eric Dumazet edumazet@google.com ipv6: add RTA_TABLE and RTA_PREFSRC to rtm_ipv6_policy
Xin Long lucien.xin@gmail.com bonding: do not set slave_dev npinfo before slave_enable_netpoll in bond_enslave
Karthikeyan Periyasamy periyasa@codeaurora.org Revert "ath10k: send (re)assoc peer command when NSS changed"
James Bottomley James.Bottomley@HansenPartnership.com tpm: add retry logic
Winkler, Tomas tomas.winkler@intel.com tpm: tpm-interface: fix tpm_transmit/_cmd kdoc
Tomas Winkler tomas.winkler@intel.com tpm: cmd_ready command can be issued only after granting locality
Paweł Jabłoński pawel.jablonski@intel.com i40e: Fix attach VF to VM issue
Neil Armstrong narmstrong@baylibre.com drm: bridge: dw-hdmi: Fix overflow workaround for Amlogic Meson GX SoCs
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "pinctrl: intel: Initialize GPIO properly when used through irqchip"
-------------
Diffstat:
Documentation/admin-guide/kernel-parameters.txt | 3 + Makefile | 4 +- arch/arm64/boot/dts/rockchip/rk3399-puma.dtsi | 11 - arch/microblaze/Kconfig.platform | 1 + arch/microblaze/Makefile | 17 +- arch/microblaze/lib/fastcopy.S | 4 - arch/s390/Kconfig | 47 ++++ arch/s390/Makefile | 10 + arch/s390/include/asm/alternative.h | 149 ++++++++++++ arch/s390/include/asm/barrier.h | 24 ++ arch/s390/include/asm/facility.h | 18 ++ arch/s390/include/asm/kvm_host.h | 3 +- arch/s390/include/asm/lowcore.h | 7 +- arch/s390/include/asm/nospec-branch.h | 17 ++ arch/s390/include/asm/processor.h | 4 + arch/s390/include/asm/thread_info.h | 4 + arch/s390/include/uapi/asm/kvm.h | 5 +- arch/s390/kernel/Makefile | 6 +- arch/s390/kernel/alternative.c | 112 +++++++++ arch/s390/kernel/early.c | 5 + arch/s390/kernel/entry.S | 250 ++++++++++++++++++--- arch/s390/kernel/ipl.c | 1 + arch/s390/kernel/module.c | 65 +++++- arch/s390/kernel/nospec-branch.c | 169 ++++++++++++++ arch/s390/kernel/processor.c | 18 ++ arch/s390/kernel/setup.c | 14 +- arch/s390/kernel/smp.c | 7 +- arch/s390/kernel/uprobes.c | 9 + arch/s390/kernel/vmlinux.lds.S | 37 +++ arch/s390/kvm/kvm-s390.c | 12 + arch/s390/kvm/vsie.c | 30 +++ drivers/acpi/acpi_video.c | 27 ++- drivers/block/swim.c | 49 ++-- drivers/block/swim3.c | 6 +- drivers/cdrom/cdrom.c | 2 +- drivers/char/tpm/tpm-interface.c | 131 ++++++++--- drivers/char/tpm/tpm.h | 1 + drivers/char/tpm/tpm_crb.c | 108 ++++++--- drivers/char/tpm/tpm_tis_core.c | 4 +- drivers/gpu/drm/bridge/synopsys/dw-hdmi.c | 3 + drivers/message/fusion/mptsas.c | 1 + drivers/net/bonding/bond_main.c | 3 +- drivers/net/ethernet/amd/xgbe/xgbe-common.h | 8 + drivers/net/ethernet/amd/xgbe/xgbe-debugfs.c | 16 ++ drivers/net/ethernet/amd/xgbe/xgbe-main.c | 1 + drivers/net/ethernet/amd/xgbe/xgbe-mdio.c | 24 +- drivers/net/ethernet/amd/xgbe/xgbe-pci.c | 2 + drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c | 196 ++++++++++++++-- drivers/net/ethernet/amd/xgbe/xgbe.h | 9 + drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 11 + drivers/net/ethernet/ti/cpsw.c | 2 +- drivers/net/ppp/pppoe.c | 4 + drivers/net/team/team.c | 38 +++- drivers/net/virtio_net.c | 68 +++--- drivers/net/wireless/ath/ath10k/mac.c | 5 +- drivers/pinctrl/intel/pinctrl-intel.c | 23 +- drivers/s390/block/dasd_alias.c | 13 +- drivers/s390/char/Makefile | 2 + drivers/s390/cio/chsc.c | 14 +- include/linux/fsnotify_backend.h | 4 +- include/linux/hmm.h | 9 +- include/linux/if_vlan.h | 7 +- include/linux/tpm.h | 2 +- include/net/ife.h | 3 +- include/net/llc_conn.h | 1 + include/net/tcp.h | 1 + include/uapi/linux/kvm.h | 1 + kernel/trace/trace_entries.h | 2 +- net/core/dev.c | 2 +- net/core/neighbour.c | 40 ++-- net/dns_resolver/dns_key.c | 13 +- net/ife/ife.c | 38 +++- net/ipv4/tcp.c | 7 +- net/ipv4/tcp_input.c | 7 +- net/ipv6/route.c | 2 + net/ipv6/seg6_iptunnel.c | 2 +- net/l2tp/l2tp_ppp.c | 7 + net/llc/af_llc.c | 14 +- net/llc/llc_c_ac.c | 9 +- net/llc/llc_conn.c | 22 +- net/packet/af_packet.c | 83 ++++--- net/packet/internal.h | 10 +- net/sched/act_ife.c | 9 +- net/sctp/ipv6.c | 60 ++--- net/smc/af_smc.c | 10 +- net/strparser/strparser.c | 9 +- net/tipc/netlink.c | 3 +- security/commoncap.c | 2 + 88 files changed, 1842 insertions(+), 371 deletions(-)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit f5a26acf0162477af6ee4c11b4fb9cffe5d3e257
Mike writes: It seems that commit f5a26acf0162 ("pinctrl: intel: Initialize GPIO properly when used through irqchip") can cause problems on some Skylake systems with Sunrisepoint PCH-H. Namely on certain systems it may turn the backlight PWM pin from native mode to GPIO which makes the screen blank during boot.
There is more information here:
https://bugzilla.redhat.com/show_bug.cgi?id=1543769
The actual reason is that GPIO numbering used in BIOS is using "Windows" numbers meaning that they don't match the hardware 1:1 and because of this a wrong pin (backlight PWM) is picked and switched to GPIO mode.
There is a proper fix for this but since it has quite many dependencies on commits that cannot be considered stable material, I suggest we revert commit f5a26acf0162 from stable trees 4.9, 4.14 and 4.15 to prevent the backlight issue.
Reported-by: Mika Westerberg mika.westerberg@linux.intel.com Fixes: f5a26acf0162 ("pinctrl: intel: Initialize GPIO properly when used through irqchip") Cc: Daniel Drake drake@endlessm.com Cc: Chris Chiu chiu@endlessm.com Cc: Linus Walleij linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pinctrl/intel/pinctrl-intel.c | 23 ++++++++--------------- 1 file changed, 8 insertions(+), 15 deletions(-)
--- a/drivers/pinctrl/intel/pinctrl-intel.c +++ b/drivers/pinctrl/intel/pinctrl-intel.c @@ -427,18 +427,6 @@ static void __intel_gpio_set_direction(v writel(value, padcfg0); }
-static void intel_gpio_set_gpio_mode(void __iomem *padcfg0) -{ - u32 value; - - /* Put the pad into GPIO mode */ - value = readl(padcfg0) & ~PADCFG0_PMODE_MASK; - /* Disable SCI/SMI/NMI generation */ - value &= ~(PADCFG0_GPIROUTIOXAPIC | PADCFG0_GPIROUTSCI); - value &= ~(PADCFG0_GPIROUTSMI | PADCFG0_GPIROUTNMI); - writel(value, padcfg0); -} - static int intel_gpio_request_enable(struct pinctrl_dev *pctldev, struct pinctrl_gpio_range *range, unsigned pin) @@ -446,6 +434,7 @@ static int intel_gpio_request_enable(str struct intel_pinctrl *pctrl = pinctrl_dev_get_drvdata(pctldev); void __iomem *padcfg0; unsigned long flags; + u32 value;
raw_spin_lock_irqsave(&pctrl->lock, flags);
@@ -455,7 +444,13 @@ static int intel_gpio_request_enable(str }
padcfg0 = intel_get_padcfg(pctrl, pin, PADCFG0); - intel_gpio_set_gpio_mode(padcfg0); + /* Put the pad into GPIO mode */ + value = readl(padcfg0) & ~PADCFG0_PMODE_MASK; + /* Disable SCI/SMI/NMI generation */ + value &= ~(PADCFG0_GPIROUTIOXAPIC | PADCFG0_GPIROUTSCI); + value &= ~(PADCFG0_GPIROUTSMI | PADCFG0_GPIROUTNMI); + writel(value, padcfg0); + /* Disable TX buffer and enable RX (this will be input) */ __intel_gpio_set_direction(padcfg0, true);
@@ -940,8 +935,6 @@ static int intel_gpio_irq_type(struct ir
raw_spin_lock_irqsave(&pctrl->lock, flags);
- intel_gpio_set_gpio_mode(reg); - value = readl(reg);
value &= ~(PADCFG0_RXEVCFG_MASK | PADCFG0_RXINV);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Neil Armstrong narmstrong@baylibre.com
commit 9c305eb442f3b371fc722ade827bbf673514123e upstream.
The Amlogic Meson GX SoCs, embedded the v2.01a controller, has been also identified needing this workaround. This patch adds the corresponding version to enable a single iteration for this specific version.
Fixes: be41fc55f1aa ("drm: bridge: dw-hdmi: Handle overflow workaround based on device version") Acked-by: Archit Taneja architt@codeaurora.org [narmstrong: s/identifies/identified and rebased against Jernej's change] Signed-off-by: Neil Armstrong narmstrong@baylibre.com Link: https://patchwork.freedesktop.org/patch/msgid/1519386277-25902-1-git-send-em... [narmstrong: v4.14 to v4.16 backport] Cc: stable@vger.kernel.org # 4.14.x Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/bridge/synopsys/dw-hdmi.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c +++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c @@ -1634,6 +1634,8 @@ static void dw_hdmi_clear_overflow(struc * (and possibly on the platform). So far only i.MX6Q (v1.30a) and * i.MX6DL (v1.31a) have been identified as needing the workaround, with * 4 and 1 iterations respectively. + * The Amlogic Meson GX SoCs (v2.01a) have been identified as needing + * the workaround with a single iteration. */
switch (hdmi->version) { @@ -1641,6 +1643,7 @@ static void dw_hdmi_clear_overflow(struc count = 4; break; case 0x131a: + case 0x201a: count = 1; break; default:
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tomas Winkler tomas.winkler@intel.com
commit 888d867df4417deffc33927e6fc2c6925736fe92 upstream.
The correct sequence is to first request locality and only after that perform cmd_ready handshake, otherwise the hardware will drop the subsequent message as from the device point of view the cmd_ready handshake wasn't performed. Symmetrically locality has to be relinquished only after going idle handshake has completed, this requires that go_idle has to poll for the completion and as well locality relinquish has to poll for completion so it is not overridden in back to back commands flow.
Two wrapper functions are added (request_locality relinquish_locality) to simplify the error handling.
The issue is only visible on devices that support multiple localities.
Fixes: 877c57d0d0ca ("tpm_crb: request and relinquish locality 0") Signed-off-by: Tomas Winkler tomas.winkler@intel.com Reviewed-by: Jarkko Sakkinen jarkko.sakkine@linux.intel.com Tested-by: Jarkko Sakkinen jarkko.sakkine@linux.intel.com Signed-off-by: Jarkko Sakkinen jarkko.sakkine@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/char/tpm/tpm-interface.c | 54 ++++++++++++++----- drivers/char/tpm/tpm_crb.c | 108 +++++++++++++++++++++++++++------------ drivers/char/tpm/tpm_tis_core.c | 4 + include/linux/tpm.h | 2 4 files changed, 120 insertions(+), 48 deletions(-)
--- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -369,6 +369,36 @@ err_len: return -EINVAL; }
+static int tpm_request_locality(struct tpm_chip *chip) +{ + int rc; + + if (!chip->ops->request_locality) + return 0; + + rc = chip->ops->request_locality(chip, 0); + if (rc < 0) + return rc; + + chip->locality = rc; + + return 0; +} + +static void tpm_relinquish_locality(struct tpm_chip *chip) +{ + int rc; + + if (!chip->ops->relinquish_locality) + return; + + rc = chip->ops->relinquish_locality(chip, chip->locality); + if (rc) + dev_err(&chip->dev, "%s: : error %d\n", __func__, rc); + + chip->locality = -1; +} + /** * tmp_transmit - Internal kernel interface to transmit TPM commands. * @@ -422,8 +452,6 @@ ssize_t tpm_transmit(struct tpm_chip *ch if (!(flags & TPM_TRANSMIT_UNLOCKED)) mutex_lock(&chip->tpm_mutex);
- if (chip->dev.parent) - pm_runtime_get_sync(chip->dev.parent);
if (chip->ops->clk_enable != NULL) chip->ops->clk_enable(chip, true); @@ -431,14 +459,15 @@ ssize_t tpm_transmit(struct tpm_chip *ch /* Store the decision as chip->locality will be changed. */ need_locality = chip->locality == -1;
- if (!(flags & TPM_TRANSMIT_RAW) && - need_locality && chip->ops->request_locality) { - rc = chip->ops->request_locality(chip, 0); + if (!(flags & TPM_TRANSMIT_RAW) && need_locality) { + rc = tpm_request_locality(chip); if (rc < 0) goto out_no_locality; - chip->locality = rc; }
+ if (chip->dev.parent) + pm_runtime_get_sync(chip->dev.parent); + rc = tpm2_prepare_space(chip, space, ordinal, buf); if (rc) goto out; @@ -499,17 +528,16 @@ out_recv: rc = tpm2_commit_space(chip, space, ordinal, buf, &len);
out: - if (need_locality && chip->ops->relinquish_locality) { - chip->ops->relinquish_locality(chip, chip->locality); - chip->locality = -1; - } + if (chip->dev.parent) + pm_runtime_put_sync(chip->dev.parent); + + if (need_locality) + tpm_relinquish_locality(chip); + out_no_locality: if (chip->ops->clk_enable != NULL) chip->ops->clk_enable(chip, false);
- if (chip->dev.parent) - pm_runtime_put_sync(chip->dev.parent); - if (!(flags & TPM_TRANSMIT_UNLOCKED)) mutex_unlock(&chip->tpm_mutex); return rc ? rc : len; --- a/drivers/char/tpm/tpm_crb.c +++ b/drivers/char/tpm/tpm_crb.c @@ -117,6 +117,25 @@ struct tpm2_crb_smc { u32 smc_func_id; };
+static bool crb_wait_for_reg_32(u32 __iomem *reg, u32 mask, u32 value, + unsigned long timeout) +{ + ktime_t start; + ktime_t stop; + + start = ktime_get(); + stop = ktime_add(start, ms_to_ktime(timeout)); + + do { + if ((ioread32(reg) & mask) == value) + return true; + + usleep_range(50, 100); + } while (ktime_before(ktime_get(), stop)); + + return ((ioread32(reg) & mask) == value); +} + /** * crb_go_idle - request tpm crb device to go the idle state * @@ -132,37 +151,24 @@ struct tpm2_crb_smc { * * Return: 0 always */ -static int __maybe_unused crb_go_idle(struct device *dev, struct crb_priv *priv) +static int crb_go_idle(struct device *dev, struct crb_priv *priv) { if ((priv->flags & CRB_FL_ACPI_START) || (priv->flags & CRB_FL_CRB_SMC_START)) return 0;
iowrite32(CRB_CTRL_REQ_GO_IDLE, &priv->regs_t->ctrl_req); - /* we don't really care when this settles */
+ if (!crb_wait_for_reg_32(&priv->regs_t->ctrl_req, + CRB_CTRL_REQ_GO_IDLE/* mask */, + 0, /* value */ + TPM2_TIMEOUT_C)) { + dev_warn(dev, "goIdle timed out\n"); + return -ETIME; + } return 0; }
-static bool crb_wait_for_reg_32(u32 __iomem *reg, u32 mask, u32 value, - unsigned long timeout) -{ - ktime_t start; - ktime_t stop; - - start = ktime_get(); - stop = ktime_add(start, ms_to_ktime(timeout)); - - do { - if ((ioread32(reg) & mask) == value) - return true; - - usleep_range(50, 100); - } while (ktime_before(ktime_get(), stop)); - - return false; -} - /** * crb_cmd_ready - request tpm crb device to enter ready state * @@ -177,8 +183,7 @@ static bool crb_wait_for_reg_32(u32 __io * * Return: 0 on success -ETIME on timeout; */ -static int __maybe_unused crb_cmd_ready(struct device *dev, - struct crb_priv *priv) +static int crb_cmd_ready(struct device *dev, struct crb_priv *priv) { if ((priv->flags & CRB_FL_ACPI_START) || (priv->flags & CRB_FL_CRB_SMC_START)) @@ -196,11 +201,11 @@ static int __maybe_unused crb_cmd_ready( return 0; }
-static int crb_request_locality(struct tpm_chip *chip, int loc) +static int __crb_request_locality(struct device *dev, + struct crb_priv *priv, int loc) { - struct crb_priv *priv = dev_get_drvdata(&chip->dev); u32 value = CRB_LOC_STATE_LOC_ASSIGNED | - CRB_LOC_STATE_TPM_REG_VALID_STS; + CRB_LOC_STATE_TPM_REG_VALID_STS;
if (!priv->regs_h) return 0; @@ -208,21 +213,45 @@ static int crb_request_locality(struct t iowrite32(CRB_LOC_CTRL_REQUEST_ACCESS, &priv->regs_h->loc_ctrl); if (!crb_wait_for_reg_32(&priv->regs_h->loc_state, value, value, TPM2_TIMEOUT_C)) { - dev_warn(&chip->dev, "TPM_LOC_STATE_x.requestAccess timed out\n"); + dev_warn(dev, "TPM_LOC_STATE_x.requestAccess timed out\n"); return -ETIME; }
return 0; }
-static void crb_relinquish_locality(struct tpm_chip *chip, int loc) +static int crb_request_locality(struct tpm_chip *chip, int loc) { struct crb_priv *priv = dev_get_drvdata(&chip->dev);
+ return __crb_request_locality(&chip->dev, priv, loc); +} + +static int __crb_relinquish_locality(struct device *dev, + struct crb_priv *priv, int loc) +{ + u32 mask = CRB_LOC_STATE_LOC_ASSIGNED | + CRB_LOC_STATE_TPM_REG_VALID_STS; + u32 value = CRB_LOC_STATE_TPM_REG_VALID_STS; + if (!priv->regs_h) - return; + return 0;
iowrite32(CRB_LOC_CTRL_RELINQUISH, &priv->regs_h->loc_ctrl); + if (!crb_wait_for_reg_32(&priv->regs_h->loc_state, mask, value, + TPM2_TIMEOUT_C)) { + dev_warn(dev, "TPM_LOC_STATE_x.requestAccess timed out\n"); + return -ETIME; + } + + return 0; +} + +static int crb_relinquish_locality(struct tpm_chip *chip, int loc) +{ + struct crb_priv *priv = dev_get_drvdata(&chip->dev); + + return __crb_relinquish_locality(&chip->dev, priv, loc); }
static u8 crb_status(struct tpm_chip *chip) @@ -466,6 +495,10 @@ static int crb_map_io(struct acpi_device dev_warn(dev, FW_BUG "Bad ACPI memory layout"); }
+ ret = __crb_request_locality(dev, priv, 0); + if (ret) + return ret; + priv->regs_t = crb_map_res(dev, priv, &io_res, buf->control_address, sizeof(struct crb_regs_tail)); if (IS_ERR(priv->regs_t)) @@ -522,6 +555,8 @@ out:
crb_go_idle(dev, priv);
+ __crb_relinquish_locality(dev, priv, 0); + return ret; }
@@ -589,10 +624,14 @@ static int crb_acpi_add(struct acpi_devi chip->acpi_dev_handle = device->handle; chip->flags = TPM_CHIP_FLAG_TPM2;
- rc = crb_cmd_ready(dev, priv); + rc = __crb_request_locality(dev, priv, 0); if (rc) return rc;
+ rc = crb_cmd_ready(dev, priv); + if (rc) + goto out; + pm_runtime_get_noresume(dev); pm_runtime_set_active(dev); pm_runtime_enable(dev); @@ -602,12 +641,15 @@ static int crb_acpi_add(struct acpi_devi crb_go_idle(dev, priv); pm_runtime_put_noidle(dev); pm_runtime_disable(dev); - return rc; + goto out; }
- pm_runtime_put(dev); + pm_runtime_put_sync(dev);
- return 0; +out: + __crb_relinquish_locality(dev, priv, 0); + + return rc; }
static int crb_acpi_remove(struct acpi_device *device) --- a/drivers/char/tpm/tpm_tis_core.c +++ b/drivers/char/tpm/tpm_tis_core.c @@ -77,11 +77,13 @@ static bool check_locality(struct tpm_ch return false; }
-static void release_locality(struct tpm_chip *chip, int l) +static int release_locality(struct tpm_chip *chip, int l) { struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
tpm_tis_write8(priv, TPM_ACCESS(l), TPM_ACCESS_ACTIVE_LOCALITY); + + return 0; }
static int request_locality(struct tpm_chip *chip, int l) --- a/include/linux/tpm.h +++ b/include/linux/tpm.h @@ -49,7 +49,7 @@ struct tpm_class_ops { bool (*update_timeouts)(struct tpm_chip *chip, unsigned long *timeout_cap); int (*request_locality)(struct tpm_chip *chip, int loc); - void (*relinquish_locality)(struct tpm_chip *chip, int loc); + int (*relinquish_locality)(struct tpm_chip *chip, int loc); void (*clk_enable)(struct tpm_chip *chip, bool value); };
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Winkler, Tomas tomas.winkler@intel.com
commit 65520d46a4adbf7f23bbb6d9b1773513f7bc7821 upstream.
Fix tmp_ -> tpm_ typo and add reference to 'space' parameter in kdoc for tpm_transmit and tpm_transmit_cmd functions.
Signed-off-by: Tomas Winkler tomas.winkler@intel.com Reviewed-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Signed-off-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/char/tpm/tpm-interface.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -400,9 +400,10 @@ static void tpm_relinquish_locality(stru }
/** - * tmp_transmit - Internal kernel interface to transmit TPM commands. + * tpm_transmit - Internal kernel interface to transmit TPM commands. * * @chip: TPM chip to use + * @space: tpm space * @buf: TPM command buffer * @bufsiz: length of the TPM command buffer * @flags: tpm transmit flags - bitmap @@ -544,10 +545,11 @@ out_no_locality: }
/** - * tmp_transmit_cmd - send a tpm command to the device + * tpm_transmit_cmd - send a tpm command to the device * The function extracts tpm out header return code * * @chip: TPM chip to use + * @space: tpm space * @buf: TPM command buffer * @bufsiz: length of the buffer * @min_rsp_body_length: minimum expected length of response body
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Karthikeyan Periyasamy periyasa@codeaurora.org
commit 55cc11da69895a680940c1733caabc37be685f5e upstream.
This reverts commit 55884c045d31a29cf69db8332d1064a1b61dd159.
When Ath10k is in AP mode and an unassociated STA sends a VHT action frame (Operating Mode Notification for the NSS change) periodically to AP this causes ath10k to call ath10k_station_assoc() which sends WMI_PEER_ASSOC_CMDID during NSS update. Over the time (with a certain client it can happen within 15 mins when there are over 500 of these VHT action frames) continuous calls of WMI_PEER_ASSOC_CMDID cause firmware to assert due to resource exhaust.
To my knowledge setting WMI_PEER_NSS peer param itself enough to handle NSS updates and no need to call ath10k_station_assoc(). So revert the original commit from 2014 as it's unclear why the change was really needed. Now the firmware assert doesn't happen anymore.
Issue observed in QCA9984 platform with firmware version:10.4-3.5.3-00053. This Change tested in QCA9984 with firmware version: 10.4-3.5.3-00053 and QCA988x platform with firmware version: 10.2.4-1.0-00036.
Firmware Assert log:
ath10k_pci 0002:01:00.0: firmware crashed! (guid e61f1274-9acd-4c5b-bcca-e032ea6e723c) ath10k_pci 0002:01:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe ath10k_pci 0002:01:00.0: kconfig debug 1 debugfs 1 tracing 0 dfs 1 testmode 1 ath10k_pci 0002:01:00.0: firmware ver 10.4-3.5.3-00053 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast crc32 4c56a386 ath10k_pci 0002:01:00.0: board_file api 2 bmi_id 0:4 crc32 c2271344 ath10k_pci 0002:01:00.0: htt-ver 2.2 wmi-op 6 htt-op 4 cal otp max-sta 512 raw 0 hwcrypto 1 ath10k_pci 0002:01:00.0: firmware register dump: ath10k_pci 0002:01:00.0: [00]: 0x0000000A 0x000015B3 0x00981E5F 0x00975B31 ath10k_pci 0002:01:00.0: [04]: 0x00981E5F 0x00060530 0x00000011 0x00446C60 ath10k_pci 0002:01:00.0: [08]: 0x0042F1FC 0x00458080 0x00000017 0x00000000 ath10k_pci 0002:01:00.0: [12]: 0x00000009 0x00000000 0x00973ABC 0x00973AD2 ath10k_pci 0002:01:00.0: [16]: 0x00973AB0 0x00960E62 0x009606CA 0x00000000 ath10k_pci 0002:01:00.0: [20]: 0x40981E5F 0x004066DC 0x00400000 0x00981E34 ath10k_pci 0002:01:00.0: [24]: 0x80983B48 0x0040673C 0x000000C0 0xC0981E5F ath10k_pci 0002:01:00.0: [28]: 0x80993DEB 0x0040676C 0x00431AB8 0x0045D0C4 ath10k_pci 0002:01:00.0: [32]: 0x80993E5C 0x004067AC 0x004303C0 0x0045D0C4 ath10k_pci 0002:01:00.0: [36]: 0x80994AAB 0x004067DC 0x00000000 0x0045D0C4 ath10k_pci 0002:01:00.0: [40]: 0x809971A0 0x0040681C 0x004303C0 0x00441B00 ath10k_pci 0002:01:00.0: [44]: 0x80991904 0x0040688C 0x004303C0 0x0045D0C4 ath10k_pci 0002:01:00.0: [48]: 0x80963AD3 0x00406A7C 0x004303C0 0x009918FC ath10k_pci 0002:01:00.0: [52]: 0x80960E80 0x00406A9C 0x0000001F 0x00400000 ath10k_pci 0002:01:00.0: [56]: 0x80960E51 0x00406ACC 0x00400000 0x00000000 ath10k_pci 0002:01:00.0: Copy Engine register dump: ath10k_pci 0002:01:00.0: index: addr: sr_wr_idx: sr_r_idx: dst_wr_idx: dst_r_idx: ath10k_pci 0002:01:00.0: [00]: 0x0004a000 15 15 3 3 ath10k_pci 0002:01:00.0: [01]: 0x0004a400 17 17 212 213 ath10k_pci 0002:01:00.0: [02]: 0x0004a800 21 21 20 21 ath10k_pci 0002:01:00.0: [03]: 0x0004ac00 25 25 27 25 ath10k_pci 0002:01:00.0: [04]: 0x0004b000 515 515 144 104 ath10k_pci 0002:01:00.0: [05]: 0x0004b400 28 28 155 156 ath10k_pci 0002:01:00.0: [06]: 0x0004b800 12 12 12 12 ath10k_pci 0002:01:00.0: [07]: 0x0004bc00 1 1 1 1 ath10k_pci 0002:01:00.0: [08]: 0x0004c000 0 0 127 0 ath10k_pci 0002:01:00.0: [09]: 0x0004c400 1 1 1 1 ath10k_pci 0002:01:00.0: [10]: 0x0004c800 0 0 0 0 ath10k_pci 0002:01:00.0: [11]: 0x0004cc00 0 0 0 0 ath10k_pci 0002:01:00.0: CE[1] write_index 212 sw_index 213 hw_index 0 nentries_mask 0x000001ff ath10k_pci 0002:01:00.0: CE[2] write_index 20 sw_index 21 hw_index 0 nentries_mask 0x0000007f ath10k_pci 0002:01:00.0: CE[5] write_index 155 sw_index 156 hw_index 0 nentries_mask 0x000001ff ath10k_pci 0002:01:00.0: DMA addr: nbytes: meta data: byte swap: gather: ath10k_pci 0002:01:00.0: [455]: 0x580c0042 0 0 0 0 ath10k_pci 0002:01:00.0: [456]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [457]: 0x580c0042 0 0 0 0 ath10k_pci 0002:01:00.0: [458]: 0x594a0038 0 0 0 1 ath10k_pci 0002:01:00.0: [459]: 0x580c0a42 0 0 0 0 ath10k_pci 0002:01:00.0: [460]: 0x594a0060 0 0 0 1 ath10k_pci 0002:01:00.0: [461]: 0x580c0c42 0 0 0 0 ath10k_pci 0002:01:00.0: [462]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [463]: 0x580c0c42 0 0 0 0 ath10k_pci 0002:01:00.0: [464]: 0x594a0038 0 0 0 1 ath10k_pci 0002:01:00.0: [465]: 0x580c0a42 0 0 0 0 ath10k_pci 0002:01:00.0: [466]: 0x594a0060 0 0 0 1 ath10k_pci 0002:01:00.0: [467]: 0x580c0042 0 0 0 0 ath10k_pci 0002:01:00.0: [468]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [469]: 0x580c1c42 0 0 0 0 ath10k_pci 0002:01:00.0: [470]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [471]: 0x580c1c42 0 0 0 0 ath10k_pci 0002:01:00.0: [472]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [473]: 0x580c1c42 0 0 0 0 ath10k_pci 0002:01:00.0: [474]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [475]: 0x580c0642 0 0 0 0 ath10k_pci 0002:01:00.0: [476]: 0x594a0038 0 0 0 1 ath10k_pci 0002:01:00.0: [477]: 0x580c0842 0 0 0 0 ath10k_pci 0002:01:00.0: [478]: 0x594a0060 0 0 0 1 ath10k_pci 0002:01:00.0: [479]: 0x580c0042 0 0 0 0 ath10k_pci 0002:01:00.0: [480]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [481]: 0x580c0042 0 0 0 0 ath10k_pci 0002:01:00.0: [482]: 0x594a0038 0 0 0 1 ath10k_pci 0002:01:00.0: [483]: 0x580c0842 0 0 0 0 ath10k_pci 0002:01:00.0: [484]: 0x594a0060 0 0 0 1 ath10k_pci 0002:01:00.0: [485]: 0x580c0642 0 0 0 0 ath10k_pci 0002:01:00.0: [486]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [487]: 0x580c0642 0 0 0 0 ath10k_pci 0002:01:00.0: [488]: 0x594a0038 0 0 0 1 ath10k_pci 0002:01:00.0: [489]: 0x580c0842 0 0 0 0 ath10k_pci 0002:01:00.0: [490]: 0x594a0060 0 0 0 1 ath10k_pci 0002:01:00.0: [491]: 0x580c0042 0 0 0 0 ath10k_pci 0002:01:00.0: [492]: 0x58174040 0 1 0 0 ath10k_pci 0002:01:00.0: [493]: 0x5a946040 0 1 0 0 ath10k_pci 0002:01:00.0: [494]: 0x59909040 0 1 0 0 ath10k_pci 0002:01:00.0: [495]: 0x5ae5a040 0 1 0 0 ath10k_pci 0002:01:00.0: [496]: 0x58096040 0 1 0 0 ath10k_pci 0002:01:00.0: [497]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [498]: 0x580c0642 0 0 0 0 ath10k_pci 0002:01:00.0: [499]: 0x5c1e0040 0 1 0 0 ath10k_pci 0002:01:00.0: [500]: 0x58153040 0 1 0 0 ath10k_pci 0002:01:00.0: [501]: 0x58129040 0 1 0 0 ath10k_pci 0002:01:00.0: [502]: 0x5952f040 0 1 0 0 ath10k_pci 0002:01:00.0: [503]: 0x59535040 0 1 0 0 ath10k_pci 0002:01:00.0: [504]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [505]: 0x580c0042 0 0 0 0 ath10k_pci 0002:01:00.0: [506]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [507]: 0x580c0042 0 0 0 0 ath10k_pci 0002:01:00.0: [508]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [509]: 0x580c0042 0 0 0 0 ath10k_pci 0002:01:00.0: [510]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [511]: 0x580c0042 0 0 0 0 ath10k_pci 0002:01:00.0: [512]: 0x5adcc040 0 1 0 0 ath10k_pci 0002:01:00.0: [513]: 0x5cf3d040 0 1 0 0 ath10k_pci 0002:01:00.0: [514]: 0x5c1e9040 64 1 0 0 ath10k_pci 0002:01:00.0: [515]: 0x00000000 0 0 0 0
Signed-off-by: Karthikeyan Periyasamy periyasa@codeaurora.org Signed-off-by: Kalle Valo kvalo@codeaurora.org Cc: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/wireless/ath/ath10k/mac.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/drivers/net/wireless/ath/ath10k/mac.c +++ b/drivers/net/wireless/ath/ath10k/mac.c @@ -5955,9 +5955,8 @@ static void ath10k_sta_rc_update_wk(stru sta->addr, smps, err); }
- if (changed & IEEE80211_RC_SUPP_RATES_CHANGED || - changed & IEEE80211_RC_NSS_CHANGED) { - ath10k_dbg(ar, ATH10K_DBG_MAC, "mac update sta %pM supp rates/nss\n", + if (changed & IEEE80211_RC_SUPP_RATES_CHANGED) { + ath10k_dbg(ar, ATH10K_DBG_MAC, "mac update sta %pM supp rates\n", sta->addr);
err = ath10k_station_assoc(ar, arvif->vif, sta, true);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit ddea788c63094f7c483783265563dd5b50052e28 ]
After Commit 8a8efa22f51b ("bonding: sync netpoll code with bridge"), it would set slave_dev npinfo in slave_enable_netpoll when enslaving a dev if bond->dev->npinfo was set.
However now slave_dev npinfo is set with bond->dev->npinfo before calling slave_enable_netpoll. With slave_dev npinfo set, __netpoll_setup called in slave_enable_netpoll will not call slave dev's .ndo_netpoll_setup(). It causes that the lower dev of this slave dev can't set its npinfo.
One way to reproduce it:
# modprobe bonding # brctl addbr br0 # brctl addif br0 eth1 # ifconfig bond0 192.168.122.1/24 up # ifenslave bond0 eth2 # systemctl restart netconsole # ifenslave bond0 br0 # ifconfig eth2 down # systemctl restart netconsole
The netpoll won't really work.
This patch is to remove that slave_dev npinfo setting in bond_enslave().
Fixes: 8a8efa22f51b ("bonding: sync netpoll code with bridge") Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/bonding/bond_main.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1656,8 +1656,7 @@ int bond_enslave(struct net_device *bond } /* switch(bond_mode) */
#ifdef CONFIG_NET_POLL_CONTROLLER - slave_dev->npinfo = bond->dev->npinfo; - if (slave_dev->npinfo) { + if (bond->dev->npinfo) { if (slave_enable_netpoll(new_slave)) { netdev_info(bond_dev, "master_dev is using netpoll, but new slave device does not support netpoll\n"); res = -EBUSY;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit aa8f8778493c85fff480cdf8b349b1e1dcb5f243 ]
KMSAN reported use of uninit-value that I tracked to lack of proper size check on RTA_TABLE attribute.
I also believe RTA_PREFSRC lacks a similar check.
Fixes: 86872cb57925 ("[IPv6] route: FIB6 configuration using struct fib6_config") Fixes: c3968a857a6b ("ipv6: RTA_PREFSRC support for ipv6 route source address selection") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Acked-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/route.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -2959,6 +2959,7 @@ void rt6_mtu_change(struct net_device *d
static const struct nla_policy rtm_ipv6_policy[RTA_MAX+1] = { [RTA_GATEWAY] = { .len = sizeof(struct in6_addr) }, + [RTA_PREFSRC] = { .len = sizeof(struct in6_addr) }, [RTA_OIF] = { .type = NLA_U32 }, [RTA_IIF] = { .type = NLA_U32 }, [RTA_PRIORITY] = { .type = NLA_U32 }, @@ -2970,6 +2971,7 @@ static const struct nla_policy rtm_ipv6_ [RTA_EXPIRES] = { .type = NLA_U32 }, [RTA_UID] = { .type = NLA_U32 }, [RTA_MARK] = { .type = NLA_U32 }, + [RTA_TABLE] = { .type = NLA_U32 }, };
static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ahmed Abdelsalam amsalam20@gmail.com
[ Upstream commit a957fa190aa9d9168b33d460a5241a6d088c6265 ]
In case of seg6 in encap mode, seg6_do_srh_encap() calls set_tun_src() in order to set the src addr of outer IPv6 header.
The net_device is required for set_tun_src(). However calling ip6_dst_idev() on dst_entry in case of IPv4 traffic results on the following bug.
Using just dst->dev should fix this BUG.
[ 196.242461] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [ 196.242975] PGD 800000010f076067 P4D 800000010f076067 PUD 10f060067 PMD 0 [ 196.243329] Oops: 0000 [#1] SMP PTI [ 196.243468] Modules linked in: nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd input_leds glue_helper led_class pcspkr serio_raw mac_hid video autofs4 hid_generic usbhid hid e1000 i2c_piix4 ahci pata_acpi libahci [ 196.244362] CPU: 2 PID: 1089 Comm: ping Not tainted 4.16.0+ #1 [ 196.244606] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 196.244968] RIP: 0010:seg6_do_srh_encap+0x1ac/0x300 [ 196.245236] RSP: 0018:ffffb2ce00b23a60 EFLAGS: 00010202 [ 196.245464] RAX: 0000000000000000 RBX: ffff8c7f53eea300 RCX: 0000000000000000 [ 196.245742] RDX: 0000f10000000000 RSI: ffff8c7f52085a6c RDI: ffff8c7f41166850 [ 196.246018] RBP: ffffb2ce00b23aa8 R08: 00000000000261e0 R09: ffff8c7f41166800 [ 196.246294] R10: ffffdce5040ac780 R11: ffff8c7f41166828 R12: ffff8c7f41166808 [ 196.246570] R13: ffff8c7f52085a44 R14: ffffffffb73211c0 R15: ffff8c7e69e44200 [ 196.246846] FS: 00007fc448789700(0000) GS:ffff8c7f59d00000(0000) knlGS:0000000000000000 [ 196.247286] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 196.247526] CR2: 0000000000000000 CR3: 000000010f05a000 CR4: 00000000000406e0 [ 196.247804] Call Trace: [ 196.247972] seg6_do_srh+0x15b/0x1c0 [ 196.248156] seg6_output+0x3c/0x220 [ 196.248341] ? prandom_u32+0x14/0x20 [ 196.248526] ? ip_idents_reserve+0x6c/0x80 [ 196.248723] ? __ip_select_ident+0x90/0x100 [ 196.248923] ? ip_append_data.part.50+0x6c/0xd0 [ 196.249133] lwtunnel_output+0x44/0x70 [ 196.249328] ip_send_skb+0x15/0x40 [ 196.249515] raw_sendmsg+0x8c3/0xac0 [ 196.249701] ? _copy_from_user+0x2e/0x60 [ 196.249897] ? rw_copy_check_uvector+0x53/0x110 [ 196.250106] ? _copy_from_user+0x2e/0x60 [ 196.250299] ? copy_msghdr_from_user+0xce/0x140 [ 196.250508] sock_sendmsg+0x36/0x40 [ 196.250690] ___sys_sendmsg+0x292/0x2a0 [ 196.250881] ? _cond_resched+0x15/0x30 [ 196.251074] ? copy_termios+0x1e/0x70 [ 196.251261] ? _copy_to_user+0x22/0x30 [ 196.251575] ? tty_mode_ioctl+0x1c3/0x4e0 [ 196.251782] ? _cond_resched+0x15/0x30 [ 196.251972] ? mutex_lock+0xe/0x30 [ 196.252152] ? vvar_fault+0xd2/0x110 [ 196.252337] ? __do_fault+0x1f/0xc0 [ 196.252521] ? __handle_mm_fault+0xc1f/0x12d0 [ 196.252727] ? __sys_sendmsg+0x63/0xa0 [ 196.252919] __sys_sendmsg+0x63/0xa0 [ 196.253107] do_syscall_64+0x72/0x200 [ 196.253305] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [ 196.253530] RIP: 0033:0x7fc4480b0690 [ 196.253715] RSP: 002b:00007ffde9f252f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 196.254053] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 00007fc4480b0690 [ 196.254331] RDX: 0000000000000000 RSI: 000000000060a360 RDI: 0000000000000003 [ 196.254608] RBP: 00007ffde9f253f0 R08: 00000000002d1e81 R09: 0000000000000002 [ 196.254884] R10: 00007ffde9f250c0 R11: 0000000000000246 R12: 0000000000b22070 [ 196.255205] R13: 20c49ba5e353f7cf R14: 431bde82d7b634db R15: 00007ffde9f278fe [ 196.255484] Code: a5 0f b6 45 c0 41 88 41 28 41 0f b6 41 2c 48 c1 e0 04 49 8b 54 01 38 49 8b 44 01 30 49 89 51 20 49 89 41 18 48 8b 83 b0 00 00 00 <48> 8b 30 49 8b 86 08 0b 00 00 48 8b 40 20 48 8b 50 08 48 0b 10 [ 196.256190] RIP: seg6_do_srh_encap+0x1ac/0x300 RSP: ffffb2ce00b23a60 [ 196.256445] CR2: 0000000000000000 [ 196.256676] ---[ end trace 71af7d093603885c ]---
Fixes: 8936ef7604c11 ("ipv6: sr: fix NULL pointer dereference when setting encap source address") Signed-off-by: Ahmed Abdelsalam amsalam20@gmail.com Acked-by: David Lebrun dlebrun@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/seg6_iptunnel.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ipv6/seg6_iptunnel.c +++ b/net/ipv6/seg6_iptunnel.c @@ -136,7 +136,7 @@ int seg6_do_srh_encap(struct sk_buff *sk isrh->nexthdr = proto;
hdr->daddr = isrh->segments[isrh->first_segment]; - set_tun_src(net, ip6_dst_idev(dst)->dev, &hdr->daddr, &hdr->saddr); + set_tun_src(net, dst->dev, &hdr->daddr, &hdr->saddr);
#ifdef CONFIG_IPV6_SEG6_HMAC if (sr_has_hmac(isrh)) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Biggers ebiggers@google.com
[ Upstream commit 9c438d7a3a52dcc2b9ed095cb87d3a5e83cf7e60 ]
Adding a dns_resolver key whose payload contains a very long option name resulted in that string being printed in full. This hit the WARN_ONCE() in set_precision() during the printk(), because printk() only supports a precision of up to 32767 bytes:
precision 1000000 too large WARNING: CPU: 0 PID: 752 at lib/vsprintf.c:2189 vsnprintf+0x4bc/0x5b0
Fix it by limiting option strings (combined name + value) to a much more reasonable 128 bytes. The exact limit is arbitrary, but currently the only recognized option is formatted as "dnserror=%lu" which fits well within this limit.
Also ratelimit the printks.
Reproducer:
perl -e 'print "#", "A" x 1000000, "\x00"' | keyctl padd dns_resolver desc @s
This bug was found using syzkaller.
Reported-by: Mark Rutland mark.rutland@arm.com Fixes: 4a2d789267e0 ("DNS: If the DNS server returns an error, allow that to be cached [ver #2]") Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/dns_resolver/dns_key.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-)
--- a/net/dns_resolver/dns_key.c +++ b/net/dns_resolver/dns_key.c @@ -25,6 +25,7 @@ #include <linux/moduleparam.h> #include <linux/slab.h> #include <linux/string.h> +#include <linux/ratelimit.h> #include <linux/kernel.h> #include <linux/keyctl.h> #include <linux/err.h> @@ -91,9 +92,9 @@ dns_resolver_preparse(struct key_prepars
next_opt = memchr(opt, '#', end - opt) ?: end; opt_len = next_opt - opt; - if (!opt_len) { - printk(KERN_WARNING - "Empty option to dns_resolver key\n"); + if (opt_len <= 0 || opt_len > 128) { + pr_warn_ratelimited("Invalid option length (%d) for dns_resolver key\n", + opt_len); return -EINVAL; }
@@ -127,10 +128,8 @@ dns_resolver_preparse(struct key_prepars }
bad_option_value: - printk(KERN_WARNING - "Option '%*.*s' to dns_resolver key:" - " bad/missing value\n", - opt_nlen, opt_nlen, opt); + pr_warn_ratelimited("Option '%*.*s' to dns_resolver key: bad/missing value\n", + opt_nlen, opt_nlen, opt); return -EINVAL; } while (opt = next_opt + 1, opt < end); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Guillaume Nault g.nault@alphalink.fr
[ Upstream commit eb1c28c05894a4b1f6b56c5bf072205e64cfa280 ]
Check sockaddr_len before dereferencing sp->sa_protocol, to ensure that it actually points to valid data.
Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Reported-by: syzbot+a70ac890b23b1bf29f5c@syzkaller.appspotmail.com Signed-off-by: Guillaume Nault g.nault@alphalink.fr Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/l2tp/l2tp_ppp.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/net/l2tp/l2tp_ppp.c +++ b/net/l2tp/l2tp_ppp.c @@ -591,6 +591,13 @@ static int pppol2tp_connect(struct socke lock_sock(sk);
error = -EINVAL; + + if (sockaddr_len != sizeof(struct sockaddr_pppol2tp) && + sockaddr_len != sizeof(struct sockaddr_pppol2tpv3) && + sockaddr_len != sizeof(struct sockaddr_pppol2tpin6) && + sockaddr_len != sizeof(struct sockaddr_pppol2tpv3in6)) + goto end; + if (sp->sa_protocol != PX_PROTO_OL2TP) goto end;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 7dd07c143a4b54d050e748bee4b4b9e94a7b1744 ]
Since neigh_dump_table() calls nlmsg_parse() without giving policy constraints, attributes can have arbirary size that we must validate
Reported by syzbot/KMSAN :
BUG: KMSAN: uninit-value in neigh_master_filtered net/core/neighbour.c:2292 [inline] BUG: KMSAN: uninit-value in neigh_dump_table net/core/neighbour.c:2348 [inline] BUG: KMSAN: uninit-value in neigh_dump_info+0x1af0/0x2250 net/core/neighbour.c:2438 CPU: 1 PID: 3575 Comm: syzkaller268891 Not tainted 4.16.0+ #83 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676 neigh_master_filtered net/core/neighbour.c:2292 [inline] neigh_dump_table net/core/neighbour.c:2348 [inline] neigh_dump_info+0x1af0/0x2250 net/core/neighbour.c:2438 netlink_dump+0x9ad/0x1540 net/netlink/af_netlink.c:2225 __netlink_dump_start+0x1167/0x12a0 net/netlink/af_netlink.c:2322 netlink_dump_start include/linux/netlink.h:214 [inline] rtnetlink_rcv_msg+0x1435/0x1560 net/core/rtnetlink.c:4598 netlink_rcv_skb+0x355/0x5f0 net/netlink/af_netlink.c:2447 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4653 netlink_unicast_kernel net/netlink/af_netlink.c:1311 [inline] netlink_unicast+0x1672/0x1750 net/netlink/af_netlink.c:1337 netlink_sendmsg+0x1048/0x1310 net/netlink/af_netlink.c:1900 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046 __sys_sendmsg net/socket.c:2080 [inline] SYSC_sendmsg+0x2a3/0x3d0 net/socket.c:2091 SyS_sendmsg+0x54/0x80 net/socket.c:2087 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x43fed9 RSP: 002b:00007ffddbee2798 EFLAGS: 00000213 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fed9 RDX: 0000000000000000 RSI: 0000000020005000 RDI: 0000000000000003 RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8 R10: 00000000004002c8 R11: 0000000000000213 R12: 0000000000401800 R13: 0000000000401890 R14: 0000000000000000 R15: 0000000000000000
Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314 kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321 slab_post_alloc_hook mm/slab.h:445 [inline] slab_alloc_node mm/slub.c:2737 [inline] __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369 __kmalloc_reserve net/core/skbuff.c:138 [inline] __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206 alloc_skb include/linux/skbuff.h:984 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1183 [inline] netlink_sendmsg+0x9a6/0x1310 net/netlink/af_netlink.c:1875 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046 __sys_sendmsg net/socket.c:2080 [inline] SYSC_sendmsg+0x2a3/0x3d0 net/socket.c:2091 SyS_sendmsg+0x54/0x80 net/socket.c:2087 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Fixes: 21fdd092acc7 ("net: Add support for filtering neigh dump by master device") Signed-off-by: Eric Dumazet edumazet@google.com Cc: David Ahern dsa@cumulusnetworks.com Reported-by: syzbot syzkaller@googlegroups.com Acked-by: David Ahern dsa@cumulusnetworks.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/neighbour.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
--- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -2323,12 +2323,16 @@ static int neigh_dump_table(struct neigh
err = nlmsg_parse(nlh, sizeof(struct ndmsg), tb, NDA_MAX, NULL, NULL); if (!err) { - if (tb[NDA_IFINDEX]) + if (tb[NDA_IFINDEX]) { + if (nla_len(tb[NDA_IFINDEX]) != sizeof(u32)) + return -EINVAL; filter_idx = nla_get_u32(tb[NDA_IFINDEX]); - - if (tb[NDA_MASTER]) + } + if (tb[NDA_MASTER]) { + if (nla_len(tb[NDA_MASTER]) != sizeof(u32)) + return -EINVAL; filter_master_idx = nla_get_u32(tb[NDA_MASTER]); - + } if (filter_idx || filter_master_idx) flags |= NLM_F_DUMP_FILTERED; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit b905ef9ab90115d001c1658259af4b1c65088779 ]
The connection timers of an llc sock could be still flying after we delete them in llc_sk_free(), and even possibly after we free the sock. We could just wait synchronously here in case of troubles.
Note, I leave other call paths as they are, since they may not have to wait, at least we can change them to synchronously when needed.
Also, move the code to net/llc/llc_conn.c, which is apparently a better place.
Reported-by: syzbot+f922284c18ea23a8e457@syzkaller.appspotmail.com Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/llc_conn.h | 1 + net/llc/llc_c_ac.c | 9 +-------- net/llc/llc_conn.c | 22 +++++++++++++++++++++- 3 files changed, 23 insertions(+), 9 deletions(-)
--- a/include/net/llc_conn.h +++ b/include/net/llc_conn.h @@ -97,6 +97,7 @@ static __inline__ char llc_backlog_type(
struct sock *llc_sk_alloc(struct net *net, int family, gfp_t priority, struct proto *prot, int kern); +void llc_sk_stop_all_timers(struct sock *sk, bool sync); void llc_sk_free(struct sock *sk);
void llc_sk_reset(struct sock *sk); --- a/net/llc/llc_c_ac.c +++ b/net/llc/llc_c_ac.c @@ -1096,14 +1096,7 @@ int llc_conn_ac_inc_tx_win_size(struct s
int llc_conn_ac_stop_all_timers(struct sock *sk, struct sk_buff *skb) { - struct llc_sock *llc = llc_sk(sk); - - del_timer(&llc->pf_cycle_timer.timer); - del_timer(&llc->ack_timer.timer); - del_timer(&llc->rej_sent_timer.timer); - del_timer(&llc->busy_state_timer.timer); - llc->ack_must_be_send = 0; - llc->ack_pf = 0; + llc_sk_stop_all_timers(sk, false); return 0; }
--- a/net/llc/llc_conn.c +++ b/net/llc/llc_conn.c @@ -951,6 +951,26 @@ out: return sk; }
+void llc_sk_stop_all_timers(struct sock *sk, bool sync) +{ + struct llc_sock *llc = llc_sk(sk); + + if (sync) { + del_timer_sync(&llc->pf_cycle_timer.timer); + del_timer_sync(&llc->ack_timer.timer); + del_timer_sync(&llc->rej_sent_timer.timer); + del_timer_sync(&llc->busy_state_timer.timer); + } else { + del_timer(&llc->pf_cycle_timer.timer); + del_timer(&llc->ack_timer.timer); + del_timer(&llc->rej_sent_timer.timer); + del_timer(&llc->busy_state_timer.timer); + } + + llc->ack_must_be_send = 0; + llc->ack_pf = 0; +} + /** * llc_sk_free - Frees a LLC socket * @sk - socket to free @@ -963,7 +983,7 @@ void llc_sk_free(struct sock *sk)
llc->state = LLC_CONN_OUT_OF_SVC; /* Stop all (possibly) running timers */ - llc_conn_ac_stop_all_timers(sk, NULL); + llc_sk_stop_all_timers(sk, true); #ifdef DEBUG_LLC_CONN_ALLOC printk(KERN_INFO "%s: unackq=%d, txq=%d\n", __func__, skb_queue_len(&llc->pdu_unack_q),
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jann Horn jannh@google.com
[ Upstream commit 7e5a206ab686f098367b61aca989f5cdfa8114a3 ]
The old code reads the "opsize" variable from out-of-bounds memory (first byte behind the segment) if a broken TCP segment ends directly after an opcode that is neither EOL nor NOP.
The result of the read isn't used for anything, so the worst thing that could theoretically happen is a pagefault; and since the physmap is usually mostly contiguous, even that seems pretty unlikely.
The following C reproducer triggers the uninitialized read - however, you can't actually see anything happen unless you put something like a pr_warn() in tcp_parse_md5sig_option() to print the opsize.
==================================== #define _GNU_SOURCE #include <arpa/inet.h> #include <stdlib.h> #include <errno.h> #include <stdarg.h> #include <net/if.h> #include <linux/if.h> #include <linux/ip.h> #include <linux/tcp.h> #include <linux/in.h> #include <linux/if_tun.h> #include <err.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <string.h> #include <stdio.h> #include <unistd.h> #include <sys/ioctl.h> #include <assert.h>
void systemf(const char *command, ...) { char *full_command; va_list ap; va_start(ap, command); if (vasprintf(&full_command, command, ap) == -1) err(1, "vasprintf"); va_end(ap); printf("systemf: <<<%s>>>\n", full_command); system(full_command); }
char *devname;
int tun_alloc(char *name) { int fd = open("/dev/net/tun", O_RDWR); if (fd == -1) err(1, "open tun dev"); static struct ifreq req = { .ifr_flags = IFF_TUN|IFF_NO_PI }; strcpy(req.ifr_name, name); if (ioctl(fd, TUNSETIFF, &req)) err(1, "TUNSETIFF"); devname = req.ifr_name; printf("device name: %s\n", devname); return fd; }
#define IPADDR(a,b,c,d) (((a)<<0)+((b)<<8)+((c)<<16)+((d)<<24))
void sum_accumulate(unsigned int *sum, void *data, int len) { assert((len&2)==0); for (int i=0; i<len/2; i++) { *sum += ntohs(((unsigned short *)data)[i]); } }
unsigned short sum_final(unsigned int sum) { sum = (sum >> 16) + (sum & 0xffff); sum = (sum >> 16) + (sum & 0xffff); return htons(~sum); }
void fix_ip_sum(struct iphdr *ip) { unsigned int sum = 0; sum_accumulate(&sum, ip, sizeof(*ip)); ip->check = sum_final(sum); }
void fix_tcp_sum(struct iphdr *ip, struct tcphdr *tcp) { unsigned int sum = 0; struct { unsigned int saddr; unsigned int daddr; unsigned char pad; unsigned char proto_num; unsigned short tcp_len; } fakehdr = { .saddr = ip->saddr, .daddr = ip->daddr, .proto_num = ip->protocol, .tcp_len = htons(ntohs(ip->tot_len) - ip->ihl*4) }; sum_accumulate(&sum, &fakehdr, sizeof(fakehdr)); sum_accumulate(&sum, tcp, tcp->doff*4); tcp->check = sum_final(sum); }
int main(void) { int tun_fd = tun_alloc("inject_dev%d"); systemf("ip link set %s up", devname); systemf("ip addr add 192.168.42.1/24 dev %s", devname);
struct { struct iphdr ip; struct tcphdr tcp; unsigned char tcp_opts[20]; } __attribute__((packed)) syn_packet = { .ip = { .ihl = sizeof(struct iphdr)/4, .version = 4, .tot_len = htons(sizeof(syn_packet)), .ttl = 30, .protocol = IPPROTO_TCP, /* FIXUP check */ .saddr = IPADDR(192,168,42,2), .daddr = IPADDR(192,168,42,1) }, .tcp = { .source = htons(1), .dest = htons(1337), .seq = 0x12345678, .doff = (sizeof(syn_packet.tcp)+sizeof(syn_packet.tcp_opts))/4, .syn = 1, .window = htons(64), .check = 0 /*FIXUP*/ }, .tcp_opts = { /* INVALID: trailing MD5SIG opcode after NOPs */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 19 } }; fix_ip_sum(&syn_packet.ip); fix_tcp_sum(&syn_packet.ip, &syn_packet.tcp); while (1) { int write_res = write(tun_fd, &syn_packet, sizeof(syn_packet)); if (write_res != sizeof(syn_packet)) err(1, "packet write failed"); } } ====================================
Fixes: cfb6eeb4c860 ("[TCP]: MD5 Signature Option (RFC2385) support.") Signed-off-by: Jann Horn jannh@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp_input.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-)
--- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3892,11 +3892,8 @@ const u8 *tcp_parse_md5sig_option(const int length = (th->doff << 2) - sizeof(*th); const u8 *ptr = (const u8 *)(th + 1);
- /* If the TCP option is too short, we can short cut */ - if (length < TCPOLEN_MD5SIG) - return NULL; - - while (length > 0) { + /* If not enough data remaining, we can short cut */ + while (length >= TCPOLEN_MD5SIG) { int opcode = *ptr++; int opsize;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 5171b37d959641bbc619781caf62e61f7b940871 ]
In order to remove the race caught by syzbot [1], we need to lock the socket before using po->tp_version as this could change under us otherwise.
This means lock_sock() and release_sock() must be done by packet_set_ring() callers.
[1] : BUG: KMSAN: uninit-value in packet_set_ring+0x1254/0x3870 net/packet/af_packet.c:4249 CPU: 0 PID: 20195 Comm: syzkaller707632 Not tainted 4.16.0+ #83 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676 packet_set_ring+0x1254/0x3870 net/packet/af_packet.c:4249 packet_setsockopt+0x12c6/0x5a90 net/packet/af_packet.c:3662 SYSC_setsockopt+0x4b8/0x570 net/socket.c:1849 SyS_setsockopt+0x76/0xa0 net/socket.c:1828 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x449099 RSP: 002b:00007f42b5307ce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 000000000070003c RCX: 0000000000449099 RDX: 0000000000000005 RSI: 0000000000000107 RDI: 0000000000000003 RBP: 0000000000700038 R08: 000000000000001c R09: 0000000000000000 R10: 00000000200000c0 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000080eecf R14: 00007f42b53089c0 R15: 0000000000000001
Local variable description: ----req_u@packet_setsockopt Variable was created at: packet_setsockopt+0x13f/0x5a90 net/packet/af_packet.c:3612 SYSC_setsockopt+0x4b8/0x570 net/socket.c:1849
Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/packet/af_packet.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-)
--- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -3017,6 +3017,7 @@ static int packet_release(struct socket
packet_flush_mclist(sk);
+ lock_sock(sk); if (po->rx_ring.pg_vec) { memset(&req_u, 0, sizeof(req_u)); packet_set_ring(sk, &req_u, 1, 0); @@ -3026,6 +3027,7 @@ static int packet_release(struct socket memset(&req_u, 0, sizeof(req_u)); packet_set_ring(sk, &req_u, 1, 1); } + release_sock(sk);
f = fanout_release(sk);
@@ -3654,6 +3656,7 @@ packet_setsockopt(struct socket *sock, i union tpacket_req_u req_u; int len;
+ lock_sock(sk); switch (po->tp_version) { case TPACKET_V1: case TPACKET_V2: @@ -3664,12 +3667,17 @@ packet_setsockopt(struct socket *sock, i len = sizeof(req_u.req3); break; } - if (optlen < len) - return -EINVAL; - if (copy_from_user(&req_u.req, optval, len)) - return -EFAULT; - return packet_set_ring(sk, &req_u, 0, - optname == PACKET_TX_RING); + if (optlen < len) { + ret = -EINVAL; + } else { + if (copy_from_user(&req_u.req, optval, len)) + ret = -EFAULT; + else + ret = packet_set_ring(sk, &req_u, 0, + optname == PACKET_TX_RING); + } + release_sock(sk); + return ret; } case PACKET_COPY_THRESH: { @@ -4219,8 +4227,6 @@ static int packet_set_ring(struct sock * /* Added to avoid minimal code churn */ struct tpacket_req *req = &req_u->req;
- lock_sock(sk); - rb = tx_ring ? &po->tx_ring : &po->rx_ring; rb_queue = tx_ring ? &sk->sk_write_queue : &sk->sk_receive_queue;
@@ -4358,7 +4364,6 @@ static int packet_set_ring(struct sock * if (pg_vec) free_pg_vec(pg_vec, order, req->tp_block_nr); out: - release_sock(sk); return err; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 7212303268918b9a203aebeacfdbd83b5e87b20d ]
syzbot/KMSAN reported an uninit-value in tcp_parse_options() [1]
I believe this was caused by a TCP_MD5SIG being set on live flow.
This is highly unexpected, since TCP option space is limited.
For instance, presence of TCP MD5 option automatically disables TCP TimeStamp option at SYN/SYNACK time, which we can not do once flow has been established.
Really, adding/deleting an MD5 key only makes sense on sockets in CLOSE or LISTEN state.
[1] BUG: KMSAN: uninit-value in tcp_parse_options+0xd74/0x1a30 net/ipv4/tcp_input.c:3720 CPU: 1 PID: 6177 Comm: syzkaller192004 Not tainted 4.16.0+ #83 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676 tcp_parse_options+0xd74/0x1a30 net/ipv4/tcp_input.c:3720 tcp_fast_parse_options net/ipv4/tcp_input.c:3858 [inline] tcp_validate_incoming+0x4f1/0x2790 net/ipv4/tcp_input.c:5184 tcp_rcv_established+0xf60/0x2bb0 net/ipv4/tcp_input.c:5453 tcp_v4_do_rcv+0x6cd/0xd90 net/ipv4/tcp_ipv4.c:1469 sk_backlog_rcv include/net/sock.h:908 [inline] __release_sock+0x2d6/0x680 net/core/sock.c:2271 release_sock+0x97/0x2a0 net/core/sock.c:2786 tcp_sendmsg+0xd6/0x100 net/ipv4/tcp.c:1464 inet_sendmsg+0x48d/0x740 net/ipv4/af_inet.c:764 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] SYSC_sendto+0x6c3/0x7e0 net/socket.c:1747 SyS_sendto+0x8a/0xb0 net/socket.c:1715 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x448fe9 RSP: 002b:00007fd472c64d38 EFLAGS: 00000216 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00000000006e5a30 RCX: 0000000000448fe9 RDX: 000000000000029f RSI: 0000000020a88f88 RDI: 0000000000000004 RBP: 00000000006e5a34 R08: 0000000020e68000 R09: 0000000000000010 R10: 00000000200007fd R11: 0000000000000216 R12: 0000000000000000 R13: 00007fff074899ef R14: 00007fd472c659c0 R15: 0000000000000009
Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314 kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321 slab_post_alloc_hook mm/slab.h:445 [inline] slab_alloc_node mm/slub.c:2737 [inline] __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369 __kmalloc_reserve net/core/skbuff.c:138 [inline] __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206 alloc_skb include/linux/skbuff.h:984 [inline] tcp_send_ack+0x18c/0x910 net/ipv4/tcp_output.c:3624 __tcp_ack_snd_check net/ipv4/tcp_input.c:5040 [inline] tcp_ack_snd_check net/ipv4/tcp_input.c:5053 [inline] tcp_rcv_established+0x2103/0x2bb0 net/ipv4/tcp_input.c:5469 tcp_v4_do_rcv+0x6cd/0xd90 net/ipv4/tcp_ipv4.c:1469 sk_backlog_rcv include/net/sock.h:908 [inline] __release_sock+0x2d6/0x680 net/core/sock.c:2271 release_sock+0x97/0x2a0 net/core/sock.c:2786 tcp_sendmsg+0xd6/0x100 net/ipv4/tcp.c:1464 inet_sendmsg+0x48d/0x740 net/ipv4/af_inet.c:764 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] SYSC_sendto+0x6c3/0x7e0 net/socket.c:1747 SyS_sendto+0x8a/0xb0 net/socket.c:1715 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Fixes: cfb6eeb4c860 ("[TCP]: MD5 Signature Option (RFC2385) support.") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Acked-by: Yuchung Cheng ycheng@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2742,8 +2742,10 @@ static int do_tcp_setsockopt(struct sock #ifdef CONFIG_TCP_MD5SIG case TCP_MD5SIG: case TCP_MD5SIG_EXT: - /* Read the IP->Key mappings from userspace */ - err = tp->af_specific->md5_parse(sk, optname, optval, optlen); + if ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN)) + err = tp->af_specific->md5_parse(sk, optname, optval, optlen); + else + err = -EINVAL; break; #endif case TCP_USER_TIMEOUT:
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wolfgang Bumiller w.bumiller@proxmox.com
[ Upstream commit 53b76cdf7e8fecec1d09e38aad2f8579882591a8 ]
When coming from ndisc_netdev_event() in net/ipv6/ndisc.c, neigh_ifdown() is called with &nd_tbl, locking this while clearing the proxy neighbor entries when eg. deleting an interface. Calling the table's pndisc_destructor() with the lock still held, however, can cause a deadlock: When a multicast listener is available an IGMP packet of type ICMPV6_MGM_REDUCTION may be sent out. When reaching ip6_finish_output2(), if no neighbor entry for the target address is found, __neigh_create() is called with &nd_tbl, which it'll want to lock.
Move the elements into their own list, then unlock the table and perform the destruction.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199289 Fixes: 6fd6ce2056de ("ipv6: Do not depend on rt->n in ip6_finish_output2().") Signed-off-by: Wolfgang Bumiller w.bumiller@proxmox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/neighbour.c | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-)
--- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -55,7 +55,8 @@ static void neigh_timer_handler(unsigned static void __neigh_notify(struct neighbour *n, int type, int flags, u32 pid); static void neigh_update_notify(struct neighbour *neigh, u32 nlmsg_pid); -static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev); +static int pneigh_ifdown_and_unlock(struct neigh_table *tbl, + struct net_device *dev);
#ifdef CONFIG_PROC_FS static const struct file_operations neigh_stat_seq_fops; @@ -291,8 +292,7 @@ int neigh_ifdown(struct neigh_table *tbl { write_lock_bh(&tbl->lock); neigh_flush_dev(tbl, dev); - pneigh_ifdown(tbl, dev); - write_unlock_bh(&tbl->lock); + pneigh_ifdown_and_unlock(tbl, dev);
del_timer_sync(&tbl->proxy_timer); pneigh_queue_purge(&tbl->proxy_queue); @@ -681,9 +681,10 @@ int pneigh_delete(struct neigh_table *tb return -ENOENT; }
-static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev) +static int pneigh_ifdown_and_unlock(struct neigh_table *tbl, + struct net_device *dev) { - struct pneigh_entry *n, **np; + struct pneigh_entry *n, **np, *freelist = NULL; u32 h;
for (h = 0; h <= PNEIGH_HASHMASK; h++) { @@ -691,16 +692,23 @@ static int pneigh_ifdown(struct neigh_ta while ((n = *np) != NULL) { if (!dev || n->dev == dev) { *np = n->next; - if (tbl->pdestructor) - tbl->pdestructor(n); - if (n->dev) - dev_put(n->dev); - kfree(n); + n->next = freelist; + freelist = n; continue; } np = &n->next; } } + write_unlock_bh(&tbl->lock); + while ((n = freelist)) { + freelist = n->next; + n->next = NULL; + if (tbl->pdestructor) + tbl->pdestructor(n); + if (n->dev) + dev_put(n->dev); + kfree(n); + } return -ENOENT; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit 4fb0534fb7bbc2346ba7d3a072b538007f4135a5 ]
When parsing the options provided by the user space, team_nl_cmd_options_set() insert them in a temporary list to send multiple events with a single message. While each option's attribute is correctly validated, the code does not check for duplicate entries before inserting into the event list.
Exploiting the above, the syzbot was able to trigger the following splat:
kernel BUG at lib/list_debug.c:31! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 4466 Comm: syzkaller556835 Not tainted 4.16.0+ #17 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__list_add_valid+0xaa/0xb0 lib/list_debug.c:29 RSP: 0018:ffff8801b04bf248 EFLAGS: 00010286 RAX: 0000000000000058 RBX: ffff8801c8fc7a90 RCX: 0000000000000000 RDX: 0000000000000058 RSI: ffffffff815fbf41 RDI: ffffed0036097e3f RBP: ffff8801b04bf260 R08: ffff8801b0b2a700 R09: ffffed003b604f90 R10: ffffed003b604f90 R11: ffff8801db027c87 R12: ffff8801c8fc7a90 R13: ffff8801c8fc7a90 R14: dffffc0000000000 R15: 0000000000000000 FS: 0000000000b98880(0000) GS:ffff8801db000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000043fc30 CR3: 00000001afe8e000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __list_add include/linux/list.h:60 [inline] list_add include/linux/list.h:79 [inline] team_nl_cmd_options_set+0x9ff/0x12b0 drivers/net/team/team.c:2571 genl_family_rcv_msg+0x889/0x1120 net/netlink/genetlink.c:599 genl_rcv_msg+0xc6/0x170 net/netlink/genetlink.c:624 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2448 genl_rcv+0x28/0x40 net/netlink/genetlink.c:635 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x58b/0x740 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x9f0/0xfa0 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:639 ___sys_sendmsg+0x805/0x940 net/socket.c:2117 __sys_sendmsg+0x115/0x270 net/socket.c:2155 SYSC_sendmsg net/socket.c:2164 [inline] SyS_sendmsg+0x29/0x30 net/socket.c:2162 do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x4458b9 RSP: 002b:00007ffd1d4a7278 EFLAGS: 00000213 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 000000000000001b RCX: 00000000004458b9 RDX: 0000000000000010 RSI: 0000000020000d00 RDI: 0000000000000004 RBP: 00000000004a74ed R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000213 R12: 00007ffd1d4a7348 R13: 0000000000402a60 R14: 0000000000000000 R15: 0000000000000000 Code: 75 e8 eb a9 48 89 f7 48 89 75 e8 e8 d1 85 7b fe 48 8b 75 e8 eb bb 48 89 f2 48 89 d9 4c 89 e6 48 c7 c7 a0 84 d8 87 e8 ea 67 28 fe <0f> 0b 0f 1f 40 00 48 b8 00 00 00 00 00 fc ff df 55 48 89 e5 41 RIP: __list_add_valid+0xaa/0xb0 lib/list_debug.c:29 RSP: ffff8801b04bf248
This changeset addresses the avoiding list_add() if the current option is already present in the event list.
Reported-and-tested-by: syzbot+4d4af685432dc0e56c91@syzkaller.appspotmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Fixes: 2fcdb2c9e659 ("team: allow to send multiple set events in one message") Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/team/team.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+)
--- a/drivers/net/team/team.c +++ b/drivers/net/team/team.c @@ -261,6 +261,17 @@ static void __team_option_inst_mark_remo } }
+static bool __team_option_inst_tmp_find(const struct list_head *opts, + const struct team_option_inst *needle) +{ + struct team_option_inst *opt_inst; + + list_for_each_entry(opt_inst, opts, tmp_list) + if (opt_inst == needle) + return true; + return false; +} + static int __team_options_register(struct team *team, const struct team_option *option, size_t option_count) @@ -2561,6 +2572,14 @@ static int team_nl_cmd_options_set(struc if (err) goto team_put; opt_inst->changed = true; + + /* dumb/evil user-space can send us duplicate opt, + * keep only the last one + */ + if (__team_option_inst_tmp_find(&opt_inst_list, + opt_inst)) + continue; + list_add(&opt_inst->tmp_list, &opt_inst_list); } if (!opt_found) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ursula Braun ubraun@linux.vnet.ibm.com
[ Upstream commit 1255fcb2a655f05e02f3a74675a6d6525f187afd ]
Calling shutdown with SHUT_RD and SHUT_RDWR for a listening SMC socket crashes, because commit 127f49705823 ("net/smc: release clcsock from tcp_listen_worker") releases the internal clcsock in smc_close_active() and sets smc->clcsock to NULL. For SHUT_RD the smc_close_active() call is removed. For SHUT_RDWR the kernel_sock_shutdown() call is omitted, since the clcsock is already released.
Fixes: 127f49705823 ("net/smc: release clcsock from tcp_listen_worker") Signed-off-by: Ursula Braun ubraun@linux.vnet.ibm.com Reported-by: Stephen Hemminger stephen@networkplumber.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/smc/af_smc.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-)
--- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -1203,14 +1203,12 @@ static int smc_shutdown(struct socket *s rc = smc_close_shutdown_write(smc); break; case SHUT_RD: - if (sk->sk_state == SMC_LISTEN) - rc = smc_close_active(smc); - else - rc = 0; - /* nothing more to do because peer is not involved */ + rc = 0; + /* nothing more to do because peer is not involved */ break; } - rc1 = kernel_sock_shutdown(smc->clcsock, how); + if (smc->clcsock) + rc1 = kernel_sock_shutdown(smc->clcsock, how); /* map sock_shutdown_cmd constants to sk_shutdown value range */ sk->sk_shutdown |= how + 1;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Willem de Bruijn willemb@google.com
[ Upstream commit a6361f0ca4b25460f2cdf3235ebe8115f622901e ]
Updates to the bitfields in struct packet_sock are not atomic. Serialize these read-modify-write cycles.
Move po->running into a separate variable. Its writes are protected by po->bind_lock (except for one startup case at packet_create). Also replace a textual precondition warning with lockdep annotation.
All others are set only in packet_setsockopt. Serialize these updates by holding the socket lock. Analogous to other field updates, also hold the lock when testing whether a ring is active (pg_vec).
Fixes: 8dc419447415 ("[PACKET]: Add optional checksum computation for recvmsg") Reported-by: DaeRyong Jeong threeearcat@gmail.com Reported-by: Byoungyoung Lee byoungyoung@purdue.edu Signed-off-by: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/packet/af_packet.c | 60 +++++++++++++++++++++++++++++++++++-------------- net/packet/internal.h | 10 ++++---- 2 files changed, 49 insertions(+), 21 deletions(-)
--- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -331,11 +331,11 @@ static void packet_pick_tx_queue(struct skb_set_queue_mapping(skb, queue_index); }
-/* register_prot_hook must be invoked with the po->bind_lock held, +/* __register_prot_hook must be invoked through register_prot_hook * or from a context in which asynchronous accesses to the packet * socket is not possible (packet_create()). */ -static void register_prot_hook(struct sock *sk) +static void __register_prot_hook(struct sock *sk) { struct packet_sock *po = pkt_sk(sk);
@@ -350,8 +350,13 @@ static void register_prot_hook(struct so } }
-/* {,__}unregister_prot_hook() must be invoked with the po->bind_lock - * held. If the sync parameter is true, we will temporarily drop +static void register_prot_hook(struct sock *sk) +{ + lockdep_assert_held_once(&pkt_sk(sk)->bind_lock); + __register_prot_hook(sk); +} + +/* If the sync parameter is true, we will temporarily drop * the po->bind_lock and do a synchronize_net to make sure no * asynchronous packet processing paths still refer to the elements * of po->prot_hook. If the sync parameter is false, it is the @@ -361,6 +366,8 @@ static void __unregister_prot_hook(struc { struct packet_sock *po = pkt_sk(sk);
+ lockdep_assert_held_once(&po->bind_lock); + po->running = 0;
if (po->fanout) @@ -3261,7 +3268,7 @@ static int packet_create(struct net *net
if (proto) { po->prot_hook.type = proto; - register_prot_hook(sk); + __register_prot_hook(sk); }
mutex_lock(&net->packet.sklist_lock); @@ -3743,12 +3750,18 @@ packet_setsockopt(struct socket *sock, i
if (optlen != sizeof(val)) return -EINVAL; - if (po->rx_ring.pg_vec || po->tx_ring.pg_vec) - return -EBUSY; if (copy_from_user(&val, optval, sizeof(val))) return -EFAULT; - po->tp_loss = !!val; - return 0; + + lock_sock(sk); + if (po->rx_ring.pg_vec || po->tx_ring.pg_vec) { + ret = -EBUSY; + } else { + po->tp_loss = !!val; + ret = 0; + } + release_sock(sk); + return ret; } case PACKET_AUXDATA: { @@ -3759,7 +3772,9 @@ packet_setsockopt(struct socket *sock, i if (copy_from_user(&val, optval, sizeof(val))) return -EFAULT;
+ lock_sock(sk); po->auxdata = !!val; + release_sock(sk); return 0; } case PACKET_ORIGDEV: @@ -3771,7 +3786,9 @@ packet_setsockopt(struct socket *sock, i if (copy_from_user(&val, optval, sizeof(val))) return -EFAULT;
+ lock_sock(sk); po->origdev = !!val; + release_sock(sk); return 0; } case PACKET_VNET_HDR: @@ -3780,15 +3797,20 @@ packet_setsockopt(struct socket *sock, i
if (sock->type != SOCK_RAW) return -EINVAL; - if (po->rx_ring.pg_vec || po->tx_ring.pg_vec) - return -EBUSY; if (optlen < sizeof(val)) return -EINVAL; if (copy_from_user(&val, optval, sizeof(val))) return -EFAULT;
- po->has_vnet_hdr = !!val; - return 0; + lock_sock(sk); + if (po->rx_ring.pg_vec || po->tx_ring.pg_vec) { + ret = -EBUSY; + } else { + po->has_vnet_hdr = !!val; + ret = 0; + } + release_sock(sk); + return ret; } case PACKET_TIMESTAMP: { @@ -3826,11 +3848,17 @@ packet_setsockopt(struct socket *sock, i
if (optlen != sizeof(val)) return -EINVAL; - if (po->rx_ring.pg_vec || po->tx_ring.pg_vec) - return -EBUSY; if (copy_from_user(&val, optval, sizeof(val))) return -EFAULT; - po->tp_tx_has_off = !!val; + + lock_sock(sk); + if (po->rx_ring.pg_vec || po->tx_ring.pg_vec) { + ret = -EBUSY; + } else { + po->tp_tx_has_off = !!val; + ret = 0; + } + release_sock(sk); return 0; } case PACKET_QDISC_BYPASS: --- a/net/packet/internal.h +++ b/net/packet/internal.h @@ -112,10 +112,12 @@ struct packet_sock { int copy_thresh; spinlock_t bind_lock; struct mutex pg_vec_lock; - unsigned int running:1, /* prot_hook is attached*/ - auxdata:1, + unsigned int running; /* bind_lock must be held */ + unsigned int auxdata:1, /* writer must hold sock lock */ origdev:1, - has_vnet_hdr:1; + has_vnet_hdr:1, + tp_loss:1, + tp_tx_has_off:1; int pressure; int ifindex; /* bound device */ __be16 num; @@ -125,8 +127,6 @@ struct packet_sock { enum tpacket_versions tp_version; unsigned int tp_hdrlen; unsigned int tp_reserve; - unsigned int tp_loss:1; - unsigned int tp_tx_has_off:1; unsigned int tp_tstamp; struct net_device __rcu *cached_dev; int (*xmit)(struct sk_buff *skb);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit ec518f21cb1a1b1f8a516499ea05c60299e04963 ]
Before syzbot/KMSAN bites, add the missing policy for TIPC_NLA_NET_ADDR
Fixes: 27c21416727a ("tipc: add net set to new netlink api") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Jon Maloy jon.maloy@ericsson.com Cc: Ying Xue ying.xue@windriver.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/tipc/netlink.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/tipc/netlink.c +++ b/net/tipc/netlink.c @@ -79,7 +79,8 @@ const struct nla_policy tipc_nl_sock_pol
const struct nla_policy tipc_nl_net_policy[TIPC_NLA_NET_MAX + 1] = { [TIPC_NLA_NET_UNSPEC] = { .type = NLA_UNSPEC }, - [TIPC_NLA_NET_ID] = { .type = NLA_U32 } + [TIPC_NLA_NET_ID] = { .type = NLA_U32 }, + [TIPC_NLA_NET_ADDR] = { .type = NLA_U32 }, };
const struct nla_policy tipc_nl_link_policy[TIPC_NLA_LINK_MAX + 1] = {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Guillaume Nault g.nault@alphalink.fr
[ Upstream commit a49e2f5d5fb141884452ddb428f551b123d436b5 ]
We must validate sockaddr_len, otherwise userspace can pass fewer data than we expect and we end up accessing invalid data.
Fixes: 224cf5ad14c0 ("ppp: Move the PPP drivers") Reported-by: syzbot+4f03bdf92fdf9ef5ddab@syzkaller.appspotmail.com Signed-off-by: Guillaume Nault g.nault@alphalink.fr Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ppp/pppoe.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/net/ppp/pppoe.c +++ b/drivers/net/ppp/pppoe.c @@ -620,6 +620,10 @@ static int pppoe_connect(struct socket * lock_sock(sk);
error = -EINVAL; + + if (sockaddr_len != sizeof(struct sockaddr_pppox)) + goto end; + if (sp->sa_protocol != PX_PROTO_OE) goto end;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Toshiaki Makita makita.toshiaki@lab.ntt.co.jp
[ Upstream commit 7ce2367254e84753bceb07327aaf5c953cfce117 ]
Syzkaller spotted an old bug which leads to reading skb beyond tail by 4 bytes on vlan tagged packets. This is caused because skb_vlan_tagged_multi() did not check skb_headlen.
BUG: KMSAN: uninit-value in eth_type_vlan include/linux/if_vlan.h:283 [inline] BUG: KMSAN: uninit-value in skb_vlan_tagged_multi include/linux/if_vlan.h:656 [inline] BUG: KMSAN: uninit-value in vlan_features_check include/linux/if_vlan.h:672 [inline] BUG: KMSAN: uninit-value in dflt_features_check net/core/dev.c:2949 [inline] BUG: KMSAN: uninit-value in netif_skb_features+0xd1b/0xdc0 net/core/dev.c:3009 CPU: 1 PID: 3582 Comm: syzkaller435149 Not tainted 4.16.0+ #82 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676 eth_type_vlan include/linux/if_vlan.h:283 [inline] skb_vlan_tagged_multi include/linux/if_vlan.h:656 [inline] vlan_features_check include/linux/if_vlan.h:672 [inline] dflt_features_check net/core/dev.c:2949 [inline] netif_skb_features+0xd1b/0xdc0 net/core/dev.c:3009 validate_xmit_skb+0x89/0x1320 net/core/dev.c:3084 __dev_queue_xmit+0x1cb2/0x2b60 net/core/dev.c:3549 dev_queue_xmit+0x4b/0x60 net/core/dev.c:3590 packet_snd net/packet/af_packet.c:2944 [inline] packet_sendmsg+0x7c57/0x8a10 net/packet/af_packet.c:2969 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] sock_write_iter+0x3b9/0x470 net/socket.c:909 do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776 do_iter_write+0x30d/0xd40 fs/read_write.c:932 vfs_writev fs/read_write.c:977 [inline] do_writev+0x3c9/0x830 fs/read_write.c:1012 SYSC_writev+0x9b/0xb0 fs/read_write.c:1085 SyS_writev+0x56/0x80 fs/read_write.c:1082 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x43ffa9 RSP: 002b:00007fff2cff3948 EFLAGS: 00000217 ORIG_RAX: 0000000000000014 RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043ffa9 RDX: 0000000000000001 RSI: 0000000020000080 RDI: 0000000000000003 RBP: 00000000006cb018 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000217 R12: 00000000004018d0 R13: 0000000000401960 R14: 0000000000000000 R15: 0000000000000000
Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314 kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321 slab_post_alloc_hook mm/slab.h:445 [inline] slab_alloc_node mm/slub.c:2737 [inline] __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369 __kmalloc_reserve net/core/skbuff.c:138 [inline] __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206 alloc_skb include/linux/skbuff.h:984 [inline] alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234 sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085 packet_alloc_skb net/packet/af_packet.c:2803 [inline] packet_snd net/packet/af_packet.c:2894 [inline] packet_sendmsg+0x6444/0x8a10 net/packet/af_packet.c:2969 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] sock_write_iter+0x3b9/0x470 net/socket.c:909 do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776 do_iter_write+0x30d/0xd40 fs/read_write.c:932 vfs_writev fs/read_write.c:977 [inline] do_writev+0x3c9/0x830 fs/read_write.c:1012 SYSC_writev+0x9b/0xb0 fs/read_write.c:1085 SyS_writev+0x56/0x80 fs/read_write.c:1082 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Fixes: 58e998c6d239 ("offloading: Force software GSO for multiple vlan tags.") Reported-and-tested-by: syzbot+0bbe42c764feafa82c5a@syzkaller.appspotmail.com Signed-off-by: Toshiaki Makita makita.toshiaki@lab.ntt.co.jp Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/if_vlan.h | 7 +++++-- net/core/dev.c | 2 +- 2 files changed, 6 insertions(+), 3 deletions(-)
--- a/include/linux/if_vlan.h +++ b/include/linux/if_vlan.h @@ -584,7 +584,7 @@ static inline bool skb_vlan_tagged(const * Returns true if the skb is tagged with multiple vlan headers, regardless * of whether it is hardware accelerated or not. */ -static inline bool skb_vlan_tagged_multi(const struct sk_buff *skb) +static inline bool skb_vlan_tagged_multi(struct sk_buff *skb) { __be16 protocol = skb->protocol;
@@ -594,6 +594,9 @@ static inline bool skb_vlan_tagged_multi if (likely(!eth_type_vlan(protocol))) return false;
+ if (unlikely(!pskb_may_pull(skb, VLAN_ETH_HLEN))) + return false; + veh = (struct vlan_ethhdr *)skb->data; protocol = veh->h_vlan_encapsulated_proto; } @@ -611,7 +614,7 @@ static inline bool skb_vlan_tagged_multi * * Returns features without unsafe ones if the skb has multiple tags. */ -static inline netdev_features_t vlan_features_check(const struct sk_buff *skb, +static inline netdev_features_t vlan_features_check(struct sk_buff *skb, netdev_features_t features) { if (skb_vlan_tagged_multi(skb)) { --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2903,7 +2903,7 @@ netdev_features_t passthru_features_chec } EXPORT_SYMBOL(passthru_features_check);
-static netdev_features_t dflt_features_check(const struct sk_buff *skb, +static netdev_features_t dflt_features_check(struct sk_buff *skb, struct net_device *dev, netdev_features_t features) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tom Lendacky thomas.lendacky@amd.com
[ Upstream commit 4d945663a6a0acf3cbe45940503f2eb9584bfee7 ]
Add hooks to the driver auto-negotiation (AN) flow to allow the different phy implementations to perform any steps necessary to improve AN.
Signed-off-by: Tom Lendacky thomas.lendacky@amd.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/amd/xgbe/xgbe-mdio.c | 16 ++++++++++++++-- drivers/net/ethernet/amd/xgbe/xgbe.h | 5 +++++ 2 files changed, 19 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c @@ -437,6 +437,9 @@ static void xgbe_an73_disable(struct xgb
static void xgbe_an_restart(struct xgbe_prv_data *pdata) { + if (pdata->phy_if.phy_impl.an_pre) + pdata->phy_if.phy_impl.an_pre(pdata); + switch (pdata->an_mode) { case XGBE_AN_MODE_CL73: case XGBE_AN_MODE_CL73_REDRV: @@ -453,6 +456,9 @@ static void xgbe_an_restart(struct xgbe_
static void xgbe_an_disable(struct xgbe_prv_data *pdata) { + if (pdata->phy_if.phy_impl.an_post) + pdata->phy_if.phy_impl.an_post(pdata); + switch (pdata->an_mode) { case XGBE_AN_MODE_CL73: case XGBE_AN_MODE_CL73_REDRV: @@ -637,11 +643,11 @@ static enum xgbe_an xgbe_an73_incompat_l return XGBE_AN_NO_LINK; }
- xgbe_an73_disable(pdata); + xgbe_an_disable(pdata);
xgbe_switch_mode(pdata);
- xgbe_an73_restart(pdata); + xgbe_an_restart(pdata);
return XGBE_AN_INCOMPAT_LINK; } @@ -820,6 +826,9 @@ static void xgbe_an37_state_machine(stru pdata->an_result = pdata->an_state; pdata->an_state = XGBE_AN_READY;
+ if (pdata->phy_if.phy_impl.an_post) + pdata->phy_if.phy_impl.an_post(pdata); + netif_dbg(pdata, link, pdata->netdev, "CL37 AN result: %s\n", xgbe_state_as_string(pdata->an_result)); } @@ -903,6 +912,9 @@ again: pdata->kx_state = XGBE_RX_BPA; pdata->an_start = 0;
+ if (pdata->phy_if.phy_impl.an_post) + pdata->phy_if.phy_impl.an_post(pdata); + netif_dbg(pdata, link, pdata->netdev, "CL73 AN result: %s\n", xgbe_state_as_string(pdata->an_result)); } --- a/drivers/net/ethernet/amd/xgbe/xgbe.h +++ b/drivers/net/ethernet/amd/xgbe/xgbe.h @@ -833,6 +833,7 @@ struct xgbe_hw_if { /* This structure represents implementation specific routines for an * implementation of a PHY. All routines are required unless noted below. * Optional routines: + * an_pre, an_post * kr_training_pre, kr_training_post */ struct xgbe_phy_impl_if { @@ -875,6 +876,10 @@ struct xgbe_phy_impl_if { /* Process results of auto-negotiation */ enum xgbe_mode (*an_outcome)(struct xgbe_prv_data *);
+ /* Pre/Post auto-negotiation support */ + void (*an_pre)(struct xgbe_prv_data *); + void (*an_post)(struct xgbe_prv_data *); + /* Pre/Post KR training enablement support */ void (*kr_training_pre)(struct xgbe_prv_data *); void (*kr_training_post)(struct xgbe_prv_data *);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 1071ec9d453a38023579714b64a951a2fb982071 ]
pf->cmp_addr() is called before binding a v6 address to the sock. It should not check ports, like in sctp_inet_cmp_addr.
But sctp_inet6_cmp_addr checks the addr by invoking af(6)->cmp_addr, sctp_v6_cmp_addr where it also compares the ports.
This would cause that setsockopt(SCTP_SOCKOPT_BINDX_ADD) could bind multiple duplicated IPv6 addresses after Commit 40b4f0fd74e4 ("sctp: lack the check for ports in sctp_v6_cmp_addr").
This patch is to remove af->cmp_addr called in sctp_inet6_cmp_addr, but do the proper check for both v6 addrs and v4mapped addrs.
v1->v2: - define __sctp_v6_cmp_addr to do the common address comparison used for both pf and af v6 cmp_addr.
Fixes: 40b4f0fd74e4 ("sctp: lack the check for ports in sctp_v6_cmp_addr") Reported-by: Jianwen Ji jiji@redhat.com Signed-off-by: Xin Long lucien.xin@gmail.com Acked-by: Neil Horman nhorman@tuxdriver.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sctp/ipv6.c | 60 ++++++++++++++++++++++++++++---------------------------- 1 file changed, 30 insertions(+), 30 deletions(-)
--- a/net/sctp/ipv6.c +++ b/net/sctp/ipv6.c @@ -521,46 +521,49 @@ static void sctp_v6_to_addr(union sctp_a addr->v6.sin6_scope_id = 0; }
-/* Compare addresses exactly. - * v4-mapped-v6 is also in consideration. - */ -static int sctp_v6_cmp_addr(const union sctp_addr *addr1, - const union sctp_addr *addr2) +static int __sctp_v6_cmp_addr(const union sctp_addr *addr1, + const union sctp_addr *addr2) { if (addr1->sa.sa_family != addr2->sa.sa_family) { if (addr1->sa.sa_family == AF_INET && addr2->sa.sa_family == AF_INET6 && - ipv6_addr_v4mapped(&addr2->v6.sin6_addr)) { - if (addr2->v6.sin6_port == addr1->v4.sin_port && - addr2->v6.sin6_addr.s6_addr32[3] == - addr1->v4.sin_addr.s_addr) - return 1; - } + ipv6_addr_v4mapped(&addr2->v6.sin6_addr) && + addr2->v6.sin6_addr.s6_addr32[3] == + addr1->v4.sin_addr.s_addr) + return 1; + if (addr2->sa.sa_family == AF_INET && addr1->sa.sa_family == AF_INET6 && - ipv6_addr_v4mapped(&addr1->v6.sin6_addr)) { - if (addr1->v6.sin6_port == addr2->v4.sin_port && - addr1->v6.sin6_addr.s6_addr32[3] == - addr2->v4.sin_addr.s_addr) - return 1; - } + ipv6_addr_v4mapped(&addr1->v6.sin6_addr) && + addr1->v6.sin6_addr.s6_addr32[3] == + addr2->v4.sin_addr.s_addr) + return 1; + return 0; } - if (addr1->v6.sin6_port != addr2->v6.sin6_port) - return 0; + if (!ipv6_addr_equal(&addr1->v6.sin6_addr, &addr2->v6.sin6_addr)) return 0; + /* If this is a linklocal address, compare the scope_id. */ - if (ipv6_addr_type(&addr1->v6.sin6_addr) & IPV6_ADDR_LINKLOCAL) { - if (addr1->v6.sin6_scope_id && addr2->v6.sin6_scope_id && - (addr1->v6.sin6_scope_id != addr2->v6.sin6_scope_id)) { - return 0; - } - } + if ((ipv6_addr_type(&addr1->v6.sin6_addr) & IPV6_ADDR_LINKLOCAL) && + addr1->v6.sin6_scope_id && addr2->v6.sin6_scope_id && + addr1->v6.sin6_scope_id != addr2->v6.sin6_scope_id) + return 0;
return 1; }
+/* Compare addresses exactly. + * v4-mapped-v6 is also in consideration. + */ +static int sctp_v6_cmp_addr(const union sctp_addr *addr1, + const union sctp_addr *addr2) +{ + return __sctp_v6_cmp_addr(addr1, addr2) && + addr1->v6.sin6_port == addr2->v6.sin6_port; +} + /* Initialize addr struct to INADDR_ANY. */ static void sctp_v6_inaddr_any(union sctp_addr *addr, __be16 port) { @@ -845,8 +848,8 @@ static int sctp_inet6_cmp_addr(const uni const union sctp_addr *addr2, struct sctp_sock *opt) { - struct sctp_af *af1, *af2; struct sock *sk = sctp_opt2sk(opt); + struct sctp_af *af1, *af2;
af1 = sctp_get_af_specific(addr1->sa.sa_family); af2 = sctp_get_af_specific(addr2->sa.sa_family); @@ -862,10 +865,7 @@ static int sctp_inet6_cmp_addr(const uni if (sctp_is_any(sk, addr1) || sctp_is_any(sk, addr2)) return 1;
- if (addr1->sa.sa_family != addr2->sa.sa_family) - return 0; - - return af1->cmp_addr(addr1, addr2); + return __sctp_v6_cmp_addr(addr1, addr2); }
/* Verify that the provided sockaddr looks bindable. Common verification,
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tom Lendacky thomas.lendacky@amd.com
[ Upstream commit 96f4d430c507ed4856048c2dc9c1a2ea5b5e74e4 ]
Update xgbe-phy-v2.c to make use of the auto-negotiation (AN) phy hooks to improve the ability to successfully complete Clause 73 AN when running at 10gbps. Hardware can sometimes have issues with CDR lock when the AN DME page exchange is being performed.
The AN and KR training hooks are used as follows: - The pre AN hook is used to disable CDR tracking in the PHY so that the DME page exchange can be successfully and consistently completed. - The post KR training hook is used to re-enable the CDR tracking so that KR training can successfully complete. - The post AN hook is used to check for an unsuccessful AN which will increase a CDR tracking enablement delay (up to a maximum value).
Add two debugfs entries to allow control over use of the CDR tracking workaround. The debugfs entries allow the CDR tracking workaround to be disabled and determine whether to re-enable CDR tracking before or after link training has been initiated.
Also, with these changes the receiver reset cycle that is performed during the link status check can be performed less often.
Signed-off-by: Tom Lendacky thomas.lendacky@amd.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/amd/xgbe/xgbe-common.h | 8 + drivers/net/ethernet/amd/xgbe/xgbe-debugfs.c | 16 +++ drivers/net/ethernet/amd/xgbe/xgbe-main.c | 1 drivers/net/ethernet/amd/xgbe/xgbe-mdio.c | 8 + drivers/net/ethernet/amd/xgbe/xgbe-pci.c | 2 drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c | 125 ++++++++++++++++++++++++++- drivers/net/ethernet/amd/xgbe/xgbe.h | 4 7 files changed, 160 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/amd/xgbe/xgbe-common.h +++ b/drivers/net/ethernet/amd/xgbe/xgbe-common.h @@ -1321,6 +1321,10 @@ #define MDIO_VEND2_AN_STAT 0x8002 #endif
+#ifndef MDIO_VEND2_PMA_CDR_CONTROL +#define MDIO_VEND2_PMA_CDR_CONTROL 0x8056 +#endif + #ifndef MDIO_CTRL1_SPEED1G #define MDIO_CTRL1_SPEED1G (MDIO_CTRL1_SPEED10G & ~BMCR_SPEED100) #endif @@ -1369,6 +1373,10 @@ #define XGBE_AN_CL37_TX_CONFIG_MASK 0x08 #define XGBE_AN_CL37_MII_CTRL_8BIT 0x0100
+#define XGBE_PMA_CDR_TRACK_EN_MASK 0x01 +#define XGBE_PMA_CDR_TRACK_EN_OFF 0x00 +#define XGBE_PMA_CDR_TRACK_EN_ON 0x01 + /* Bit setting and getting macros * The get macro will extract the current bit field value from within * the variable --- a/drivers/net/ethernet/amd/xgbe/xgbe-debugfs.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-debugfs.c @@ -519,6 +519,22 @@ void xgbe_debugfs_init(struct xgbe_prv_d "debugfs_create_file failed\n"); }
+ if (pdata->vdata->an_cdr_workaround) { + pfile = debugfs_create_bool("an_cdr_workaround", 0600, + pdata->xgbe_debugfs, + &pdata->debugfs_an_cdr_workaround); + if (!pfile) + netdev_err(pdata->netdev, + "debugfs_create_bool failed\n"); + + pfile = debugfs_create_bool("an_cdr_track_early", 0600, + pdata->xgbe_debugfs, + &pdata->debugfs_an_cdr_track_early); + if (!pfile) + netdev_err(pdata->netdev, + "debugfs_create_bool failed\n"); + } + kfree(buf); }
--- a/drivers/net/ethernet/amd/xgbe/xgbe-main.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-main.c @@ -349,6 +349,7 @@ int xgbe_config_netdev(struct xgbe_prv_d XGMAC_SET_BITS(pdata->rss_options, MAC_RSSCR, UDP4TE, 1);
/* Call MDIO/PHY initialization routine */ + pdata->debugfs_an_cdr_workaround = pdata->vdata->an_cdr_workaround; ret = pdata->phy_if.phy_init(pdata); if (ret) return ret; --- a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c @@ -432,6 +432,8 @@ static void xgbe_an73_disable(struct xgb xgbe_an73_set(pdata, false, false); xgbe_an73_disable_interrupts(pdata);
+ pdata->an_start = 0; + netif_dbg(pdata, link, pdata->netdev, "CL73 AN disabled\n"); }
@@ -511,11 +513,11 @@ static enum xgbe_an xgbe_an73_tx_trainin XMDIO_WRITE(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL, reg);
- if (pdata->phy_if.phy_impl.kr_training_post) - pdata->phy_if.phy_impl.kr_training_post(pdata); - netif_dbg(pdata, link, pdata->netdev, "KR training initiated\n"); + + if (pdata->phy_if.phy_impl.kr_training_post) + pdata->phy_if.phy_impl.kr_training_post(pdata); }
return XGBE_AN_PAGE_RECEIVED; --- a/drivers/net/ethernet/amd/xgbe/xgbe-pci.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-pci.c @@ -456,6 +456,7 @@ static const struct xgbe_version_data xg .irq_reissue_support = 1, .tx_desc_prefetch = 5, .rx_desc_prefetch = 5, + .an_cdr_workaround = 1, };
static const struct xgbe_version_data xgbe_v2b = { @@ -470,6 +471,7 @@ static const struct xgbe_version_data xg .irq_reissue_support = 1, .tx_desc_prefetch = 5, .rx_desc_prefetch = 5, + .an_cdr_workaround = 1, };
static const struct pci_device_id xgbe_pci_table[] = { --- a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c @@ -147,6 +147,14 @@ /* Rate-change complete wait/retry count */ #define XGBE_RATECHANGE_COUNT 500
+/* CDR delay values for KR support (in usec) */ +#define XGBE_CDR_DELAY_INIT 10000 +#define XGBE_CDR_DELAY_INC 10000 +#define XGBE_CDR_DELAY_MAX 100000 + +/* RRC frequency during link status check */ +#define XGBE_RRC_FREQUENCY 10 + enum xgbe_port_mode { XGBE_PORT_MODE_RSVD = 0, XGBE_PORT_MODE_BACKPLANE, @@ -355,6 +363,10 @@ struct xgbe_phy_data { unsigned int redrv_addr; unsigned int redrv_lane; unsigned int redrv_model; + + /* KR AN support */ + unsigned int phy_cdr_notrack; + unsigned int phy_cdr_delay; };
/* I2C, MDIO and GPIO lines are muxed, so only one device at a time */ @@ -2361,7 +2373,7 @@ static int xgbe_phy_link_status(struct x return 1;
/* No link, attempt a receiver reset cycle */ - if (phy_data->rrc_count++) { + if (phy_data->rrc_count++ > XGBE_RRC_FREQUENCY) { phy_data->rrc_count = 0; xgbe_phy_rrc(pdata); } @@ -2669,6 +2681,103 @@ static bool xgbe_phy_port_enabled(struct return true; }
+static void xgbe_phy_cdr_track(struct xgbe_prv_data *pdata) +{ + struct xgbe_phy_data *phy_data = pdata->phy_data; + + if (!pdata->debugfs_an_cdr_workaround) + return; + + if (!phy_data->phy_cdr_notrack) + return; + + usleep_range(phy_data->phy_cdr_delay, + phy_data->phy_cdr_delay + 500); + + XMDIO_WRITE_BITS(pdata, MDIO_MMD_PMAPMD, MDIO_VEND2_PMA_CDR_CONTROL, + XGBE_PMA_CDR_TRACK_EN_MASK, + XGBE_PMA_CDR_TRACK_EN_ON); + + phy_data->phy_cdr_notrack = 0; +} + +static void xgbe_phy_cdr_notrack(struct xgbe_prv_data *pdata) +{ + struct xgbe_phy_data *phy_data = pdata->phy_data; + + if (!pdata->debugfs_an_cdr_workaround) + return; + + if (phy_data->phy_cdr_notrack) + return; + + XMDIO_WRITE_BITS(pdata, MDIO_MMD_PMAPMD, MDIO_VEND2_PMA_CDR_CONTROL, + XGBE_PMA_CDR_TRACK_EN_MASK, + XGBE_PMA_CDR_TRACK_EN_OFF); + + xgbe_phy_rrc(pdata); + + phy_data->phy_cdr_notrack = 1; +} + +static void xgbe_phy_kr_training_post(struct xgbe_prv_data *pdata) +{ + if (!pdata->debugfs_an_cdr_track_early) + xgbe_phy_cdr_track(pdata); +} + +static void xgbe_phy_kr_training_pre(struct xgbe_prv_data *pdata) +{ + if (pdata->debugfs_an_cdr_track_early) + xgbe_phy_cdr_track(pdata); +} + +static void xgbe_phy_an_post(struct xgbe_prv_data *pdata) +{ + struct xgbe_phy_data *phy_data = pdata->phy_data; + + switch (pdata->an_mode) { + case XGBE_AN_MODE_CL73: + case XGBE_AN_MODE_CL73_REDRV: + if (phy_data->cur_mode != XGBE_MODE_KR) + break; + + xgbe_phy_cdr_track(pdata); + + switch (pdata->an_result) { + case XGBE_AN_READY: + case XGBE_AN_COMPLETE: + break; + default: + if (phy_data->phy_cdr_delay < XGBE_CDR_DELAY_MAX) + phy_data->phy_cdr_delay += XGBE_CDR_DELAY_INC; + else + phy_data->phy_cdr_delay = XGBE_CDR_DELAY_INIT; + break; + } + break; + default: + break; + } +} + +static void xgbe_phy_an_pre(struct xgbe_prv_data *pdata) +{ + struct xgbe_phy_data *phy_data = pdata->phy_data; + + switch (pdata->an_mode) { + case XGBE_AN_MODE_CL73: + case XGBE_AN_MODE_CL73_REDRV: + if (phy_data->cur_mode != XGBE_MODE_KR) + break; + + xgbe_phy_cdr_notrack(pdata); + break; + default: + break; + } +} + static void xgbe_phy_stop(struct xgbe_prv_data *pdata) { struct xgbe_phy_data *phy_data = pdata->phy_data; @@ -2680,6 +2789,9 @@ static void xgbe_phy_stop(struct xgbe_pr xgbe_phy_sfp_reset(phy_data); xgbe_phy_sfp_mod_absent(pdata);
+ /* Reset CDR support */ + xgbe_phy_cdr_track(pdata); + /* Power off the PHY */ xgbe_phy_power_off(pdata);
@@ -2712,6 +2824,9 @@ static int xgbe_phy_start(struct xgbe_pr /* Start in highest supported mode */ xgbe_phy_set_mode(pdata, phy_data->start_mode);
+ /* Reset CDR support */ + xgbe_phy_cdr_track(pdata); + /* After starting the I2C controller, we can check for an SFP */ switch (phy_data->port_mode) { case XGBE_PORT_MODE_SFP: @@ -3019,6 +3134,8 @@ static int xgbe_phy_init(struct xgbe_prv } }
+ phy_data->phy_cdr_delay = XGBE_CDR_DELAY_INIT; + /* Register for driving external PHYs */ mii = devm_mdiobus_alloc(pdata->dev); if (!mii) { @@ -3071,4 +3188,10 @@ void xgbe_init_function_ptrs_phy_v2(stru phy_impl->an_advertising = xgbe_phy_an_advertising;
phy_impl->an_outcome = xgbe_phy_an_outcome; + + phy_impl->an_pre = xgbe_phy_an_pre; + phy_impl->an_post = xgbe_phy_an_post; + + phy_impl->kr_training_pre = xgbe_phy_kr_training_pre; + phy_impl->kr_training_post = xgbe_phy_kr_training_post; } --- a/drivers/net/ethernet/amd/xgbe/xgbe.h +++ b/drivers/net/ethernet/amd/xgbe/xgbe.h @@ -994,6 +994,7 @@ struct xgbe_version_data { unsigned int irq_reissue_support; unsigned int tx_desc_prefetch; unsigned int rx_desc_prefetch; + unsigned int an_cdr_workaround; };
struct xgbe_vxlan_data { @@ -1262,6 +1263,9 @@ struct xgbe_prv_data { unsigned int debugfs_xprop_reg;
unsigned int debugfs_xi2c_reg; + + bool debugfs_an_cdr_workaround; + bool debugfs_an_cdr_track_early; };
/* Function prototypes*/
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Doron Roberts-Kedes doronrk@fb.com
[ Upstream commit 7c5aba211dd61f41d737a2c51729eb9fdcd3edf4 ]
struct sock's sk_rcvtimeo is initialized to LONG_MAX/MAX_SCHEDULE_TIMEOUT in sock_init_data. Calling mod_delayed_work with a timeout of LONG_MAX causes spurious execution of the work function. timer->expires is set equal to jiffies + LONG_MAX. When timer_base->clk falls behind the current value of jiffies, the delta between timer_base->clk and jiffies + LONG_MAX causes the expiration to be in the past. Returning early from strp_start_timer if timeo == LONG_MAX solves this problem.
Found while testing net/tls_sw recv path.
Fixes: 43a0c6751a322847 ("strparser: Stream parser for messages") Reviewed-by: Tejun Heo tj@kernel.org Signed-off-by: Doron Roberts-Kedes doronrk@fb.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/strparser/strparser.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/strparser/strparser.c +++ b/net/strparser/strparser.c @@ -67,7 +67,7 @@ static void strp_abort_strp(struct strpa
static void strp_start_timer(struct strparser *strp, long timeo) { - if (timeo) + if (timeo && timeo != LONG_MAX) mod_delayed_work(strp_wq, &strp->msg_timer_work, timeo); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tom Lendacky thomas.lendacky@amd.com
[ Upstream commit 117df655f8ed51adb6e6b163812a06ebeae9f453 ]
The SFP eeprom indicates the transceiver signals (Rx LOS, Tx Fault, etc.) that it supports. Update the driver to include checking the eeprom data when deciding whether to use a transceiver signal.
Signed-off-by: Tom Lendacky thomas.lendacky@amd.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c | 71 +++++++++++++++++++++------- 1 file changed, 54 insertions(+), 17 deletions(-)
--- a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c @@ -253,6 +253,10 @@ enum xgbe_sfp_speed { #define XGBE_SFP_BASE_VENDOR_SN 4 #define XGBE_SFP_BASE_VENDOR_SN_LEN 16
+#define XGBE_SFP_EXTD_OPT1 1 +#define XGBE_SFP_EXTD_OPT1_RX_LOS BIT(1) +#define XGBE_SFP_EXTD_OPT1_TX_FAULT BIT(3) + #define XGBE_SFP_EXTD_DIAG 28 #define XGBE_SFP_EXTD_DIAG_ADDR_CHANGE BIT(2)
@@ -332,6 +336,7 @@ struct xgbe_phy_data {
unsigned int sfp_gpio_address; unsigned int sfp_gpio_mask; + unsigned int sfp_gpio_inputs; unsigned int sfp_gpio_rx_los; unsigned int sfp_gpio_tx_fault; unsigned int sfp_gpio_mod_absent; @@ -986,6 +991,49 @@ static void xgbe_phy_sfp_external_phy(st phy_data->sfp_phy_avail = 1; }
+static bool xgbe_phy_check_sfp_rx_los(struct xgbe_phy_data *phy_data) +{ + u8 *sfp_extd = phy_data->sfp_eeprom.extd; + + if (!(sfp_extd[XGBE_SFP_EXTD_OPT1] & XGBE_SFP_EXTD_OPT1_RX_LOS)) + return false; + + if (phy_data->sfp_gpio_mask & XGBE_GPIO_NO_RX_LOS) + return false; + + if (phy_data->sfp_gpio_inputs & (1 << phy_data->sfp_gpio_rx_los)) + return true; + + return false; +} + +static bool xgbe_phy_check_sfp_tx_fault(struct xgbe_phy_data *phy_data) +{ + u8 *sfp_extd = phy_data->sfp_eeprom.extd; + + if (!(sfp_extd[XGBE_SFP_EXTD_OPT1] & XGBE_SFP_EXTD_OPT1_TX_FAULT)) + return false; + + if (phy_data->sfp_gpio_mask & XGBE_GPIO_NO_TX_FAULT) + return false; + + if (phy_data->sfp_gpio_inputs & (1 << phy_data->sfp_gpio_tx_fault)) + return true; + + return false; +} + +static bool xgbe_phy_check_sfp_mod_absent(struct xgbe_phy_data *phy_data) +{ + if (phy_data->sfp_gpio_mask & XGBE_GPIO_NO_MOD_ABSENT) + return false; + + if (phy_data->sfp_gpio_inputs & (1 << phy_data->sfp_gpio_mod_absent)) + return true; + + return false; +} + static bool xgbe_phy_belfuse_parse_quirks(struct xgbe_prv_data *pdata) { struct xgbe_phy_data *phy_data = pdata->phy_data; @@ -1031,6 +1079,10 @@ static void xgbe_phy_sfp_parse_eeprom(st if (sfp_base[XGBE_SFP_BASE_EXT_ID] != XGBE_SFP_EXT_ID_SFP) return;
+ /* Update transceiver signals (eeprom extd/options) */ + phy_data->sfp_tx_fault = xgbe_phy_check_sfp_tx_fault(phy_data); + phy_data->sfp_rx_los = xgbe_phy_check_sfp_rx_los(phy_data); + if (xgbe_phy_sfp_parse_quirks(pdata)) return;
@@ -1196,7 +1248,6 @@ put: static void xgbe_phy_sfp_signals(struct xgbe_prv_data *pdata) { struct xgbe_phy_data *phy_data = pdata->phy_data; - unsigned int gpio_input; u8 gpio_reg, gpio_ports[2]; int ret;
@@ -1211,23 +1262,9 @@ static void xgbe_phy_sfp_signals(struct return; }
- gpio_input = (gpio_ports[1] << 8) | gpio_ports[0]; - - if (phy_data->sfp_gpio_mask & XGBE_GPIO_NO_MOD_ABSENT) { - /* No GPIO, just assume the module is present for now */ - phy_data->sfp_mod_absent = 0; - } else { - if (!(gpio_input & (1 << phy_data->sfp_gpio_mod_absent))) - phy_data->sfp_mod_absent = 0; - } - - if (!(phy_data->sfp_gpio_mask & XGBE_GPIO_NO_RX_LOS) && - (gpio_input & (1 << phy_data->sfp_gpio_rx_los))) - phy_data->sfp_rx_los = 1; + phy_data->sfp_gpio_inputs = (gpio_ports[1] << 8) | gpio_ports[0];
- if (!(phy_data->sfp_gpio_mask & XGBE_GPIO_NO_TX_FAULT) && - (gpio_input & (1 << phy_data->sfp_gpio_tx_fault))) - phy_data->sfp_tx_fault = 1; + phy_data->sfp_mod_absent = xgbe_phy_check_sfp_mod_absent(phy_data); }
static void xgbe_phy_sfp_mod_absent(struct xgbe_prv_data *pdata)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Doron Roberts-Kedes doronrk@fb.com
[ Upstream commit 9d0c75bf6e03d9bf80c55b0f677dc9b982958fd5 ]
strp_data_ready resets strp->need_bytes to 0 if strp_peek_len indicates that the remainder of the message has been received. However, do_strp_work does not reset strp->need_bytes to 0. If do_strp_work completes a partial message, the value of strp->need_bytes will continue to reflect the needed bytes of the previous message, causing future invocations of strp_data_ready to return early if strp->need_bytes is less than strp_peek_len. Resetting strp->need_bytes to 0 in __strp_recv on handing a full message to the upper layer solves this problem.
__strp_recv also calculates strp->need_bytes using stm->accum_len before stm->accum_len has been incremented by cand_len. This can cause strp->need_bytes to be equal to the full length of the message instead of the full length minus the accumulated length. This, in turn, causes strp_data_ready to return early, even when there is sufficient data to complete the partial message. Incrementing stm->accum_len before using it to calculate strp->need_bytes solves this problem.
Found while testing net/tls_sw recv path.
Fixes: 43a0c6751a322847 ("strparser: Stream parser for messages") Signed-off-by: Doron Roberts-Kedes doronrk@fb.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/strparser/strparser.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
--- a/net/strparser/strparser.c +++ b/net/strparser/strparser.c @@ -296,9 +296,9 @@ static int __strp_recv(read_descriptor_t strp_start_timer(strp, timeo); }
+ stm->accum_len += cand_len; strp->need_bytes = stm->strp.full_len - stm->accum_len; - stm->accum_len += cand_len; stm->early_eaten = cand_len; STRP_STATS_ADD(strp->stats.bytes, cand_len); desc->count = 0; /* Stop reading socket */ @@ -321,6 +321,7 @@ static int __strp_recv(read_descriptor_t /* Hurray, we have a new message! */ cancel_delayed_work(&strp->msg_timer_work); strp->skb_head = NULL; + strp->need_bytes = 0; STRP_STATS_INCR(strp->stats.msgs);
/* Give skb to upper layer */ @@ -410,9 +411,7 @@ void strp_data_ready(struct strparser *s return;
if (strp->need_bytes) { - if (strp_peek_len(strp) >= strp->need_bytes) - strp->need_bytes = 0; - else + if (strp_peek_len(strp) < strp->need_bytes) return; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Aring aring@mojatatu.com
[ Upstream commit f6cd14537ff9919081be19b9c53b9b19c0d3ea97 ]
We need to record stats for received metadata that we dont know how to process. Have find_decode_metaid() return -ENOENT to capture this.
Signed-off-by: Alexander Aring aring@mojatatu.com Reviewed-by: Yotam Gigi yotam.gi@gmail.com Acked-by: Jamal Hadi Salim jhs@mojatatu.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/act_ife.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/sched/act_ife.c +++ b/net/sched/act_ife.c @@ -605,7 +605,7 @@ static int find_decode_metaid(struct sk_ } }
- return 0; + return -ENOENT; }
static int tcf_ife_decode(struct sk_buff *skb, const struct tc_action *a,
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Soheil Hassas Yeganeh soheil@google.com
Clear tp->packets_out when purging the write queue, otherwise tcp_rearm_rto() mistakenly assumes TCP write queue is not empty. This results in NULL pointer dereference.
Also, remove the redundant `tp->packets_out = 0` from tcp_disconnect(), since tcp_disconnect() calls tcp_write_queue_purge().
Fixes: a27fd7a8ed38 (tcp: purge write queue upon RST) Reported-by: Subash Abhinov Kasiviswanathan subashab@codeaurora.org Reported-by: Sami Farin hvtaifwkbgefbaei@gmail.com Tested-by: Sami Farin hvtaifwkbgefbaei@gmail.com Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: Soheil Hassas Yeganeh soheil@google.com Acked-by: Yuchung Cheng ycheng@google.com Acked-by: Neal Cardwell ncardwell@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/tcp.h | 1 + net/ipv4/tcp.c | 1 - 2 files changed, 1 insertion(+), 1 deletion(-)
--- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1616,6 +1616,7 @@ static inline void tcp_write_queue_purge sk_mem_reclaim(sk); tcp_clear_all_retrans_hints(tcp_sk(sk)); tcp_init_send_head(sk); + tcp_sk(sk)->packets_out = 0; }
static inline struct sk_buff *tcp_write_queue_head(const struct sock *sk) --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2354,7 +2354,6 @@ int tcp_disconnect(struct sock *sk, int icsk->icsk_backoff = 0; tp->snd_cwnd = 2; icsk->icsk_probes_out = 0; - tp->packets_out = 0; tp->snd_ssthresh = TCP_INFINITE_SSTHRESH; tp->snd_cwnd_cnt = 0; tp->window_clamp = 0;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Aring aring@mojatatu.com
[ Upstream commit cc74eddd0ff325d57373cea99f642b787d7f76f5 ]
There is currently no handling to check on a invalid tlv length. This patch adds such handling to avoid killing the kernel with a malformed ife packet.
Signed-off-by: Alexander Aring aring@mojatatu.com Reviewed-by: Yotam Gigi yotam.gi@gmail.com Acked-by: Jamal Hadi Salim jhs@mojatatu.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/ife.h | 3 ++- net/ife/ife.c | 35 +++++++++++++++++++++++++++++++++-- net/sched/act_ife.c | 7 ++++++- 3 files changed, 41 insertions(+), 4 deletions(-)
--- a/include/net/ife.h +++ b/include/net/ife.h @@ -12,7 +12,8 @@ void *ife_encode(struct sk_buff *skb, u16 metalen); void *ife_decode(struct sk_buff *skb, u16 *metalen);
-void *ife_tlv_meta_decode(void *skbdata, u16 *attrtype, u16 *dlen, u16 *totlen); +void *ife_tlv_meta_decode(void *skbdata, const void *ifehdr_end, u16 *attrtype, + u16 *dlen, u16 *totlen); int ife_tlv_meta_encode(void *skbdata, u16 attrtype, u16 dlen, const void *dval);
--- a/net/ife/ife.c +++ b/net/ife/ife.c @@ -92,12 +92,43 @@ struct meta_tlvhdr { __be16 len; };
+static bool __ife_tlv_meta_valid(const unsigned char *skbdata, + const unsigned char *ifehdr_end) +{ + const struct meta_tlvhdr *tlv; + u16 tlvlen; + + if (unlikely(skbdata + sizeof(*tlv) > ifehdr_end)) + return false; + + tlv = (const struct meta_tlvhdr *)skbdata; + tlvlen = ntohs(tlv->len); + + /* tlv length field is inc header, check on minimum */ + if (tlvlen < NLA_HDRLEN) + return false; + + /* overflow by NLA_ALIGN check */ + if (NLA_ALIGN(tlvlen) < tlvlen) + return false; + + if (unlikely(skbdata + NLA_ALIGN(tlvlen) > ifehdr_end)) + return false; + + return true; +} + /* Caller takes care of presenting data in network order */ -void *ife_tlv_meta_decode(void *skbdata, u16 *attrtype, u16 *dlen, u16 *totlen) +void *ife_tlv_meta_decode(void *skbdata, const void *ifehdr_end, u16 *attrtype, + u16 *dlen, u16 *totlen) { - struct meta_tlvhdr *tlv = (struct meta_tlvhdr *) skbdata; + struct meta_tlvhdr *tlv; + + if (!__ife_tlv_meta_valid(skbdata, ifehdr_end)) + return NULL;
+ tlv = (struct meta_tlvhdr *)skbdata; *dlen = ntohs(tlv->len) - NLA_HDRLEN; *attrtype = ntohs(tlv->type);
--- a/net/sched/act_ife.c +++ b/net/sched/act_ife.c @@ -639,7 +639,12 @@ static int tcf_ife_decode(struct sk_buff u16 mtype; u16 dlen;
- curr_data = ife_tlv_meta_decode(tlv_data, &mtype, &dlen, NULL); + curr_data = ife_tlv_meta_decode(tlv_data, ifehdr_end, &mtype, + &dlen, NULL); + if (!curr_data) { + qstats_drop_inc(this_cpu_ptr(ife->common.cpu_qstats)); + return TC_ACT_SHOT; + }
if (find_decode_metaid(skb, ife, mtype, dlen, curr_data)) { /* abuse overlimits to count when we receive metadata
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Aring aring@mojatatu.com
[ Upstream commit d57493d6d1be26c8ac8516a4463bfe24956978eb ]
This patch checks if sk buffer is available to dererence ife header. If not then NULL will returned to signal an malformed ife packet. This avoids to crashing the kernel from outside.
Signed-off-by: Alexander Aring aring@mojatatu.com Reviewed-by: Yotam Gigi yotam.gi@gmail.com Acked-by: Jamal Hadi Salim jhs@mojatatu.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ife/ife.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/net/ife/ife.c +++ b/net/ife/ife.c @@ -69,6 +69,9 @@ void *ife_decode(struct sk_buff *skb, u1 int total_pull; u16 ifehdrln;
+ if (!pskb_may_pull(skb, skb->dev->hard_header_len + IFE_METAHDRLEN)) + return NULL; + ifehdr = (struct ifeheadr *) (skb->data + skb->dev->hard_header_len); ifehdrln = ntohs(ifehdr->metalen); total_pull = skb->dev->hard_header_len + ifehdrln;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit f7e43672683b097bb074a8fe7af9bc600a23f231 ]
syzbot reported we still access llc->sap in llc_backlog_rcv() after it is freed in llc_sap_remove_socket():
Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430 llc_conn_ac_send_sabme_cmd_p_set_x+0x3a8/0x460 net/llc/llc_c_ac.c:785 llc_exec_conn_trans_actions net/llc/llc_conn.c:475 [inline] llc_conn_service net/llc/llc_conn.c:400 [inline] llc_conn_state_process+0x4e1/0x13a0 net/llc/llc_conn.c:75 llc_backlog_rcv+0x195/0x1e0 net/llc/llc_conn.c:891 sk_backlog_rcv include/net/sock.h:909 [inline] __release_sock+0x12f/0x3a0 net/core/sock.c:2335 release_sock+0xa4/0x2b0 net/core/sock.c:2850 llc_ui_release+0xc8/0x220 net/llc/af_llc.c:204
llc->sap is refcount'ed and llc_sap_remove_socket() is paired with llc_sap_add_socket(). This can be amended by holding its refcount before llc_sap_remove_socket() and releasing it after release_sock().
Reported-by: syzbot+6e181fc95081c2cf9051@syzkaller.appspotmail.com Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/llc/af_llc.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/net/llc/af_llc.c +++ b/net/llc/af_llc.c @@ -189,6 +189,7 @@ static int llc_ui_release(struct socket { struct sock *sk = sock->sk; struct llc_sock *llc; + struct llc_sap *sap;
if (unlikely(sk == NULL)) goto out; @@ -199,9 +200,15 @@ static int llc_ui_release(struct socket llc->laddr.lsap, llc->daddr.lsap); if (!llc_send_disc(sk)) llc_ui_wait_for_disc(sk, sk->sk_rcvtimeo); + sap = llc->sap; + /* Hold this for release_sock(), so that llc_backlog_rcv() could still + * use it. + */ + llc_sap_hold(sap); if (!sock_flag(sk, SOCK_ZAPPED)) llc_sap_remove_socket(llc->sap, sk); release_sock(sk); + llc_sap_put(sap); if (llc->dev) dev_put(llc->dev); sock_put(sk);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit 3a04ce7130a7e5dad4e78d45d50313747f8c830f ]
For SOCK_ZAPPED socket, we don't need to care about llc->sap, so we should just skip these refcount functions in this case.
Fixes: f7e43672683b ("llc: hold llc_sap before release_sock()") Reported-by: kernel test robot lkp@intel.com Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/llc/af_llc.c | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-)
--- a/net/llc/af_llc.c +++ b/net/llc/af_llc.c @@ -189,7 +189,6 @@ static int llc_ui_release(struct socket { struct sock *sk = sock->sk; struct llc_sock *llc; - struct llc_sap *sap;
if (unlikely(sk == NULL)) goto out; @@ -200,15 +199,19 @@ static int llc_ui_release(struct socket llc->laddr.lsap, llc->daddr.lsap); if (!llc_send_disc(sk)) llc_ui_wait_for_disc(sk, sk->sk_rcvtimeo); - sap = llc->sap; - /* Hold this for release_sock(), so that llc_backlog_rcv() could still - * use it. - */ - llc_sap_hold(sap); - if (!sock_flag(sk, SOCK_ZAPPED)) + if (!sock_flag(sk, SOCK_ZAPPED)) { + struct llc_sap *sap = llc->sap; + + /* Hold this for release_sock(), so that llc_backlog_rcv() + * could still use it. + */ + llc_sap_hold(sap); llc_sap_remove_socket(llc->sap, sk); - release_sock(sk); - llc_sap_put(sap); + release_sock(sk); + llc_sap_put(sap); + } else { + release_sock(sk); + } if (llc->dev) dev_put(llc->dev); sock_put(sk);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Khoronzhuk ivan.khoronzhuk@linaro.org
[ Upstream commit 5e391dc5a8d801a2410d0032ad4a428d1d61800c ]
The CPDMA_TX_PRIORITY_MAP in real is vlan pcp field priority mapping register and basically replaces vlan pcp field for tagged packets. So, set it to be 1:1 mapping. Otherwise, it will cause unexpected change of egress vlan tagged packets, like prio 2 -> prio 5.
Fixes: e05107e6b747 ("net: ethernet: ti: cpsw: add multi queue support") Reviewed-by: Grygorii Strashko grygorii.strashko@ti.com Signed-off-by: Ivan Khoronzhuk ivan.khoronzhuk@linaro.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/ti/cpsw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -124,7 +124,7 @@ do { \
#define RX_PRIORITY_MAPPING 0x76543210 #define TX_PRIORITY_MAPPING 0x33221100 -#define CPDMA_TX_PRIORITY_MAP 0x01234567 +#define CPDMA_TX_PRIORITY_MAP 0x76543210
#define CPSW_VLAN_AWARE BIT(1) #define CPSW_ALE_VLAN_AWARE 1
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Michael S. Tsirkin" mst@redhat.com
[ Upstream commit 12e571693837d6164bda61e316b1944972ee0d97 ]
When sending control commands, virtio net sets up several buffers for DMA. The buffers are all part of the net device which means it's actually allocated by kvmalloc so it's in theory (on extreme memory pressure) possible to get a vmalloc'ed buffer which on some platforms means we can't DMA there.
Fix up by moving the DMA buffers into a separate structure.
Reported-by: Mikulas Patocka mpatocka@redhat.com Suggested-by: Eric Dumazet eric.dumazet@gmail.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/virtio_net.c | 68 ++++++++++++++++++++++++++--------------------- 1 file changed, 39 insertions(+), 29 deletions(-)
--- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -116,6 +116,17 @@ struct receive_queue { char name[40]; };
+/* Control VQ buffers: protected by the rtnl lock */ +struct control_buf { + struct virtio_net_ctrl_hdr hdr; + virtio_net_ctrl_ack status; + struct virtio_net_ctrl_mq mq; + u8 promisc; + u8 allmulti; + u16 vid; + u64 offloads; +}; + struct virtnet_info { struct virtio_device *vdev; struct virtqueue *cvq; @@ -164,14 +175,7 @@ struct virtnet_info { struct hlist_node node; struct hlist_node node_dead;
- /* Control VQ buffers: protected by the rtnl lock */ - struct virtio_net_ctrl_hdr ctrl_hdr; - virtio_net_ctrl_ack ctrl_status; - struct virtio_net_ctrl_mq ctrl_mq; - u8 ctrl_promisc; - u8 ctrl_allmulti; - u16 ctrl_vid; - u64 ctrl_offloads; + struct control_buf *ctrl;
/* Ethtool settings */ u8 duplex; @@ -1340,25 +1344,25 @@ static bool virtnet_send_command(struct /* Caller should know better */ BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ));
- vi->ctrl_status = ~0; - vi->ctrl_hdr.class = class; - vi->ctrl_hdr.cmd = cmd; + vi->ctrl->status = ~0; + vi->ctrl->hdr.class = class; + vi->ctrl->hdr.cmd = cmd; /* Add header */ - sg_init_one(&hdr, &vi->ctrl_hdr, sizeof(vi->ctrl_hdr)); + sg_init_one(&hdr, &vi->ctrl->hdr, sizeof(vi->ctrl->hdr)); sgs[out_num++] = &hdr;
if (out) sgs[out_num++] = out;
/* Add return status. */ - sg_init_one(&stat, &vi->ctrl_status, sizeof(vi->ctrl_status)); + sg_init_one(&stat, &vi->ctrl->status, sizeof(vi->ctrl->status)); sgs[out_num] = &stat;
BUG_ON(out_num + 1 > ARRAY_SIZE(sgs)); virtqueue_add_sgs(vi->cvq, sgs, out_num, 1, vi, GFP_ATOMIC);
if (unlikely(!virtqueue_kick(vi->cvq))) - return vi->ctrl_status == VIRTIO_NET_OK; + return vi->ctrl->status == VIRTIO_NET_OK;
/* Spin for a response, the kick causes an ioport write, trapping * into the hypervisor, so the request should be handled immediately. @@ -1367,7 +1371,7 @@ static bool virtnet_send_command(struct !virtqueue_is_broken(vi->cvq)) cpu_relax();
- return vi->ctrl_status == VIRTIO_NET_OK; + return vi->ctrl->status == VIRTIO_NET_OK; }
static int virtnet_set_mac_address(struct net_device *dev, void *p) @@ -1478,8 +1482,8 @@ static int _virtnet_set_queues(struct vi if (!vi->has_cvq || !virtio_has_feature(vi->vdev, VIRTIO_NET_F_MQ)) return 0;
- vi->ctrl_mq.virtqueue_pairs = cpu_to_virtio16(vi->vdev, queue_pairs); - sg_init_one(&sg, &vi->ctrl_mq, sizeof(vi->ctrl_mq)); + vi->ctrl->mq.virtqueue_pairs = cpu_to_virtio16(vi->vdev, queue_pairs); + sg_init_one(&sg, &vi->ctrl->mq, sizeof(vi->ctrl->mq));
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MQ, VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET, &sg)) { @@ -1537,22 +1541,22 @@ static void virtnet_set_rx_mode(struct n if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_RX)) return;
- vi->ctrl_promisc = ((dev->flags & IFF_PROMISC) != 0); - vi->ctrl_allmulti = ((dev->flags & IFF_ALLMULTI) != 0); + vi->ctrl->promisc = ((dev->flags & IFF_PROMISC) != 0); + vi->ctrl->allmulti = ((dev->flags & IFF_ALLMULTI) != 0);
- sg_init_one(sg, &vi->ctrl_promisc, sizeof(vi->ctrl_promisc)); + sg_init_one(sg, &vi->ctrl->promisc, sizeof(vi->ctrl->promisc));
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX, VIRTIO_NET_CTRL_RX_PROMISC, sg)) dev_warn(&dev->dev, "Failed to %sable promisc mode.\n", - vi->ctrl_promisc ? "en" : "dis"); + vi->ctrl->promisc ? "en" : "dis");
- sg_init_one(sg, &vi->ctrl_allmulti, sizeof(vi->ctrl_allmulti)); + sg_init_one(sg, &vi->ctrl->allmulti, sizeof(vi->ctrl->allmulti));
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX, VIRTIO_NET_CTRL_RX_ALLMULTI, sg)) dev_warn(&dev->dev, "Failed to %sable allmulti mode.\n", - vi->ctrl_allmulti ? "en" : "dis"); + vi->ctrl->allmulti ? "en" : "dis");
uc_count = netdev_uc_count(dev); mc_count = netdev_mc_count(dev); @@ -1598,8 +1602,8 @@ static int virtnet_vlan_rx_add_vid(struc struct virtnet_info *vi = netdev_priv(dev); struct scatterlist sg;
- vi->ctrl_vid = vid; - sg_init_one(&sg, &vi->ctrl_vid, sizeof(vi->ctrl_vid)); + vi->ctrl->vid = vid; + sg_init_one(&sg, &vi->ctrl->vid, sizeof(vi->ctrl->vid));
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_VLAN, VIRTIO_NET_CTRL_VLAN_ADD, &sg)) @@ -1613,8 +1617,8 @@ static int virtnet_vlan_rx_kill_vid(stru struct virtnet_info *vi = netdev_priv(dev); struct scatterlist sg;
- vi->ctrl_vid = vid; - sg_init_one(&sg, &vi->ctrl_vid, sizeof(vi->ctrl_vid)); + vi->ctrl->vid = vid; + sg_init_one(&sg, &vi->ctrl->vid, sizeof(vi->ctrl->vid));
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_VLAN, VIRTIO_NET_CTRL_VLAN_DEL, &sg)) @@ -1912,9 +1916,9 @@ static int virtnet_restore_up(struct vir static int virtnet_set_guest_offloads(struct virtnet_info *vi, u64 offloads) { struct scatterlist sg; - vi->ctrl_offloads = cpu_to_virtio64(vi->vdev, offloads); + vi->ctrl->offloads = cpu_to_virtio64(vi->vdev, offloads);
- sg_init_one(&sg, &vi->ctrl_offloads, sizeof(vi->ctrl_offloads)); + sg_init_one(&sg, &vi->ctrl->offloads, sizeof(vi->ctrl->offloads));
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_GUEST_OFFLOADS, VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET, &sg)) { @@ -2134,6 +2138,7 @@ static void virtnet_free_queues(struct v
kfree(vi->rq); kfree(vi->sq); + kfree(vi->ctrl); }
static void _free_receive_bufs(struct virtnet_info *vi) @@ -2326,6 +2331,9 @@ static int virtnet_alloc_queues(struct v { int i;
+ vi->ctrl = kzalloc(sizeof(*vi->ctrl), GFP_KERNEL); + if (!vi->ctrl) + goto err_ctrl; vi->sq = kzalloc(sizeof(*vi->sq) * vi->max_queue_pairs, GFP_KERNEL); if (!vi->sq) goto err_sq; @@ -2351,6 +2359,8 @@ static int virtnet_alloc_queues(struct v err_rq: kfree(vi->sq); err_sq: + kfree(vi->ctrl); +err_ctrl: return -ENOMEM; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Michael S. Tsirkin" mst@redhat.com
[ Upstream commit d7fad4c840f33a6bd333dd7fbb3006edbcf0017a ]
Programming vids (adding or removing them) still passes guest-endian values in the DMA buffer. That's wrong if guest is big-endian and when virtio 1 is enabled.
Note: this is on top of a previous patch: virtio_net: split out ctrl buffer
Fixes: 9465a7a6f ("virtio_net: enable v1.0 support") Signed-off-by: Michael S. Tsirkin mst@redhat.com Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/virtio_net.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -123,7 +123,7 @@ struct control_buf { struct virtio_net_ctrl_mq mq; u8 promisc; u8 allmulti; - u16 vid; + __virtio16 vid; u64 offloads; };
@@ -1602,7 +1602,7 @@ static int virtnet_vlan_rx_add_vid(struc struct virtnet_info *vi = netdev_priv(dev); struct scatterlist sg;
- vi->ctrl->vid = vid; + vi->ctrl->vid = cpu_to_virtio16(vi->vdev, vid); sg_init_one(&sg, &vi->ctrl->vid, sizeof(vi->ctrl->vid));
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_VLAN, @@ -1617,7 +1617,7 @@ static int virtnet_vlan_rx_kill_vid(stru struct virtnet_info *vi = netdev_priv(dev); struct scatterlist sg;
- vi->ctrl->vid = vid; + vi->ctrl->vid = cpu_to_virtio16(vi->vdev, vid); sg_init_one(&sg, &vi->ctrl->vid, sizeof(vi->ctrl->vid));
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_VLAN,
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit ac3d021048be9edb825f0794da5b42f04fefecef which is commit 71e7673dadfdae0605d4c1f66ecb4b045c79fe0f upstream.
kbuild reports that this causes build regressions in 4.14.y, so drop it.
Reported-by: kbuild test robot lkp@intel.com Reported-by: "Hao, Shun" shun.hao@intel.com Cc: Arnd Bergmann arnd@arndb.de Cc: Michal Simek michal.simek@xilinx.com Cc: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/microblaze/Makefile | 17 ++++++----------- 1 file changed, 6 insertions(+), 11 deletions(-)
--- a/arch/microblaze/Makefile +++ b/arch/microblaze/Makefile @@ -36,21 +36,16 @@ endif CPUFLAGS-$(CONFIG_XILINX_MICROBLAZE0_USE_DIV) += -mno-xl-soft-div CPUFLAGS-$(CONFIG_XILINX_MICROBLAZE0_USE_BARREL) += -mxl-barrel-shift CPUFLAGS-$(CONFIG_XILINX_MICROBLAZE0_USE_PCMP_INSTR) += -mxl-pattern-compare - -ifdef CONFIG_CPU_BIG_ENDIAN -KBUILD_CFLAGS += -mbig-endian -KBUILD_AFLAGS += -mbig-endian -LD += -EB -else -KBUILD_CFLAGS += -mlittle-endian -KBUILD_AFLAGS += -mlittle-endian -LD += -EL -endif +CPUFLAGS-$(CONFIG_BIG_ENDIAN) += -mbig-endian +CPUFLAGS-$(CONFIG_LITTLE_ENDIAN) += -mlittle-endian
CPUFLAGS-1 += $(call cc-option,-mcpu=v$(CPU_VER))
# r31 holds current when in kernel mode -KBUILD_CFLAGS += -ffixed-r31 $(CPUFLAGS-y) $(CPUFLAGS-1) $(CPUFLAGS-2) +KBUILD_CFLAGS += -ffixed-r31 $(CPUFLAGS-1) $(CPUFLAGS-2) + +LDFLAGS := +LDFLAGS_vmlinux :=
head-y := arch/microblaze/kernel/head.o libs-y += arch/microblaze/lib/
On Fri, Apr 27, 2018 at 03:58:34PM +0200, Greg Kroah-Hartman wrote:
I don't know about kbuild, but with this revert applied all Microblaze builds fail. There are endless
warning: #warning inconsistent configuration, needs CONFIG_CPU_BIG_ENDIAN [-Wcpp]
followed by
lib/find_bit.c:185:15: error: redefinition of ‘find_next_bit_le’
I have to revert this revert to get successful builds.
Guenter
On Fri, Apr 27, 2018 at 09:25:24AM -0700, Guenter Roeck wrote:
Sorry about that, yes, you are right, it needs to be dropped, and I've now done so and pushed out a -rc2 with that fixed up.
thanks,
greg k-h
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
From: Vasily Gorbik gor@linux.vnet.ibm.com
[ Upstream commit 686140a1a9c41d85a4212a1c26d671139b76404b ]
Implement CPU alternatives, which allows to optionally patch newer instructions at runtime, based on CPU facilities availability.
A new kernel boot parameter "noaltinstr" disables patching.
Current implementation is derived from x86 alternatives. Although ideal instructions padding (when altinstr is longer then oldinstr) is added at compile time, and no oldinstr nops optimization has to be done at runtime. Also couple of compile time sanity checks are done: 1. oldinstr and altinstr must be <= 254 bytes long, 2. oldinstr and altinstr must not have an odd length.
alternative(oldinstr, altinstr, facility); alternative_2(oldinstr, altinstr1, facility1, altinstr2, facility2);
Both compile time and runtime padding consists of either 6/4/2 bytes nop or a jump (brcl) + 2 bytes nop filler if padding is longer then 6 bytes.
.altinstructions and .altinstr_replacement sections are part of __init_begin : __init_end region and are freed after initialization.
Signed-off-by: Vasily Gorbik gor@linux.vnet.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/admin-guide/kernel-parameters.txt | 3 arch/s390/Kconfig | 17 ++ arch/s390/include/asm/alternative.h | 163 ++++++++++++++++++++++++ arch/s390/kernel/Makefile | 1 arch/s390/kernel/alternative.c | 110 ++++++++++++++++ arch/s390/kernel/module.c | 17 ++ arch/s390/kernel/setup.c | 3 arch/s390/kernel/vmlinux.lds.S | 23 +++ 8 files changed, 337 insertions(+) create mode 100644 arch/s390/include/asm/alternative.h create mode 100644 arch/s390/kernel/alternative.c
--- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2541,6 +2541,9 @@
noalign [KNL,ARM]
+ noaltinstr [S390] Disables alternative instructions patching + (CPU alternatives feature). + noapic [SMP,APIC] Tells the kernel to not make use of any IOAPICs that may be present in the system.
--- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -538,6 +538,22 @@ config ARCH_RANDOM
If unsure, say Y.
+config ALTERNATIVES + def_bool y + prompt "Patch optimized instructions for running CPU type" + help + When enabled the kernel code is compiled with additional + alternative instructions blocks optimized for newer CPU types. + These alternative instructions blocks are patched at kernel boot + time when running CPU supports them. This mechanism is used to + optimize some critical code paths (i.e. spinlocks) for newer CPUs + even if kernel is build to support older machine generations. + + This mechanism could be disabled by appending "noaltinstr" + option to the kernel command line. + + If unsure, say Y. + endmenu
menu "Memory setup" @@ -812,6 +828,7 @@ config PFAULT config SHARED_KERNEL bool "VM shared kernel support" depends on !JUMP_LABEL + depends on !ALTERNATIVES help Select this option, if you want to share the text segment of the Linux kernel between different VM guests. This reduces memory --- /dev/null +++ b/arch/s390/include/asm/alternative.h @@ -0,0 +1,163 @@ +#ifndef _ASM_S390_ALTERNATIVE_H +#define _ASM_S390_ALTERNATIVE_H + +#ifndef __ASSEMBLY__ + +#include <linux/types.h> +#include <linux/stddef.h> +#include <linux/stringify.h> + +struct alt_instr { + s32 instr_offset; /* original instruction */ + s32 repl_offset; /* offset to replacement instruction */ + u16 facility; /* facility bit set for replacement */ + u8 instrlen; /* length of original instruction */ + u8 replacementlen; /* length of new instruction */ +} __packed; + +#ifdef CONFIG_ALTERNATIVES +extern void apply_alternative_instructions(void); +extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end); +#else +static inline void apply_alternative_instructions(void) {}; +static inline void apply_alternatives(struct alt_instr *start, + struct alt_instr *end) {}; +#endif +/* + * |661: |662: |6620 |663: + * +-----------+---------------------+ + * | oldinstr | oldinstr_padding | + * | +----------+----------+ + * | | | | + * | | >6 bytes |6/4/2 nops| + * | |6 bytes jg-----------> + * +-----------+---------------------+ + * ^^ static padding ^^ + * + * .altinstr_replacement section + * +---------------------+-----------+ + * |6641: |6651: + * | alternative instr 1 | + * +-----------+---------+- - - - - -+ + * |6642: |6652: | + * | alternative instr 2 | padding + * +---------------------+- - - - - -+ + * ^ runtime ^ + * + * .altinstructions section + * +---------------------------------+ + * | alt_instr entries for each | + * | alternative instr | + * +---------------------------------+ + */ + +#define b_altinstr(num) "664"#num +#define e_altinstr(num) "665"#num + +#define e_oldinstr_pad_end "663" +#define oldinstr_len "662b-661b" +#define oldinstr_total_len e_oldinstr_pad_end"b-661b" +#define altinstr_len(num) e_altinstr(num)"b-"b_altinstr(num)"b" +#define oldinstr_pad_len(num) \ + "-(((" altinstr_len(num) ")-(" oldinstr_len ")) > 0) * " \ + "((" altinstr_len(num) ")-(" oldinstr_len "))" + +#define INSTR_LEN_SANITY_CHECK(len) \ + ".if " len " > 254\n" \ + "\t.error "cpu alternatives does not support instructions " \ + "blocks > 254 bytes"\n" \ + ".endif\n" \ + ".if (" len ") %% 2\n" \ + "\t.error "cpu alternatives instructions length is odd"\n" \ + ".endif\n" + +#define OLDINSTR_PADDING(oldinstr, num) \ + ".if " oldinstr_pad_len(num) " > 6\n" \ + "\tjg " e_oldinstr_pad_end "f\n" \ + "6620:\n" \ + "\t.fill (" oldinstr_pad_len(num) " - (6620b-662b)) / 2, 2, 0x0700\n" \ + ".else\n" \ + "\t.fill " oldinstr_pad_len(num) " / 6, 6, 0xc0040000\n" \ + "\t.fill " oldinstr_pad_len(num) " %% 6 / 4, 4, 0x47000000\n" \ + "\t.fill " oldinstr_pad_len(num) " %% 6 %% 4 / 2, 2, 0x0700\n" \ + ".endif\n" + +#define OLDINSTR(oldinstr, num) \ + "661:\n\t" oldinstr "\n662:\n" \ + OLDINSTR_PADDING(oldinstr, num) \ + e_oldinstr_pad_end ":\n" \ + INSTR_LEN_SANITY_CHECK(oldinstr_len) + +#define OLDINSTR_2(oldinstr, num1, num2) \ + "661:\n\t" oldinstr "\n662:\n" \ + ".if " altinstr_len(num1) " < " altinstr_len(num2) "\n" \ + OLDINSTR_PADDING(oldinstr, num2) \ + ".else\n" \ + OLDINSTR_PADDING(oldinstr, num1) \ + ".endif\n" \ + e_oldinstr_pad_end ":\n" \ + INSTR_LEN_SANITY_CHECK(oldinstr_len) + +#define ALTINSTR_ENTRY(facility, num) \ + "\t.long 661b - .\n" /* old instruction */ \ + "\t.long " b_altinstr(num)"b - .\n" /* alt instruction */ \ + "\t.word " __stringify(facility) "\n" /* facility bit */ \ + "\t.byte " oldinstr_total_len "\n" /* source len */ \ + "\t.byte " altinstr_len(num) "\n" /* alt instruction len */ + +#define ALTINSTR_REPLACEMENT(altinstr, num) /* replacement */ \ + b_altinstr(num)":\n\t" altinstr "\n" e_altinstr(num) ":\n" \ + INSTR_LEN_SANITY_CHECK(altinstr_len(num)) + +#ifdef CONFIG_ALTERNATIVES +/* alternative assembly primitive: */ +#define ALTERNATIVE(oldinstr, altinstr, facility) \ + ".pushsection .altinstr_replacement, "ax"\n" \ + ALTINSTR_REPLACEMENT(altinstr, 1) \ + ".popsection\n" \ + OLDINSTR(oldinstr, 1) \ + ".pushsection .altinstructions,"a"\n" \ + ALTINSTR_ENTRY(facility, 1) \ + ".popsection\n" + +#define ALTERNATIVE_2(oldinstr, altinstr1, facility1, altinstr2, facility2)\ + ".pushsection .altinstr_replacement, "ax"\n" \ + ALTINSTR_REPLACEMENT(altinstr1, 1) \ + ALTINSTR_REPLACEMENT(altinstr2, 2) \ + ".popsection\n" \ + OLDINSTR_2(oldinstr, 1, 2) \ + ".pushsection .altinstructions,"a"\n" \ + ALTINSTR_ENTRY(facility1, 1) \ + ALTINSTR_ENTRY(facility2, 2) \ + ".popsection\n" +#else +/* Alternative instructions are disabled, let's put just oldinstr in */ +#define ALTERNATIVE(oldinstr, altinstr, facility) \ + oldinstr "\n" + +#define ALTERNATIVE_2(oldinstr, altinstr1, facility1, altinstr2, facility2) \ + oldinstr "\n" +#endif + +/* + * Alternative instructions for different CPU types or capabilities. + * + * This allows to use optimized instructions even on generic binary + * kernels. + * + * oldinstr is padded with jump and nops at compile time if altinstr is + * longer. altinstr is padded with jump and nops at run-time during patching. + * + * For non barrier like inlines please define new variants + * without volatile and memory clobber. + */ +#define alternative(oldinstr, altinstr, facility) \ + asm volatile(ALTERNATIVE(oldinstr, altinstr, facility) : : : "memory") + +#define alternative_2(oldinstr, altinstr1, facility1, altinstr2, facility2) \ + asm volatile(ALTERNATIVE_2(oldinstr, altinstr1, facility1, \ + altinstr2, facility2) ::: "memory") + +#endif /* __ASSEMBLY__ */ + +#endif /* _ASM_S390_ALTERNATIVE_H */ --- a/arch/s390/kernel/Makefile +++ b/arch/s390/kernel/Makefile @@ -75,6 +75,7 @@ obj-$(CONFIG_KPROBES) += kprobes.o obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o obj-$(CONFIG_CRASH_DUMP) += crash_dump.o obj-$(CONFIG_UPROBES) += uprobes.o +obj-$(CONFIG_ALTERNATIVES) += alternative.o
obj-$(CONFIG_PERF_EVENTS) += perf_event.o perf_cpum_cf.o perf_cpum_sf.o obj-$(CONFIG_PERF_EVENTS) += perf_cpum_cf_events.o --- /dev/null +++ b/arch/s390/kernel/alternative.c @@ -0,0 +1,110 @@ +#include <linux/module.h> +#include <asm/alternative.h> +#include <asm/facility.h> + +#define MAX_PATCH_LEN (255 - 1) + +static int __initdata_or_module alt_instr_disabled; + +static int __init disable_alternative_instructions(char *str) +{ + alt_instr_disabled = 1; + return 0; +} + +early_param("noaltinstr", disable_alternative_instructions); + +struct brcl_insn { + u16 opc; + s32 disp; +} __packed; + +static u16 __initdata_or_module nop16 = 0x0700; +static u32 __initdata_or_module nop32 = 0x47000000; +static struct brcl_insn __initdata_or_module nop48 = { + 0xc004, 0 +}; + +static const void *nops[] __initdata_or_module = { + &nop16, + &nop32, + &nop48 +}; + +static void __init_or_module add_jump_padding(void *insns, unsigned int len) +{ + struct brcl_insn brcl = { + 0xc0f4, + len / 2 + }; + + memcpy(insns, &brcl, sizeof(brcl)); + insns += sizeof(brcl); + len -= sizeof(brcl); + + while (len > 0) { + memcpy(insns, &nop16, 2); + insns += 2; + len -= 2; + } +} + +static void __init_or_module add_padding(void *insns, unsigned int len) +{ + if (len > 6) + add_jump_padding(insns, len); + else if (len >= 2) + memcpy(insns, nops[len / 2 - 1], len); +} + +static void __init_or_module __apply_alternatives(struct alt_instr *start, + struct alt_instr *end) +{ + struct alt_instr *a; + u8 *instr, *replacement; + u8 insnbuf[MAX_PATCH_LEN]; + + /* + * The scan order should be from start to end. A later scanned + * alternative code can overwrite previously scanned alternative code. + */ + for (a = start; a < end; a++) { + int insnbuf_sz = 0; + + instr = (u8 *)&a->instr_offset + a->instr_offset; + replacement = (u8 *)&a->repl_offset + a->repl_offset; + + if (!test_facility(a->facility)) + continue; + + if (unlikely(a->instrlen % 2 || a->replacementlen % 2)) { + WARN_ONCE(1, "cpu alternatives instructions length is " + "odd, skipping patching\n"); + continue; + } + + memcpy(insnbuf, replacement, a->replacementlen); + insnbuf_sz = a->replacementlen; + + if (a->instrlen > a->replacementlen) { + add_padding(insnbuf + a->replacementlen, + a->instrlen - a->replacementlen); + insnbuf_sz += a->instrlen - a->replacementlen; + } + + s390_kernel_write(instr, insnbuf, insnbuf_sz); + } +} + +void __init_or_module apply_alternatives(struct alt_instr *start, + struct alt_instr *end) +{ + if (!alt_instr_disabled) + __apply_alternatives(start, end); +} + +extern struct alt_instr __alt_instructions[], __alt_instructions_end[]; +void __init apply_alternative_instructions(void) +{ + apply_alternatives(__alt_instructions, __alt_instructions_end); +} --- a/arch/s390/kernel/module.c +++ b/arch/s390/kernel/module.c @@ -31,6 +31,7 @@ #include <linux/kernel.h> #include <linux/moduleloader.h> #include <linux/bug.h> +#include <asm/alternative.h>
#if 0 #define DEBUGP printk @@ -429,6 +430,22 @@ int module_finalize(const Elf_Ehdr *hdr, const Elf_Shdr *sechdrs, struct module *me) { + const Elf_Shdr *s; + char *secstrings; + + if (IS_ENABLED(CONFIG_ALTERNATIVES)) { + secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset; + for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) { + if (!strcmp(".altinstructions", + secstrings + s->sh_name)) { + /* patch .altinstructions */ + void *aseg = (void *)s->sh_addr; + + apply_alternatives(aseg, aseg + s->sh_size); + } + } + } + jump_label_apply_nops(me); return 0; } --- a/arch/s390/kernel/setup.c +++ b/arch/s390/kernel/setup.c @@ -66,6 +66,7 @@ #include <asm/sclp.h> #include <asm/sysinfo.h> #include <asm/numa.h> +#include <asm/alternative.h> #include "entry.h"
/* @@ -955,6 +956,8 @@ void __init setup_arch(char **cmdline_p) conmode_default(); set_preferred_console();
+ apply_alternative_instructions(); + /* Setup zfcpdump support */ setup_zfcpdump();
--- a/arch/s390/kernel/vmlinux.lds.S +++ b/arch/s390/kernel/vmlinux.lds.S @@ -105,6 +105,29 @@ SECTIONS EXIT_DATA }
+ /* + * struct alt_inst entries. From the header (alternative.h): + * "Alternative instructions for different CPU types or capabilities" + * Think locking instructions on spinlocks. + * Note, that it is a part of __init region. + */ + . = ALIGN(8); + .altinstructions : { + __alt_instructions = .; + *(.altinstructions) + __alt_instructions_end = .; + } + + /* + * And here are the replacement instructions. The linker sticks + * them as binary blobs. The .altinstructions has enough data to + * get the address and the length of them to patch the kernel safely. + * Note, that it is a part of __init region. + */ + .altinstr_replacement : { + *(.altinstr_replacement) + } + /* early.c uses stsi, which requires page aligned data. */ . = ALIGN(PAGE_SIZE); INIT_DATA_SECTION(0x100)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
From: Heiko Carstens heiko.carstens@de.ibm.com
[ Upstream commit 049a2c2d486e8cc82c5cd79fa479c5b105b109e9 ]
Remove the CPU_ALTERNATIVES config option and enable the code unconditionally. The config option was only added to avoid a conflict with the named saved segment support. Since that code is gone there is no reason to keep the CPU_ALTERNATIVES config option.
Just enable it unconditionally to also reduce the number of config options and make it less likely that something breaks.
Signed-off-by: Heiko Carstens heiko.carstens@de.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/Kconfig | 16 ---------------- arch/s390/include/asm/alternative.h | 20 +++----------------- arch/s390/kernel/Makefile | 3 +-- arch/s390/kernel/module.c | 15 ++++++--------- 4 files changed, 10 insertions(+), 44 deletions(-)
--- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -538,22 +538,6 @@ config ARCH_RANDOM
If unsure, say Y.
-config ALTERNATIVES - def_bool y - prompt "Patch optimized instructions for running CPU type" - help - When enabled the kernel code is compiled with additional - alternative instructions blocks optimized for newer CPU types. - These alternative instructions blocks are patched at kernel boot - time when running CPU supports them. This mechanism is used to - optimize some critical code paths (i.e. spinlocks) for newer CPUs - even if kernel is build to support older machine generations. - - This mechanism could be disabled by appending "noaltinstr" - option to the kernel command line. - - If unsure, say Y. - endmenu
menu "Memory setup" --- a/arch/s390/include/asm/alternative.h +++ b/arch/s390/include/asm/alternative.h @@ -15,14 +15,9 @@ struct alt_instr { u8 replacementlen; /* length of new instruction */ } __packed;
-#ifdef CONFIG_ALTERNATIVES -extern void apply_alternative_instructions(void); -extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end); -#else -static inline void apply_alternative_instructions(void) {}; -static inline void apply_alternatives(struct alt_instr *start, - struct alt_instr *end) {}; -#endif +void apply_alternative_instructions(void); +void apply_alternatives(struct alt_instr *start, struct alt_instr *end); + /* * |661: |662: |6620 |663: * +-----------+---------------------+ @@ -109,7 +104,6 @@ static inline void apply_alternatives(st b_altinstr(num)":\n\t" altinstr "\n" e_altinstr(num) ":\n" \ INSTR_LEN_SANITY_CHECK(altinstr_len(num))
-#ifdef CONFIG_ALTERNATIVES /* alternative assembly primitive: */ #define ALTERNATIVE(oldinstr, altinstr, facility) \ ".pushsection .altinstr_replacement, "ax"\n" \ @@ -130,14 +124,6 @@ static inline void apply_alternatives(st ALTINSTR_ENTRY(facility1, 1) \ ALTINSTR_ENTRY(facility2, 2) \ ".popsection\n" -#else -/* Alternative instructions are disabled, let's put just oldinstr in */ -#define ALTERNATIVE(oldinstr, altinstr, facility) \ - oldinstr "\n" - -#define ALTERNATIVE_2(oldinstr, altinstr1, facility1, altinstr2, facility2) \ - oldinstr "\n" -#endif
/* * Alternative instructions for different CPU types or capabilities. --- a/arch/s390/kernel/Makefile +++ b/arch/s390/kernel/Makefile @@ -57,7 +57,7 @@ obj-y += processor.o sys_s390.o ptrace.o obj-y += debug.o irq.o ipl.o dis.o diag.o vdso.o als.o obj-y += sysinfo.o jump_label.o lgr.o os_info.o machine_kexec.o pgm_check.o obj-y += runtime_instr.o cache.o fpu.o dumpstack.o guarded_storage.o -obj-y += entry.o reipl.o relocate_kernel.o kdebugfs.o +obj-y += entry.o reipl.o relocate_kernel.o kdebugfs.o alternative.o
extra-y += head.o head64.o vmlinux.lds
@@ -75,7 +75,6 @@ obj-$(CONFIG_KPROBES) += kprobes.o obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o obj-$(CONFIG_CRASH_DUMP) += crash_dump.o obj-$(CONFIG_UPROBES) += uprobes.o -obj-$(CONFIG_ALTERNATIVES) += alternative.o
obj-$(CONFIG_PERF_EVENTS) += perf_event.o perf_cpum_cf.o perf_cpum_sf.o obj-$(CONFIG_PERF_EVENTS) += perf_cpum_cf_events.o --- a/arch/s390/kernel/module.c +++ b/arch/s390/kernel/module.c @@ -433,16 +433,13 @@ int module_finalize(const Elf_Ehdr *hdr, const Elf_Shdr *s; char *secstrings;
- if (IS_ENABLED(CONFIG_ALTERNATIVES)) { - secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset; - for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) { - if (!strcmp(".altinstructions", - secstrings + s->sh_name)) { - /* patch .altinstructions */ - void *aseg = (void *)s->sh_addr; + secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset; + for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) { + if (!strcmp(".altinstructions", secstrings + s->sh_name)) { + /* patch .altinstructions */ + void *aseg = (void *)s->sh_addr;
- apply_alternatives(aseg, aseg + s->sh_size); - } + apply_alternatives(aseg, aseg + s->sh_size); } }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit 7041d28115e91f2144f811ffe8a195c696b1e1d0 ]
Clear all user space registers on entry to the kernel and all KVM guest registers on KVM guest exit if the register does not contain either a parameter or a result value.
Reviewed-by: Christian Borntraeger borntraeger@de.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kernel/entry.S | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+)
--- a/arch/s390/kernel/entry.S +++ b/arch/s390/kernel/entry.S @@ -248,6 +248,12 @@ ENTRY(sie64a) sie_exit: lg %r14,__SF_EMPTY+8(%r15) # load guest register save area stmg %r0,%r13,0(%r14) # save guest gprs 0-13 + xgr %r0,%r0 # clear guest registers to + xgr %r1,%r1 # prevent speculative use + xgr %r2,%r2 + xgr %r3,%r3 + xgr %r4,%r4 + xgr %r5,%r5 lmg %r6,%r14,__SF_GPRS(%r15) # restore kernel registers lg %r2,__SF_EMPTY+16(%r15) # return exit reason code br %r14 @@ -282,6 +288,8 @@ ENTRY(system_call) .Lsysc_vtime: UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER stmg %r0,%r7,__PT_R0(%r11) + # clear user controlled register to prevent speculative use + xgr %r0,%r0 mvc __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC mvc __PT_PSW(16,%r11),__LC_SVC_OLD_PSW mvc __PT_INT_CODE(4,%r11),__LC_SVC_ILC @@ -550,6 +558,15 @@ ENTRY(pgm_check_handler) 3: stg %r10,__THREAD_last_break(%r14) 4: la %r11,STACK_FRAME_OVERHEAD(%r15) stmg %r0,%r7,__PT_R0(%r11) + # clear user controlled registers to prevent speculative use + xgr %r0,%r0 + xgr %r1,%r1 + xgr %r2,%r2 + xgr %r3,%r3 + xgr %r4,%r4 + xgr %r5,%r5 + xgr %r6,%r6 + xgr %r7,%r7 mvc __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC stmg %r8,%r9,__PT_PSW(%r11) mvc __PT_INT_CODE(4,%r11),__LC_PGM_ILC @@ -615,6 +632,16 @@ ENTRY(io_int_handler) lmg %r8,%r9,__LC_IO_OLD_PSW SWITCH_ASYNC __LC_SAVE_AREA_ASYNC,__LC_ASYNC_ENTER_TIMER stmg %r0,%r7,__PT_R0(%r11) + # clear user controlled registers to prevent speculative use + xgr %r0,%r0 + xgr %r1,%r1 + xgr %r2,%r2 + xgr %r3,%r3 + xgr %r4,%r4 + xgr %r5,%r5 + xgr %r6,%r6 + xgr %r7,%r7 + xgr %r10,%r10 mvc __PT_R8(64,%r11),__LC_SAVE_AREA_ASYNC stmg %r8,%r9,__PT_PSW(%r11) mvc __PT_INT_CODE(12,%r11),__LC_SUBCHANNEL_ID @@ -820,6 +847,16 @@ ENTRY(ext_int_handler) lmg %r8,%r9,__LC_EXT_OLD_PSW SWITCH_ASYNC __LC_SAVE_AREA_ASYNC,__LC_ASYNC_ENTER_TIMER stmg %r0,%r7,__PT_R0(%r11) + # clear user controlled registers to prevent speculative use + xgr %r0,%r0 + xgr %r1,%r1 + xgr %r2,%r2 + xgr %r3,%r3 + xgr %r4,%r4 + xgr %r5,%r5 + xgr %r6,%r6 + xgr %r7,%r7 + xgr %r10,%r10 mvc __PT_R8(64,%r11),__LC_SAVE_AREA_ASYNC stmg %r8,%r9,__PT_PSW(%r11) lghi %r1,__LC_EXT_PARAMS2 @@ -982,6 +1019,16 @@ ENTRY(mcck_int_handler) .Lmcck_skip: lghi %r14,__LC_GPREGS_SAVE_AREA+64 stmg %r0,%r7,__PT_R0(%r11) + # clear user controlled registers to prevent speculative use + xgr %r0,%r0 + xgr %r1,%r1 + xgr %r2,%r2 + xgr %r3,%r3 + xgr %r4,%r4 + xgr %r5,%r5 + xgr %r6,%r6 + xgr %r7,%r7 + xgr %r10,%r10 mvc __PT_R8(64,%r11),0(%r14) stmg %r8,%r9,__PT_PSW(%r11) xc __PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit e2dd833389cc4069a96b57bdd24227b5f52288f5 ]
Add an optimized version of the array_index_mask_nospec function for s390 based on a compare and a subtract with borrow.
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/include/asm/barrier.h | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
--- a/arch/s390/include/asm/barrier.h +++ b/arch/s390/include/asm/barrier.h @@ -49,6 +49,30 @@ do { \ #define __smp_mb__before_atomic() barrier() #define __smp_mb__after_atomic() barrier()
+/** + * array_index_mask_nospec - generate a mask for array_idx() that is + * ~0UL when the bounds check succeeds and 0 otherwise + * @index: array element index + * @size: number of elements in array + */ +#define array_index_mask_nospec array_index_mask_nospec +static inline unsigned long array_index_mask_nospec(unsigned long index, + unsigned long size) +{ + unsigned long mask; + + if (__builtin_constant_p(size) && size > 0) { + asm(" clgr %2,%1\n" + " slbgr %0,%0\n" + :"=d" (mask) : "d" (size-1), "d" (index) :"cc"); + return mask; + } + asm(" clgr %1,%2\n" + " slbgr %0,%0\n" + :"=d" (mask) : "d" (size), "d" (index) :"cc"); + return ~mask; +} + #include <asm-generic/barrier.h>
#endif /* __ASM_BARRIER_H */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit cf1489984641369611556bf00c48f945c77bcf02 ]
To be able to switch off specific CPU alternatives with kernel parameters make a copy of the facility bit mask provided by STFLE and use the copy for the decision to apply an alternative.
Reviewed-by: David Hildenbrand david@redhat.com Reviewed-by: Cornelia Huck cohuck@redhat.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/include/asm/facility.h | 18 ++++++++++++++++++ arch/s390/include/asm/lowcore.h | 3 ++- arch/s390/kernel/alternative.c | 3 ++- arch/s390/kernel/early.c | 3 +++ arch/s390/kernel/setup.c | 4 +++- arch/s390/kernel/smp.c | 4 +++- 6 files changed, 31 insertions(+), 4 deletions(-)
--- a/arch/s390/include/asm/facility.h +++ b/arch/s390/include/asm/facility.h @@ -15,6 +15,24 @@
#define MAX_FACILITY_BIT (sizeof(((struct lowcore *)0)->stfle_fac_list) * 8)
+static inline void __set_facility(unsigned long nr, void *facilities) +{ + unsigned char *ptr = (unsigned char *) facilities; + + if (nr >= MAX_FACILITY_BIT) + return; + ptr[nr >> 3] |= 0x80 >> (nr & 7); +} + +static inline void __clear_facility(unsigned long nr, void *facilities) +{ + unsigned char *ptr = (unsigned char *) facilities; + + if (nr >= MAX_FACILITY_BIT) + return; + ptr[nr >> 3] &= ~(0x80 >> (nr & 7)); +} + static inline int __test_facility(unsigned long nr, void *facilities) { unsigned char *ptr; --- a/arch/s390/include/asm/lowcore.h +++ b/arch/s390/include/asm/lowcore.h @@ -155,7 +155,8 @@ struct lowcore { __u8 pad_0x0e20[0x0f00-0x0e20]; /* 0x0e20 */
/* Extended facility list */ - __u64 stfle_fac_list[32]; /* 0x0f00 */ + __u64 stfle_fac_list[16]; /* 0x0f00 */ + __u64 alt_stfle_fac_list[16]; /* 0x0f80 */ __u8 pad_0x1000[0x11b0-0x1000]; /* 0x1000 */
/* Pointer to the machine check extended save area */ --- a/arch/s390/kernel/alternative.c +++ b/arch/s390/kernel/alternative.c @@ -74,7 +74,8 @@ static void __init_or_module __apply_alt instr = (u8 *)&a->instr_offset + a->instr_offset; replacement = (u8 *)&a->repl_offset + a->repl_offset;
- if (!test_facility(a->facility)) + if (!__test_facility(a->facility, + S390_lowcore.alt_stfle_fac_list)) continue;
if (unlikely(a->instrlen % 2 || a->replacementlen % 2)) { --- a/arch/s390/kernel/early.c +++ b/arch/s390/kernel/early.c @@ -329,6 +329,9 @@ static noinline __init void setup_facili { stfle(S390_lowcore.stfle_fac_list, ARRAY_SIZE(S390_lowcore.stfle_fac_list)); + memcpy(S390_lowcore.alt_stfle_fac_list, + S390_lowcore.stfle_fac_list, + sizeof(S390_lowcore.alt_stfle_fac_list)); }
static __init void detect_diag9c(void) --- a/arch/s390/kernel/setup.c +++ b/arch/s390/kernel/setup.c @@ -339,7 +339,9 @@ static void __init setup_lowcore(void) lc->preempt_count = S390_lowcore.preempt_count; lc->stfl_fac_list = S390_lowcore.stfl_fac_list; memcpy(lc->stfle_fac_list, S390_lowcore.stfle_fac_list, - MAX_FACILITY_BIT/8); + sizeof(lc->stfle_fac_list)); + memcpy(lc->alt_stfle_fac_list, S390_lowcore.alt_stfle_fac_list, + sizeof(lc->alt_stfle_fac_list)); if (MACHINE_HAS_VX || MACHINE_HAS_GS) { unsigned long bits, size;
--- a/arch/s390/kernel/smp.c +++ b/arch/s390/kernel/smp.c @@ -282,7 +282,9 @@ static void pcpu_prepare_secondary(struc __ctl_store(lc->cregs_save_area, 0, 15); save_access_regs((unsigned int *) lc->access_regs_save_area); memcpy(lc->stfle_fac_list, S390_lowcore.stfle_fac_list, - MAX_FACILITY_BIT/8); + sizeof(lc->stfle_fac_list)); + memcpy(lc->alt_stfle_fac_list, S390_lowcore.alt_stfle_fac_list, + sizeof(lc->alt_stfle_fac_list)); }
static void pcpu_attach_task(struct pcpu *pcpu, struct task_struct *tsk)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit d768bd892fc8f066cd3aa000eb1867bcf32db0ee ]
Add the PPA instruction to the system entry and exit path to switch the kernel to a different branch prediction behaviour. The instructions are added via CPU alternatives and can be disabled with the "nospec" or the "nobp=0" kernel parameter. If the default behaviour selected with CONFIG_KERNEL_NOBP is set to "n" then the "nobp=1" parameter can be used to enable the changed kernel branch prediction.
Acked-by: Cornelia Huck cohuck@redhat.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/Kconfig | 17 +++++++++++++ arch/s390/include/asm/processor.h | 1 arch/s390/kernel/alternative.c | 23 ++++++++++++++++++ arch/s390/kernel/early.c | 2 + arch/s390/kernel/entry.S | 48 ++++++++++++++++++++++++++++++++++++++ arch/s390/kernel/ipl.c | 1 arch/s390/kernel/smp.c | 2 + 7 files changed, 94 insertions(+)
--- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -538,6 +538,23 @@ config ARCH_RANDOM
If unsure, say Y.
+config KERNEL_NOBP + def_bool n + prompt "Enable modified branch prediction for the kernel by default" + help + If this option is selected the kernel will switch to a modified + branch prediction mode if the firmware interface is available. + The modified branch prediction mode improves the behaviour in + regard to speculative execution. + + With the option enabled the kernel parameter "nobp=0" or "nospec" + can be used to run the kernel in the normal branch prediction mode. + + With the option disabled the modified branch prediction mode is + enabled with the "nobp=1" kernel parameter. + + If unsure, say N. + endmenu
menu "Memory setup" --- a/arch/s390/include/asm/processor.h +++ b/arch/s390/include/asm/processor.h @@ -89,6 +89,7 @@ void cpu_detect_mhz_feature(void); extern const struct seq_operations cpuinfo_op; extern int sysctl_ieee_emulation_warnings; extern void execve_tail(void); +extern void __bpon(void);
/* * User space process size: 2GB for 31 bit, 4TB or 8PT for 64 bit. --- a/arch/s390/kernel/alternative.c +++ b/arch/s390/kernel/alternative.c @@ -14,6 +14,29 @@ static int __init disable_alternative_in
early_param("noaltinstr", disable_alternative_instructions);
+static int __init nobp_setup_early(char *str) +{ + bool enabled; + int rc; + + rc = kstrtobool(str, &enabled); + if (rc) + return rc; + if (enabled && test_facility(82)) + __set_facility(82, S390_lowcore.alt_stfle_fac_list); + else + __clear_facility(82, S390_lowcore.alt_stfle_fac_list); + return 0; +} +early_param("nobp", nobp_setup_early); + +static int __init nospec_setup_early(char *str) +{ + __clear_facility(82, S390_lowcore.alt_stfle_fac_list); + return 0; +} +early_param("nospec", nospec_setup_early); + struct brcl_insn { u16 opc; s32 disp; --- a/arch/s390/kernel/early.c +++ b/arch/s390/kernel/early.c @@ -332,6 +332,8 @@ static noinline __init void setup_facili memcpy(S390_lowcore.alt_stfle_fac_list, S390_lowcore.stfle_fac_list, sizeof(S390_lowcore.alt_stfle_fac_list)); + if (!IS_ENABLED(CONFIG_KERNEL_NOBP)) + __clear_facility(82, S390_lowcore.alt_stfle_fac_list); }
static __init void detect_diag9c(void) --- a/arch/s390/kernel/entry.S +++ b/arch/s390/kernel/entry.S @@ -158,6 +158,34 @@ _PIF_WORK = (_PIF_PER_TRAP | _PIF_SYSCAL tm off+\addr, \mask .endm
+ .macro BPOFF + .pushsection .altinstr_replacement, "ax" +660: .long 0xb2e8c000 + .popsection +661: .long 0x47000000 + .pushsection .altinstructions, "a" + .long 661b - . + .long 660b - . + .word 82 + .byte 4 + .byte 4 + .popsection + .endm + + .macro BPON + .pushsection .altinstr_replacement, "ax" +662: .long 0xb2e8d000 + .popsection +663: .long 0x47000000 + .pushsection .altinstructions, "a" + .long 663b - . + .long 662b - . + .word 82 + .byte 4 + .byte 4 + .popsection + .endm + .section .kprobes.text, "ax" .Ldummy: /* @@ -170,6 +198,11 @@ _PIF_WORK = (_PIF_PER_TRAP | _PIF_SYSCAL */ nop 0
+ENTRY(__bpon) + .globl __bpon + BPON + br %r14 + /* * Scheduler resume function, called by switch_to * gpr2 = (task_struct *) prev @@ -226,8 +259,11 @@ ENTRY(sie64a) jnz .Lsie_skip TSTMSK __LC_CPU_FLAGS,_CIF_FPU jo .Lsie_skip # exit if fp/vx regs changed + BPON .Lsie_entry: sie 0(%r14) +.Lsie_exit: + BPOFF .Lsie_skip: ni __SIE_PROG0C+3(%r14),0xfe # no longer in SIE lctlg %c1,%c1,__LC_USER_ASCE # load primary asce @@ -279,6 +315,7 @@ ENTRY(system_call) stpt __LC_SYNC_ENTER_TIMER .Lsysc_stmg: stmg %r8,%r15,__LC_SAVE_AREA_SYNC + BPOFF lg %r12,__LC_CURRENT lghi %r13,__TASK_thread lghi %r14,_PIF_SYSCALL @@ -325,6 +362,7 @@ ENTRY(system_call) jnz .Lsysc_work # check for work TSTMSK __LC_CPU_FLAGS,_CIF_WORK jnz .Lsysc_work + BPON .Lsysc_restore: lg %r14,__LC_VDSO_PER_CPU lmg %r0,%r10,__PT_R0(%r11) @@ -522,6 +560,7 @@ ENTRY(kernel_thread_starter)
ENTRY(pgm_check_handler) stpt __LC_SYNC_ENTER_TIMER + BPOFF stmg %r8,%r15,__LC_SAVE_AREA_SYNC lg %r10,__LC_LAST_BREAK lg %r12,__LC_CURRENT @@ -626,6 +665,7 @@ ENTRY(pgm_check_handler) ENTRY(io_int_handler) STCK __LC_INT_CLOCK stpt __LC_ASYNC_ENTER_TIMER + BPOFF stmg %r8,%r15,__LC_SAVE_AREA_ASYNC lg %r12,__LC_CURRENT larl %r13,cleanup_critical @@ -676,9 +716,13 @@ ENTRY(io_int_handler) lg %r14,__LC_VDSO_PER_CPU lmg %r0,%r10,__PT_R0(%r11) mvc __LC_RETURN_PSW(16),__PT_PSW(%r11) + tm __PT_PSW+1(%r11),0x01 # returning to user ? + jno .Lio_exit_kernel + BPON .Lio_exit_timer: stpt __LC_EXIT_TIMER mvc __VDSO_ECTG_BASE(16,%r14),__LC_EXIT_TIMER +.Lio_exit_kernel: lmg %r11,%r15,__PT_R11(%r11) lpswe __LC_RETURN_PSW .Lio_done: @@ -841,6 +885,7 @@ ENTRY(io_int_handler) ENTRY(ext_int_handler) STCK __LC_INT_CLOCK stpt __LC_ASYNC_ENTER_TIMER + BPOFF stmg %r8,%r15,__LC_SAVE_AREA_ASYNC lg %r12,__LC_CURRENT larl %r13,cleanup_critical @@ -889,6 +934,7 @@ ENTRY(psw_idle) .Lpsw_idle_stcctm: #endif oi __LC_CPU_FLAGS+7,_CIF_ENABLED_WAIT + BPON STCK __CLOCK_IDLE_ENTER(%r2) stpt __TIMER_IDLE_ENTER(%r2) .Lpsw_idle_lpsw: @@ -989,6 +1035,7 @@ load_fpu_regs: */ ENTRY(mcck_int_handler) STCK __LC_MCCK_CLOCK + BPOFF la %r1,4095 # revalidate r1 spt __LC_CPU_TIMER_SAVE_AREA-4095(%r1) # revalidate cpu timer lmg %r0,%r15,__LC_GPREGS_SAVE_AREA-4095(%r1)# revalidate gprs @@ -1054,6 +1101,7 @@ ENTRY(mcck_int_handler) mvc __LC_RETURN_MCCK_PSW(16),__PT_PSW(%r11) # move return PSW tm __LC_RETURN_MCCK_PSW+1,0x01 # returning to user ? jno 0f + BPON stpt __LC_EXIT_TIMER mvc __VDSO_ECTG_BASE(16,%r14),__LC_EXIT_TIMER 0: lmg %r11,%r15,__PT_R11(%r11) --- a/arch/s390/kernel/ipl.c +++ b/arch/s390/kernel/ipl.c @@ -564,6 +564,7 @@ static struct kset *ipl_kset;
static void __ipl_run(void *unused) { + __bpon(); diag308(DIAG308_LOAD_CLEAR, NULL); if (MACHINE_IS_VM) __cpcmd("IPL", NULL, 0, NULL); --- a/arch/s390/kernel/smp.c +++ b/arch/s390/kernel/smp.c @@ -334,6 +334,7 @@ static void pcpu_delegate(struct pcpu *p mem_assign_absolute(lc->restart_fn, (unsigned long) func); mem_assign_absolute(lc->restart_data, (unsigned long) data); mem_assign_absolute(lc->restart_source, source_cpu); + __bpon(); asm volatile( "0: sigp 0,%0,%2 # sigp restart to target cpu\n" " brc 2,0b # busy, try again\n" @@ -909,6 +910,7 @@ void __cpu_die(unsigned int cpu) void __noreturn cpu_die(void) { idle_task_exit(); + __bpon(); pcpu_sigp_retry(pcpu_devices + smp_processor_id(), SIGP_STOP, 0); for (;;) ; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit 6b73044b2b0081ee3dd1cd6eaab7dee552601efb ]
Define TIF_ISOLATE_BP and TIF_ISOLATE_BP_GUEST and add the necessary plumbing in entry.S to be able to run user space and KVM guests with limited branch prediction.
To switch a user space process to limited branch prediction the s390_isolate_bp() function has to be call, and to run a vCPU of a KVM guest associated with the current task with limited branch prediction call s390_isolate_bp_guest().
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/include/asm/processor.h | 3 ++ arch/s390/include/asm/thread_info.h | 4 ++ arch/s390/kernel/entry.S | 51 ++++++++++++++++++++++++++++++++---- arch/s390/kernel/processor.c | 18 ++++++++++++ 4 files changed, 71 insertions(+), 5 deletions(-)
--- a/arch/s390/include/asm/processor.h +++ b/arch/s390/include/asm/processor.h @@ -378,6 +378,9 @@ extern void memcpy_absolute(void *, void memcpy_absolute(&(dest), &__tmp, sizeof(__tmp)); \ } while (0)
+extern int s390_isolate_bp(void); +extern int s390_isolate_bp_guest(void); + #endif /* __ASSEMBLY__ */
#endif /* __ASM_S390_PROCESSOR_H */ --- a/arch/s390/include/asm/thread_info.h +++ b/arch/s390/include/asm/thread_info.h @@ -60,6 +60,8 @@ int arch_dup_task_struct(struct task_str #define TIF_GUARDED_STORAGE 4 /* load guarded storage control block */ #define TIF_PATCH_PENDING 5 /* pending live patching update */ #define TIF_PGSTE 6 /* New mm's will use 4K page tables */ +#define TIF_ISOLATE_BP 8 /* Run process with isolated BP */ +#define TIF_ISOLATE_BP_GUEST 9 /* Run KVM guests with isolated BP */
#define TIF_31BIT 16 /* 32bit process */ #define TIF_MEMDIE 17 /* is terminating due to OOM killer */ @@ -80,6 +82,8 @@ int arch_dup_task_struct(struct task_str #define _TIF_UPROBE _BITUL(TIF_UPROBE) #define _TIF_GUARDED_STORAGE _BITUL(TIF_GUARDED_STORAGE) #define _TIF_PATCH_PENDING _BITUL(TIF_PATCH_PENDING) +#define _TIF_ISOLATE_BP _BITUL(TIF_ISOLATE_BP) +#define _TIF_ISOLATE_BP_GUEST _BITUL(TIF_ISOLATE_BP_GUEST)
#define _TIF_31BIT _BITUL(TIF_31BIT) #define _TIF_SINGLE_STEP _BITUL(TIF_SINGLE_STEP) --- a/arch/s390/kernel/entry.S +++ b/arch/s390/kernel/entry.S @@ -106,6 +106,7 @@ _PIF_WORK = (_PIF_PER_TRAP | _PIF_SYSCAL aghi %r15,-(STACK_FRAME_OVERHEAD + __PT_SIZE) j 3f 1: UPDATE_VTIME %r14,%r15,\timer + BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP 2: lg %r15,__LC_ASYNC_STACK # load async stack 3: la %r11,STACK_FRAME_OVERHEAD(%r15) .endm @@ -186,6 +187,40 @@ _PIF_WORK = (_PIF_PER_TRAP | _PIF_SYSCAL .popsection .endm
+ .macro BPENTER tif_ptr,tif_mask + .pushsection .altinstr_replacement, "ax" +662: .word 0xc004, 0x0000, 0x0000 # 6 byte nop + .word 0xc004, 0x0000, 0x0000 # 6 byte nop + .popsection +664: TSTMSK \tif_ptr,\tif_mask + jz . + 8 + .long 0xb2e8d000 + .pushsection .altinstructions, "a" + .long 664b - . + .long 662b - . + .word 82 + .byte 12 + .byte 12 + .popsection + .endm + + .macro BPEXIT tif_ptr,tif_mask + TSTMSK \tif_ptr,\tif_mask + .pushsection .altinstr_replacement, "ax" +662: jnz . + 8 + .long 0xb2e8d000 + .popsection +664: jz . + 8 + .long 0xb2e8c000 + .pushsection .altinstructions, "a" + .long 664b - . + .long 662b - . + .word 82 + .byte 8 + .byte 8 + .popsection + .endm + .section .kprobes.text, "ax" .Ldummy: /* @@ -240,9 +275,11 @@ ENTRY(__switch_to) */ ENTRY(sie64a) stmg %r6,%r14,__SF_GPRS(%r15) # save kernel registers + lg %r12,__LC_CURRENT stg %r2,__SF_EMPTY(%r15) # save control block pointer stg %r3,__SF_EMPTY+8(%r15) # save guest register save area xc __SF_EMPTY+16(8,%r15),__SF_EMPTY+16(%r15) # reason code = 0 + mvc __SF_EMPTY+24(8,%r15),__TI_flags(%r12) # copy thread flags TSTMSK __LC_CPU_FLAGS,_CIF_FPU # load guest fp/vx registers ? jno .Lsie_load_guest_gprs brasl %r14,load_fpu_regs # load guest fp/vx regs @@ -259,11 +296,12 @@ ENTRY(sie64a) jnz .Lsie_skip TSTMSK __LC_CPU_FLAGS,_CIF_FPU jo .Lsie_skip # exit if fp/vx regs changed - BPON + BPEXIT __SF_EMPTY+24(%r15),(_TIF_ISOLATE_BP|_TIF_ISOLATE_BP_GUEST) .Lsie_entry: sie 0(%r14) .Lsie_exit: BPOFF + BPENTER __SF_EMPTY+24(%r15),(_TIF_ISOLATE_BP|_TIF_ISOLATE_BP_GUEST) .Lsie_skip: ni __SIE_PROG0C+3(%r14),0xfe # no longer in SIE lctlg %c1,%c1,__LC_USER_ASCE # load primary asce @@ -324,6 +362,7 @@ ENTRY(system_call) la %r11,STACK_FRAME_OVERHEAD(%r15) # pointer to pt_regs .Lsysc_vtime: UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER + BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP stmg %r0,%r7,__PT_R0(%r11) # clear user controlled register to prevent speculative use xgr %r0,%r0 @@ -362,7 +401,7 @@ ENTRY(system_call) jnz .Lsysc_work # check for work TSTMSK __LC_CPU_FLAGS,_CIF_WORK jnz .Lsysc_work - BPON + BPEXIT __TI_flags(%r12),_TIF_ISOLATE_BP .Lsysc_restore: lg %r14,__LC_VDSO_PER_CPU lmg %r0,%r10,__PT_R0(%r11) @@ -587,6 +626,7 @@ ENTRY(pgm_check_handler) aghi %r15,-(STACK_FRAME_OVERHEAD + __PT_SIZE) j 4f 2: UPDATE_VTIME %r14,%r15,__LC_SYNC_ENTER_TIMER + BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP lg %r15,__LC_KERNEL_STACK lgr %r14,%r12 aghi %r14,__TASK_thread # pointer to thread_struct @@ -718,7 +758,7 @@ ENTRY(io_int_handler) mvc __LC_RETURN_PSW(16),__PT_PSW(%r11) tm __PT_PSW+1(%r11),0x01 # returning to user ? jno .Lio_exit_kernel - BPON + BPEXIT __TI_flags(%r12),_TIF_ISOLATE_BP .Lio_exit_timer: stpt __LC_EXIT_TIMER mvc __VDSO_ECTG_BASE(16,%r14),__LC_EXIT_TIMER @@ -1101,7 +1141,7 @@ ENTRY(mcck_int_handler) mvc __LC_RETURN_MCCK_PSW(16),__PT_PSW(%r11) # move return PSW tm __LC_RETURN_MCCK_PSW+1,0x01 # returning to user ? jno 0f - BPON + BPEXIT __TI_flags(%r12),_TIF_ISOLATE_BP stpt __LC_EXIT_TIMER mvc __VDSO_ECTG_BASE(16,%r14),__LC_EXIT_TIMER 0: lmg %r11,%r15,__PT_R11(%r11) @@ -1228,7 +1268,8 @@ cleanup_critical: clg %r9,BASED(.Lsie_crit_mcck_length) jh 1f oi __LC_CPU_FLAGS+7, _CIF_MCCK_GUEST -1: lg %r9,__SF_EMPTY(%r15) # get control block pointer +1: BPENTER __SF_EMPTY+24(%r15),(_TIF_ISOLATE_BP|_TIF_ISOLATE_BP_GUEST) + lg %r9,__SF_EMPTY(%r15) # get control block pointer ni __SIE_PROG0C+3(%r9),0xfe # no longer in SIE lctlg %c1,%c1,__LC_USER_ASCE # load primary asce larl %r9,sie_exit # skip forward to sie_exit --- a/arch/s390/kernel/processor.c +++ b/arch/s390/kernel/processor.c @@ -197,3 +197,21 @@ const struct seq_operations cpuinfo_op = .stop = c_stop, .show = show_cpuinfo, }; + +int s390_isolate_bp(void) +{ + if (!test_facility(82)) + return -EOPNOTSUPP; + set_thread_flag(TIF_ISOLATE_BP); + return 0; +} +EXPORT_SYMBOL(s390_isolate_bp); + +int s390_isolate_bp_guest(void) +{ + if (!test_facility(82)) + return -EOPNOTSUPP; + set_thread_flag(TIF_ISOLATE_BP_GUEST); + return 0; +} +EXPORT_SYMBOL(s390_isolate_bp_guest);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit f19fbd5ed642dc31c809596412dab1ed56f2f156 ]
Add CONFIG_EXPOLINE to enable the use of the new -mindirect-branch= and -mfunction_return= compiler options to create a kernel fortified against the specte v2 attack.
With CONFIG_EXPOLINE=y all indirect branches will be issued with an execute type instruction. For z10 or newer the EXRL instruction will be used, for older machines the EX instruction. The typical indirect call
basr %r14,%r1
is replaced with a PC relative call to a new thunk
brasl %r14,__s390x_indirect_jump_r1
The thunk contains the EXRL/EX instruction to the indirect branch
__s390x_indirect_jump_r1: exrl 0,0f j . 0: br %r1
The detour via the execute type instruction has a performance impact. To get rid of the detour the new kernel parameter "nospectre_v2" and "spectre_v2=[on,off,auto]" can be used. If the parameter is specified the kernel and module code will be patched at runtime.
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/Kconfig | 28 ++++++++ arch/s390/Makefile | 10 +++ arch/s390/include/asm/lowcore.h | 4 - arch/s390/include/asm/nospec-branch.h | 18 +++++ arch/s390/kernel/Makefile | 4 + arch/s390/kernel/entry.S | 113 ++++++++++++++++++++++++++-------- arch/s390/kernel/module.c | 62 +++++++++++++++--- arch/s390/kernel/nospec-branch.c | 101 ++++++++++++++++++++++++++++++ arch/s390/kernel/setup.c | 4 + arch/s390/kernel/smp.c | 1 arch/s390/kernel/vmlinux.lds.S | 14 ++++ drivers/s390/char/Makefile | 2 12 files changed, 326 insertions(+), 35 deletions(-) create mode 100644 arch/s390/include/asm/nospec-branch.h create mode 100644 arch/s390/kernel/nospec-branch.c
--- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -555,6 +555,34 @@ config KERNEL_NOBP
If unsure, say N.
+config EXPOLINE + def_bool n + prompt "Avoid speculative indirect branches in the kernel" + help + Compile the kernel with the expoline compiler options to guard + against kernel-to-user data leaks by avoiding speculative indirect + branches. + Requires a compiler with -mindirect-branch=thunk support for full + protection. The kernel may run slower. + + If unsure, say N. + +choice + prompt "Expoline default" + depends on EXPOLINE + default EXPOLINE_FULL + +config EXPOLINE_OFF + bool "spectre_v2=off" + +config EXPOLINE_MEDIUM + bool "spectre_v2=auto" + +config EXPOLINE_FULL + bool "spectre_v2=on" + +endchoice + endmenu
menu "Memory setup" --- a/arch/s390/Makefile +++ b/arch/s390/Makefile @@ -81,6 +81,16 @@ ifeq ($(call cc-option-yn,-mwarn-dynamic cflags-$(CONFIG_WARN_DYNAMIC_STACK) += -mwarn-dynamicstack endif
+ifdef CONFIG_EXPOLINE + ifeq ($(call cc-option-yn,$(CC_FLAGS_MARCH) -mindirect-branch=thunk),y) + CC_FLAGS_EXPOLINE := -mindirect-branch=thunk + CC_FLAGS_EXPOLINE += -mfunction-return=thunk + CC_FLAGS_EXPOLINE += -mindirect-branch-table + export CC_FLAGS_EXPOLINE + cflags-y += $(CC_FLAGS_EXPOLINE) + endif +endif + ifdef CONFIG_FUNCTION_TRACER # make use of hotpatch feature if the compiler supports it cc_hotpatch := -mhotpatch=0,3 --- a/arch/s390/include/asm/lowcore.h +++ b/arch/s390/include/asm/lowcore.h @@ -140,7 +140,9 @@ struct lowcore { /* Per cpu primary space access list */ __u32 paste[16]; /* 0x0400 */
- __u8 pad_0x04c0[0x0e00-0x0440]; /* 0x0440 */ + /* br %r1 trampoline */ + __u16 br_r1_trampoline; /* 0x0440 */ + __u8 pad_0x0442[0x0e00-0x0442]; /* 0x0442 */
/* * 0xe00 contains the address of the IPL Parameter Information --- /dev/null +++ b/arch/s390/include/asm/nospec-branch.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_S390_EXPOLINE_H +#define _ASM_S390_EXPOLINE_H + +#ifndef __ASSEMBLY__ + +#include <linux/types.h> + +extern int nospec_call_disable; +extern int nospec_return_disable; + +void nospec_init_branches(void); +void nospec_call_revert(s32 *start, s32 *end); +void nospec_return_revert(s32 *start, s32 *end); + +#endif /* __ASSEMBLY__ */ + +#endif /* _ASM_S390_EXPOLINE_H */ --- a/arch/s390/kernel/Makefile +++ b/arch/s390/kernel/Makefile @@ -29,6 +29,7 @@ UBSAN_SANITIZE_early.o := n # ifneq ($(CC_FLAGS_MARCH),-march=z900) CFLAGS_REMOVE_als.o += $(CC_FLAGS_MARCH) +CFLAGS_REMOVE_als.o += $(CC_FLAGS_EXPOLINE) CFLAGS_als.o += -march=z900 AFLAGS_REMOVE_head.o += $(CC_FLAGS_MARCH) AFLAGS_head.o += -march=z900 @@ -61,6 +62,9 @@ obj-y += entry.o reipl.o relocate_kernel
extra-y += head.o head64.o vmlinux.lds
+obj-$(CONFIG_EXPOLINE) += nospec-branch.o +CFLAGS_REMOVE_expoline.o += $(CC_FLAGS_EXPOLINE) + obj-$(CONFIG_MODULES) += module.o obj-$(CONFIG_SMP) += smp.o obj-$(CONFIG_SCHED_TOPOLOGY) += topology.o --- a/arch/s390/kernel/entry.S +++ b/arch/s390/kernel/entry.S @@ -221,6 +221,68 @@ _PIF_WORK = (_PIF_PER_TRAP | _PIF_SYSCAL .popsection .endm
+#ifdef CONFIG_EXPOLINE + + .macro GEN_BR_THUNK name,reg,tmp + .section .text.\name,"axG",@progbits,\name,comdat + .globl \name + .hidden \name + .type \name,@function +\name: + .cfi_startproc +#ifdef CONFIG_HAVE_MARCH_Z10_FEATURES + exrl 0,0f +#else + larl \tmp,0f + ex 0,0(\tmp) +#endif + j . +0: br \reg + .cfi_endproc + .endm + + GEN_BR_THUNK __s390x_indirect_jump_r1use_r9,%r9,%r1 + GEN_BR_THUNK __s390x_indirect_jump_r1use_r14,%r14,%r1 + GEN_BR_THUNK __s390x_indirect_jump_r11use_r14,%r14,%r11 + + .macro BASR_R14_R9 +0: brasl %r14,__s390x_indirect_jump_r1use_r9 + .pushsection .s390_indirect_branches,"a",@progbits + .long 0b-. + .popsection + .endm + + .macro BR_R1USE_R14 +0: jg __s390x_indirect_jump_r1use_r14 + .pushsection .s390_indirect_branches,"a",@progbits + .long 0b-. + .popsection + .endm + + .macro BR_R11USE_R14 +0: jg __s390x_indirect_jump_r11use_r14 + .pushsection .s390_indirect_branches,"a",@progbits + .long 0b-. + .popsection + .endm + +#else /* CONFIG_EXPOLINE */ + + .macro BASR_R14_R9 + basr %r14,%r9 + .endm + + .macro BR_R1USE_R14 + br %r14 + .endm + + .macro BR_R11USE_R14 + br %r14 + .endm + +#endif /* CONFIG_EXPOLINE */ + + .section .kprobes.text, "ax" .Ldummy: /* @@ -236,7 +298,7 @@ _PIF_WORK = (_PIF_PER_TRAP | _PIF_SYSCAL ENTRY(__bpon) .globl __bpon BPON - br %r14 + BR_R1USE_R14
/* * Scheduler resume function, called by switch_to @@ -261,9 +323,9 @@ ENTRY(__switch_to) mvc __LC_CURRENT_PID(4,%r0),__TASK_pid(%r3) # store pid of next lmg %r6,%r15,__SF_GPRS(%r15) # load gprs of next task TSTMSK __LC_MACHINE_FLAGS,MACHINE_FLAG_LPP - bzr %r14 + jz 0f .insn s,0xb2800000,__LC_LPP # set program parameter - br %r14 +0: BR_R1USE_R14
.L__critical_start:
@@ -330,7 +392,7 @@ sie_exit: xgr %r5,%r5 lmg %r6,%r14,__SF_GPRS(%r15) # restore kernel registers lg %r2,__SF_EMPTY+16(%r15) # return exit reason code - br %r14 + BR_R1USE_R14 .Lsie_fault: lghi %r14,-EFAULT stg %r14,__SF_EMPTY+16(%r15) # set exit reason code @@ -389,7 +451,7 @@ ENTRY(system_call) lgf %r9,0(%r8,%r10) # get system call add. TSTMSK __TI_flags(%r12),_TIF_TRACE jnz .Lsysc_tracesys - basr %r14,%r9 # call sys_xxxx + BASR_R14_R9 # call sys_xxxx stg %r2,__PT_R2(%r11) # store return value
.Lsysc_return: @@ -566,7 +628,7 @@ ENTRY(system_call) lmg %r3,%r7,__PT_R3(%r11) stg %r7,STACK_FRAME_OVERHEAD(%r15) lg %r2,__PT_ORIG_GPR2(%r11) - basr %r14,%r9 # call sys_xxx + BASR_R14_R9 # call sys_xxx stg %r2,__PT_R2(%r11) # store return value .Lsysc_tracenogo: TSTMSK __TI_flags(%r12),_TIF_TRACE @@ -590,7 +652,7 @@ ENTRY(ret_from_fork) lmg %r9,%r10,__PT_R9(%r11) # load gprs ENTRY(kernel_thread_starter) la %r2,0(%r10) - basr %r14,%r9 + BASR_R14_R9 j .Lsysc_tracenogo
/* @@ -667,9 +729,9 @@ ENTRY(pgm_check_handler) nill %r10,0x007f sll %r10,2 je .Lpgm_return - lgf %r1,0(%r10,%r1) # load address of handler routine + lgf %r9,0(%r10,%r1) # load address of handler routine lgr %r2,%r11 # pass pointer to pt_regs - basr %r14,%r1 # branch to interrupt-handler + BASR_R14_R9 # branch to interrupt-handler .Lpgm_return: LOCKDEP_SYS_EXIT tm __PT_PSW+1(%r11),0x01 # returning to user ? @@ -979,7 +1041,7 @@ ENTRY(psw_idle) stpt __TIMER_IDLE_ENTER(%r2) .Lpsw_idle_lpsw: lpswe __SF_EMPTY(%r15) - br %r14 + BR_R1USE_R14 .Lpsw_idle_end:
/* @@ -993,7 +1055,7 @@ ENTRY(save_fpu_regs) lg %r2,__LC_CURRENT aghi %r2,__TASK_thread TSTMSK __LC_CPU_FLAGS,_CIF_FPU - bor %r14 + jo .Lsave_fpu_regs_exit stfpc __THREAD_FPU_fpc(%r2) lg %r3,__THREAD_FPU_regs(%r2) TSTMSK __LC_MACHINE_FLAGS,MACHINE_FLAG_VX @@ -1020,7 +1082,8 @@ ENTRY(save_fpu_regs) std 15,120(%r3) .Lsave_fpu_regs_done: oi __LC_CPU_FLAGS+7,_CIF_FPU - br %r14 +.Lsave_fpu_regs_exit: + BR_R1USE_R14 .Lsave_fpu_regs_end: EXPORT_SYMBOL(save_fpu_regs)
@@ -1038,7 +1101,7 @@ load_fpu_regs: lg %r4,__LC_CURRENT aghi %r4,__TASK_thread TSTMSK __LC_CPU_FLAGS,_CIF_FPU - bnor %r14 + jno .Lload_fpu_regs_exit lfpc __THREAD_FPU_fpc(%r4) TSTMSK __LC_MACHINE_FLAGS,MACHINE_FLAG_VX lg %r4,__THREAD_FPU_regs(%r4) # %r4 <- reg save area @@ -1065,7 +1128,8 @@ load_fpu_regs: ld 15,120(%r4) .Lload_fpu_regs_done: ni __LC_CPU_FLAGS+7,255-_CIF_FPU - br %r14 +.Lload_fpu_regs_exit: + BR_R1USE_R14 .Lload_fpu_regs_end:
.L__critical_end: @@ -1237,7 +1301,7 @@ cleanup_critical: jl 0f clg %r9,BASED(.Lcleanup_table+104) # .Lload_fpu_regs_end jl .Lcleanup_load_fpu_regs -0: br %r14 +0: BR_R11USE_R14
.align 8 .Lcleanup_table: @@ -1273,7 +1337,7 @@ cleanup_critical: ni __SIE_PROG0C+3(%r9),0xfe # no longer in SIE lctlg %c1,%c1,__LC_USER_ASCE # load primary asce larl %r9,sie_exit # skip forward to sie_exit - br %r14 + BR_R11USE_R14 #endif
.Lcleanup_system_call: @@ -1326,7 +1390,7 @@ cleanup_critical: stg %r15,56(%r11) # r15 stack pointer # set new psw address and exit larl %r9,.Lsysc_do_svc - br %r14 + BR_R11USE_R14 .Lcleanup_system_call_insn: .quad system_call .quad .Lsysc_stmg @@ -1338,7 +1402,7 @@ cleanup_critical:
.Lcleanup_sysc_tif: larl %r9,.Lsysc_tif - br %r14 + BR_R11USE_R14
.Lcleanup_sysc_restore: # check if stpt has been executed @@ -1355,14 +1419,14 @@ cleanup_critical: mvc 0(64,%r11),__PT_R8(%r9) lmg %r0,%r7,__PT_R0(%r9) 1: lmg %r8,%r9,__LC_RETURN_PSW - br %r14 + BR_R11USE_R14 .Lcleanup_sysc_restore_insn: .quad .Lsysc_exit_timer .quad .Lsysc_done - 4
.Lcleanup_io_tif: larl %r9,.Lio_tif - br %r14 + BR_R11USE_R14
.Lcleanup_io_restore: # check if stpt has been executed @@ -1376,7 +1440,7 @@ cleanup_critical: mvc 0(64,%r11),__PT_R8(%r9) lmg %r0,%r7,__PT_R0(%r9) 1: lmg %r8,%r9,__LC_RETURN_PSW - br %r14 + BR_R11USE_R14 .Lcleanup_io_restore_insn: .quad .Lio_exit_timer .quad .Lio_done - 4 @@ -1429,17 +1493,17 @@ cleanup_critical: # prepare return psw nihh %r8,0xfcfd # clear irq & wait state bits lg %r9,48(%r11) # return from psw_idle - br %r14 + BR_R11USE_R14 .Lcleanup_idle_insn: .quad .Lpsw_idle_lpsw
.Lcleanup_save_fpu_regs: larl %r9,save_fpu_regs - br %r14 + BR_R11USE_R14
.Lcleanup_load_fpu_regs: larl %r9,load_fpu_regs - br %r14 + BR_R11USE_R14
/* * Integer constants @@ -1459,7 +1523,6 @@ cleanup_critical: .Lsie_crit_mcck_length: .quad .Lsie_skip - .Lsie_entry #endif - .section .rodata, "a" #define SYSCALL(esame,emu) .long esame .globl sys_call_table --- a/arch/s390/kernel/module.c +++ b/arch/s390/kernel/module.c @@ -32,6 +32,8 @@ #include <linux/moduleloader.h> #include <linux/bug.h> #include <asm/alternative.h> +#include <asm/nospec-branch.h> +#include <asm/facility.h>
#if 0 #define DEBUGP printk @@ -169,7 +171,11 @@ int module_frob_arch_sections(Elf_Ehdr * me->arch.got_offset = me->core_layout.size; me->core_layout.size += me->arch.got_size; me->arch.plt_offset = me->core_layout.size; - me->core_layout.size += me->arch.plt_size; + if (me->arch.plt_size) { + if (IS_ENABLED(CONFIG_EXPOLINE) && !nospec_call_disable) + me->arch.plt_size += PLT_ENTRY_SIZE; + me->core_layout.size += me->arch.plt_size; + } return 0; }
@@ -323,9 +329,21 @@ static int apply_rela(Elf_Rela *rela, El unsigned int *ip; ip = me->core_layout.base + me->arch.plt_offset + info->plt_offset; - ip[0] = 0x0d10e310; /* basr 1,0; lg 1,10(1); br 1 */ - ip[1] = 0x100a0004; - ip[2] = 0x07f10000; + ip[0] = 0x0d10e310; /* basr 1,0 */ + ip[1] = 0x100a0004; /* lg 1,10(1) */ + if (IS_ENABLED(CONFIG_EXPOLINE) && + !nospec_call_disable) { + unsigned int *ij; + ij = me->core_layout.base + + me->arch.plt_offset + + me->arch.plt_size - PLT_ENTRY_SIZE; + ip[2] = 0xa7f40000 + /* j __jump_r1 */ + (unsigned int)(u16) + (((unsigned long) ij - 8 - + (unsigned long) ip) / 2); + } else { + ip[2] = 0x07f10000; /* br %r1 */ + } ip[3] = (unsigned int) (val >> 32); ip[4] = (unsigned int) val; info->plt_initialized = 1; @@ -431,16 +449,42 @@ int module_finalize(const Elf_Ehdr *hdr, struct module *me) { const Elf_Shdr *s; - char *secstrings; + char *secstrings, *secname; + void *aseg; + + if (IS_ENABLED(CONFIG_EXPOLINE) && + !nospec_call_disable && me->arch.plt_size) { + unsigned int *ij; + + ij = me->core_layout.base + me->arch.plt_offset + + me->arch.plt_size - PLT_ENTRY_SIZE; + if (test_facility(35)) { + ij[0] = 0xc6000000; /* exrl %r0,.+10 */ + ij[1] = 0x0005a7f4; /* j . */ + ij[2] = 0x000007f1; /* br %r1 */ + } else { + ij[0] = 0x44000000 | (unsigned int) + offsetof(struct lowcore, br_r1_trampoline); + ij[1] = 0xa7f40000; /* j . */ + } + }
secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset; for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) { - if (!strcmp(".altinstructions", secstrings + s->sh_name)) { - /* patch .altinstructions */ - void *aseg = (void *)s->sh_addr; + aseg = (void *) s->sh_addr; + secname = secstrings + s->sh_name;
+ if (!strcmp(".altinstructions", secname)) + /* patch .altinstructions */ apply_alternatives(aseg, aseg + s->sh_size); - } + + if (IS_ENABLED(CONFIG_EXPOLINE) && + (!strcmp(".nospec_call_table", secname))) + nospec_call_revert(aseg, aseg + s->sh_size); + + if (IS_ENABLED(CONFIG_EXPOLINE) && + (!strcmp(".nospec_return_table", secname))) + nospec_return_revert(aseg, aseg + s->sh_size); }
jump_label_apply_nops(me); --- /dev/null +++ b/arch/s390/kernel/nospec-branch.c @@ -0,0 +1,101 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <linux/module.h> +#include <asm/facility.h> +#include <asm/nospec-branch.h> + +int nospec_call_disable = IS_ENABLED(EXPOLINE_OFF); +int nospec_return_disable = !IS_ENABLED(EXPOLINE_FULL); + +static int __init nospectre_v2_setup_early(char *str) +{ + nospec_call_disable = 1; + nospec_return_disable = 1; + return 0; +} +early_param("nospectre_v2", nospectre_v2_setup_early); + +static int __init spectre_v2_setup_early(char *str) +{ + if (str && !strncmp(str, "on", 2)) { + nospec_call_disable = 0; + nospec_return_disable = 0; + } + if (str && !strncmp(str, "off", 3)) { + nospec_call_disable = 1; + nospec_return_disable = 1; + } + if (str && !strncmp(str, "auto", 4)) { + nospec_call_disable = 0; + nospec_return_disable = 1; + } + return 0; +} +early_param("spectre_v2", spectre_v2_setup_early); + +static void __init_or_module __nospec_revert(s32 *start, s32 *end) +{ + enum { BRCL_EXPOLINE, BRASL_EXPOLINE } type; + u8 *instr, *thunk, *br; + u8 insnbuf[6]; + s32 *epo; + + /* Second part of the instruction replace is always a nop */ + memcpy(insnbuf + 2, (char[]) { 0x47, 0x00, 0x00, 0x00 }, 4); + for (epo = start; epo < end; epo++) { + instr = (u8 *) epo + *epo; + if (instr[0] == 0xc0 && (instr[1] & 0x0f) == 0x04) + type = BRCL_EXPOLINE; /* brcl instruction */ + else if (instr[0] == 0xc0 && (instr[1] & 0x0f) == 0x05) + type = BRASL_EXPOLINE; /* brasl instruction */ + else + continue; + thunk = instr + (*(int *)(instr + 2)) * 2; + if (thunk[0] == 0xc6 && thunk[1] == 0x00) + /* exrl %r0,<target-br> */ + br = thunk + (*(int *)(thunk + 2)) * 2; + else if (thunk[0] == 0xc0 && (thunk[1] & 0x0f) == 0x00 && + thunk[6] == 0x44 && thunk[7] == 0x00 && + (thunk[8] & 0x0f) == 0x00 && thunk[9] == 0x00 && + (thunk[1] & 0xf0) == (thunk[8] & 0xf0)) + /* larl %rx,<target br> + ex %r0,0(%rx) */ + br = thunk + (*(int *)(thunk + 2)) * 2; + else + continue; + if (br[0] != 0x07 || (br[1] & 0xf0) != 0xf0) + continue; + switch (type) { + case BRCL_EXPOLINE: + /* brcl to thunk, replace with br + nop */ + insnbuf[0] = br[0]; + insnbuf[1] = (instr[1] & 0xf0) | (br[1] & 0x0f); + break; + case BRASL_EXPOLINE: + /* brasl to thunk, replace with basr + nop */ + insnbuf[0] = 0x0d; + insnbuf[1] = (instr[1] & 0xf0) | (br[1] & 0x0f); + break; + } + + s390_kernel_write(instr, insnbuf, 6); + } +} + +void __init_or_module nospec_call_revert(s32 *start, s32 *end) +{ + if (nospec_call_disable) + __nospec_revert(start, end); +} + +void __init_or_module nospec_return_revert(s32 *start, s32 *end) +{ + if (nospec_return_disable) + __nospec_revert(start, end); +} + +extern s32 __nospec_call_start[], __nospec_call_end[]; +extern s32 __nospec_return_start[], __nospec_return_end[]; +void __init nospec_init_branches(void) +{ + nospec_call_revert(__nospec_call_start, __nospec_call_end); + nospec_return_revert(__nospec_return_start, __nospec_return_end); +} --- a/arch/s390/kernel/setup.c +++ b/arch/s390/kernel/setup.c @@ -67,6 +67,7 @@ #include <asm/sysinfo.h> #include <asm/numa.h> #include <asm/alternative.h> +#include <asm/nospec-branch.h> #include "entry.h"
/* @@ -384,6 +385,7 @@ static void __init setup_lowcore(void) #ifdef CONFIG_SMP lc->spinlock_lockval = arch_spin_lockval(0); #endif + lc->br_r1_trampoline = 0x07f1; /* br %r1 */
set_prefix((u32)(unsigned long) lc); lowcore_ptr[0] = lc; @@ -959,6 +961,8 @@ void __init setup_arch(char **cmdline_p) set_preferred_console();
apply_alternative_instructions(); + if (IS_ENABLED(CONFIG_EXPOLINE)) + nospec_init_branches();
/* Setup zfcpdump support */ setup_zfcpdump(); --- a/arch/s390/kernel/smp.c +++ b/arch/s390/kernel/smp.c @@ -228,6 +228,7 @@ static int pcpu_alloc_lowcore(struct pcp lc->mcesad = mcesa_origin | mcesa_bits; lc->cpu_nr = cpu; lc->spinlock_lockval = arch_spin_lockval(cpu); + lc->br_r1_trampoline = 0x07f1; /* br %r1 */ if (vdso_alloc_per_cpu(lc)) goto out; lowcore_ptr[cpu] = lc; --- a/arch/s390/kernel/vmlinux.lds.S +++ b/arch/s390/kernel/vmlinux.lds.S @@ -128,6 +128,20 @@ SECTIONS *(.altinstr_replacement) }
+ /* + * Table with the patch locations to undo expolines + */ + .nospec_call_table : { + __nospec_call_start = . ; + *(.s390_indirect*) + __nospec_call_end = . ; + } + .nospec_return_table : { + __nospec_return_start = . ; + *(.s390_return*) + __nospec_return_end = . ; + } + /* early.c uses stsi, which requires page aligned data. */ . = ALIGN(PAGE_SIZE); INIT_DATA_SECTION(0x100) --- a/drivers/s390/char/Makefile +++ b/drivers/s390/char/Makefile @@ -17,6 +17,8 @@ CFLAGS_REMOVE_sclp_early_core.o += $(CC_ CFLAGS_sclp_early_core.o += -march=z900 endif
+CFLAGS_REMOVE_sclp_early_core.o += $(CC_FLAGS_EXPOLINE) + obj-y += ctrlchar.o keyboard.o defkeymap.o sclp.o sclp_rw.o sclp_quiesce.o \ sclp_cmd.o sclp_config.o sclp_cpi_sys.o sclp_ocf.o sclp_ctl.o \ sclp_early.o sclp_early_core.o
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
From: Christian Borntraeger borntraeger@de.ibm.com
[ Upstream commit f315104ad8b0c32be13eac628569ae707c332cb5 ]
If the guest runs with bp isolation when doing a SIE instruction, we must also run the nested guest with bp isolation when emulating that SIE instruction. This is done by activating BPBC in the lpar, which acts as an override for lower level guests.
Signed-off-by: Christian Borntraeger borntraeger@de.ibm.com Reviewed-by: Janosch Frank frankja@linux.vnet.ibm.com Reviewed-by: David Hildenbrand david@redhat.com Signed-off-by: Christian Borntraeger borntraeger@de.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kvm/vsie.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
--- a/arch/s390/kvm/vsie.c +++ b/arch/s390/kvm/vsie.c @@ -831,6 +831,7 @@ static int do_vsie_run(struct kvm_vcpu * { struct kvm_s390_sie_block *scb_s = &vsie_page->scb_s; struct kvm_s390_sie_block *scb_o = vsie_page->scb_o; + int guest_bp_isolation; int rc;
handle_last_fault(vcpu, vsie_page); @@ -841,6 +842,20 @@ static int do_vsie_run(struct kvm_vcpu * s390_handle_mcck();
srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx); + + /* save current guest state of bp isolation override */ + guest_bp_isolation = test_thread_flag(TIF_ISOLATE_BP_GUEST); + + /* + * The guest is running with BPBC, so we have to force it on for our + * nested guest. This is done by enabling BPBC globally, so the BPBC + * control in the SCB (which the nested guest can modify) is simply + * ignored. + */ + if (test_kvm_facility(vcpu->kvm, 82) && + vcpu->arch.sie_block->fpf & FPF_BPBC) + set_thread_flag(TIF_ISOLATE_BP_GUEST); + local_irq_disable(); guest_enter_irqoff(); local_irq_enable(); @@ -850,6 +865,11 @@ static int do_vsie_run(struct kvm_vcpu * local_irq_disable(); guest_exit_irqoff(); local_irq_enable(); + + /* restore guest state for bp isolation override */ + if (!guest_bp_isolation) + clear_thread_flag(TIF_ISOLATE_BP_GUEST); + vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
if (rc == -EINTR) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
From: Eugeniu Rosca erosca@de.adit-jv.com
[ Upstream commit 2cb370d615e9fbed9e95ed222c2c8f337181aa90 ]
I've accidentally stumbled upon the IS_ENABLED(EXPOLINE_*) lines, which obviously always evaluate to false. Fix this.
Fixes: f19fbd5ed642 ("s390: introduce execute-trampolines for branches") Signed-off-by: Eugeniu Rosca erosca@de.adit-jv.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kernel/nospec-branch.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/s390/kernel/nospec-branch.c +++ b/arch/s390/kernel/nospec-branch.c @@ -3,8 +3,8 @@ #include <asm/facility.h> #include <asm/nospec-branch.h>
-int nospec_call_disable = IS_ENABLED(EXPOLINE_OFF); -int nospec_return_disable = !IS_ENABLED(EXPOLINE_FULL); +int nospec_call_disable = IS_ENABLED(CONFIG_EXPOLINE_OFF); +int nospec_return_disable = !IS_ENABLED(CONFIG_EXPOLINE_FULL);
static int __init nospectre_v2_setup_early(char *str) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit d5feec04fe578c8dbd9e2e1439afc2f0af761ed4 ]
The system call path can be interrupted before the switch back to the standard branch prediction with BPENTER has been done. The critical section cleanup code skips forward to .Lsysc_do_svc and bypasses the BPENTER. In this case the kernel and all subsequent code will run with the limited branch prediction.
Fixes: eacf67eb9b32 ("s390: run user space and KVM guests with modified branch prediction") Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kernel/entry.S | 1 + 1 file changed, 1 insertion(+)
--- a/arch/s390/kernel/entry.S +++ b/arch/s390/kernel/entry.S @@ -1375,6 +1375,7 @@ cleanup_critical: stg %r15,__LC_SYSTEM_TIMER 0: # update accounting time stamp mvc __LC_LAST_UPDATE_TIMER(8),__LC_SYNC_ENTER_TIMER + BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP # set up saved register r11 lg %r15,__LC_KERNEL_STACK la %r9,STACK_FRAME_OVERHEAD(%r15)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
From: Christian Borntraeger borntraeger@de.ibm.com
[ Upstream commit d3f468963cd6fd6d2aa5e26aed8b24232096d0e1 ]
when a system call is interrupted we might call the critical section cleanup handler that re-does some of the operations. When we are between .Lsysc_vtime and .Lsysc_do_svc we might also redo the saving of the problem state registers r0-r7:
.Lcleanup_system_call: [...] 0: # update accounting time stamp mvc __LC_LAST_UPDATE_TIMER(8),__LC_SYNC_ENTER_TIMER # set up saved register r11 lg %r15,__LC_KERNEL_STACK la %r9,STACK_FRAME_OVERHEAD(%r15) stg %r9,24(%r11) # r11 pt_regs pointer # fill pt_regs mvc __PT_R8(64,%r9),__LC_SAVE_AREA_SYNC ---> stmg %r0,%r7,__PT_R0(%r9)
The problem is now, that we might have already zeroed out r0. The fix is to move the zeroing of r0 after sysc_do_svc.
Reported-by: Farhan Ali alifm@linux.vnet.ibm.com Fixes: 7041d28115e91 ("s390: scrub registers on kernel entry and KVM exit") Signed-off-by: Christian Borntraeger borntraeger@de.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kernel/entry.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/s390/kernel/entry.S +++ b/arch/s390/kernel/entry.S @@ -426,13 +426,13 @@ ENTRY(system_call) UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP stmg %r0,%r7,__PT_R0(%r11) - # clear user controlled register to prevent speculative use - xgr %r0,%r0 mvc __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC mvc __PT_PSW(16,%r11),__LC_SVC_OLD_PSW mvc __PT_INT_CODE(4,%r11),__LC_SVC_ILC stg %r14,__PT_FLAGS(%r11) .Lsysc_do_svc: + # clear user controlled register to prevent speculative use + xgr %r0,%r0 # load address of system call table lg %r10,__THREAD_sysc_table(%r13,%r12) llgh %r8,__PT_INT_CODE+2(%r11)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit b2e2f43a01bace1a25bdbae04c9f9846882b727a ]
Keep the code for the nobp parameter handling with the code for expolines. Both are related to the spectre v2 mitigation.
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kernel/Makefile | 4 ++-- arch/s390/kernel/alternative.c | 23 ----------------------- arch/s390/kernel/nospec-branch.c | 27 +++++++++++++++++++++++++++ 3 files changed, 29 insertions(+), 25 deletions(-)
--- a/arch/s390/kernel/Makefile +++ b/arch/s390/kernel/Makefile @@ -59,11 +59,11 @@ obj-y += debug.o irq.o ipl.o dis.o diag. obj-y += sysinfo.o jump_label.o lgr.o os_info.o machine_kexec.o pgm_check.o obj-y += runtime_instr.o cache.o fpu.o dumpstack.o guarded_storage.o obj-y += entry.o reipl.o relocate_kernel.o kdebugfs.o alternative.o +obj-y += nospec-branch.o
extra-y += head.o head64.o vmlinux.lds
-obj-$(CONFIG_EXPOLINE) += nospec-branch.o -CFLAGS_REMOVE_expoline.o += $(CC_FLAGS_EXPOLINE) +CFLAGS_REMOVE_nospec-branch.o += $(CC_FLAGS_EXPOLINE)
obj-$(CONFIG_MODULES) += module.o obj-$(CONFIG_SMP) += smp.o --- a/arch/s390/kernel/alternative.c +++ b/arch/s390/kernel/alternative.c @@ -14,29 +14,6 @@ static int __init disable_alternative_in
early_param("noaltinstr", disable_alternative_instructions);
-static int __init nobp_setup_early(char *str) -{ - bool enabled; - int rc; - - rc = kstrtobool(str, &enabled); - if (rc) - return rc; - if (enabled && test_facility(82)) - __set_facility(82, S390_lowcore.alt_stfle_fac_list); - else - __clear_facility(82, S390_lowcore.alt_stfle_fac_list); - return 0; -} -early_param("nobp", nobp_setup_early); - -static int __init nospec_setup_early(char *str) -{ - __clear_facility(82, S390_lowcore.alt_stfle_fac_list); - return 0; -} -early_param("nospec", nospec_setup_early); - struct brcl_insn { u16 opc; s32 disp; --- a/arch/s390/kernel/nospec-branch.c +++ b/arch/s390/kernel/nospec-branch.c @@ -3,6 +3,31 @@ #include <asm/facility.h> #include <asm/nospec-branch.h>
+static int __init nobp_setup_early(char *str) +{ + bool enabled; + int rc; + + rc = kstrtobool(str, &enabled); + if (rc) + return rc; + if (enabled && test_facility(82)) + __set_facility(82, S390_lowcore.alt_stfle_fac_list); + else + __clear_facility(82, S390_lowcore.alt_stfle_fac_list); + return 0; +} +early_param("nobp", nobp_setup_early); + +static int __init nospec_setup_early(char *str) +{ + __clear_facility(82, S390_lowcore.alt_stfle_fac_list); + return 0; +} +early_param("nospec", nospec_setup_early); + +#ifdef CONFIG_EXPOLINE + int nospec_call_disable = IS_ENABLED(CONFIG_EXPOLINE_OFF); int nospec_return_disable = !IS_ENABLED(CONFIG_EXPOLINE_FULL);
@@ -99,3 +124,5 @@ void __init nospec_init_branches(void) nospec_call_revert(__nospec_call_start, __nospec_call_end); nospec_return_revert(__nospec_return_start, __nospec_return_end); } + +#endif /* CONFIG_EXPOLINE */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit 6e179d64126b909f0b288fa63cdbf07c531e9b1d ]
Automatically decide between nobp vs. expolines if the spectre_v2=auto kernel parameter is specified or CONFIG_EXPOLINE_AUTO=y is set.
The decision made at boot time due to CONFIG_EXPOLINE_AUTO=y being set can be overruled with the nobp, nospec and spectre_v2 kernel parameters.
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/Kconfig | 2 - arch/s390/Makefile | 2 - arch/s390/include/asm/nospec-branch.h | 6 +-- arch/s390/kernel/alternative.c | 1 arch/s390/kernel/module.c | 11 ++--- arch/s390/kernel/nospec-branch.c | 68 +++++++++++++++++++++------------- 6 files changed, 52 insertions(+), 38 deletions(-)
--- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -575,7 +575,7 @@ choice config EXPOLINE_OFF bool "spectre_v2=off"
-config EXPOLINE_MEDIUM +config EXPOLINE_AUTO bool "spectre_v2=auto"
config EXPOLINE_FULL --- a/arch/s390/Makefile +++ b/arch/s390/Makefile @@ -87,7 +87,7 @@ ifdef CONFIG_EXPOLINE CC_FLAGS_EXPOLINE += -mfunction-return=thunk CC_FLAGS_EXPOLINE += -mindirect-branch-table export CC_FLAGS_EXPOLINE - cflags-y += $(CC_FLAGS_EXPOLINE) + cflags-y += $(CC_FLAGS_EXPOLINE) -DCC_USING_EXPOLINE endif endif
--- a/arch/s390/include/asm/nospec-branch.h +++ b/arch/s390/include/asm/nospec-branch.h @@ -6,12 +6,10 @@
#include <linux/types.h>
-extern int nospec_call_disable; -extern int nospec_return_disable; +extern int nospec_disable;
void nospec_init_branches(void); -void nospec_call_revert(s32 *start, s32 *end); -void nospec_return_revert(s32 *start, s32 *end); +void nospec_revert(s32 *start, s32 *end);
#endif /* __ASSEMBLY__ */
--- a/arch/s390/kernel/alternative.c +++ b/arch/s390/kernel/alternative.c @@ -1,6 +1,7 @@ #include <linux/module.h> #include <asm/alternative.h> #include <asm/facility.h> +#include <asm/nospec-branch.h>
#define MAX_PATCH_LEN (255 - 1)
--- a/arch/s390/kernel/module.c +++ b/arch/s390/kernel/module.c @@ -172,7 +172,7 @@ int module_frob_arch_sections(Elf_Ehdr * me->core_layout.size += me->arch.got_size; me->arch.plt_offset = me->core_layout.size; if (me->arch.plt_size) { - if (IS_ENABLED(CONFIG_EXPOLINE) && !nospec_call_disable) + if (IS_ENABLED(CONFIG_EXPOLINE) && !nospec_disable) me->arch.plt_size += PLT_ENTRY_SIZE; me->core_layout.size += me->arch.plt_size; } @@ -331,8 +331,7 @@ static int apply_rela(Elf_Rela *rela, El info->plt_offset; ip[0] = 0x0d10e310; /* basr 1,0 */ ip[1] = 0x100a0004; /* lg 1,10(1) */ - if (IS_ENABLED(CONFIG_EXPOLINE) && - !nospec_call_disable) { + if (IS_ENABLED(CONFIG_EXPOLINE) && !nospec_disable) { unsigned int *ij; ij = me->core_layout.base + me->arch.plt_offset + @@ -453,7 +452,7 @@ int module_finalize(const Elf_Ehdr *hdr, void *aseg;
if (IS_ENABLED(CONFIG_EXPOLINE) && - !nospec_call_disable && me->arch.plt_size) { + !nospec_disable && me->arch.plt_size) { unsigned int *ij;
ij = me->core_layout.base + me->arch.plt_offset + @@ -480,11 +479,11 @@ int module_finalize(const Elf_Ehdr *hdr,
if (IS_ENABLED(CONFIG_EXPOLINE) && (!strcmp(".nospec_call_table", secname))) - nospec_call_revert(aseg, aseg + s->sh_size); + nospec_revert(aseg, aseg + s->sh_size);
if (IS_ENABLED(CONFIG_EXPOLINE) && (!strcmp(".nospec_return_table", secname))) - nospec_return_revert(aseg, aseg + s->sh_size); + nospec_revert(aseg, aseg + s->sh_size); }
jump_label_apply_nops(me); --- a/arch/s390/kernel/nospec-branch.c +++ b/arch/s390/kernel/nospec-branch.c @@ -11,10 +11,17 @@ static int __init nobp_setup_early(char rc = kstrtobool(str, &enabled); if (rc) return rc; - if (enabled && test_facility(82)) + if (enabled && test_facility(82)) { + /* + * The user explicitely requested nobp=1, enable it and + * disable the expoline support. + */ __set_facility(82, S390_lowcore.alt_stfle_fac_list); - else + if (IS_ENABLED(CONFIG_EXPOLINE)) + nospec_disable = 1; + } else { __clear_facility(82, S390_lowcore.alt_stfle_fac_list); + } return 0; } early_param("nobp", nobp_setup_early); @@ -28,31 +35,46 @@ early_param("nospec", nospec_setup_early
#ifdef CONFIG_EXPOLINE
-int nospec_call_disable = IS_ENABLED(CONFIG_EXPOLINE_OFF); -int nospec_return_disable = !IS_ENABLED(CONFIG_EXPOLINE_FULL); +int nospec_disable = IS_ENABLED(CONFIG_EXPOLINE_OFF);
static int __init nospectre_v2_setup_early(char *str) { - nospec_call_disable = 1; - nospec_return_disable = 1; + nospec_disable = 1; return 0; } early_param("nospectre_v2", nospectre_v2_setup_early);
+static int __init spectre_v2_auto_early(void) +{ + if (IS_ENABLED(CC_USING_EXPOLINE)) { + /* + * The kernel has been compiled with expolines. + * Keep expolines enabled and disable nobp. + */ + nospec_disable = 0; + __clear_facility(82, S390_lowcore.alt_stfle_fac_list); + } + /* + * If the kernel has not been compiled with expolines the + * nobp setting decides what is done, this depends on the + * CONFIG_KERNEL_NP option and the nobp/nospec parameters. + */ + return 0; +} +#ifdef CONFIG_EXPOLINE_AUTO +early_initcall(spectre_v2_auto_early); +#endif + static int __init spectre_v2_setup_early(char *str) { if (str && !strncmp(str, "on", 2)) { - nospec_call_disable = 0; - nospec_return_disable = 0; - } - if (str && !strncmp(str, "off", 3)) { - nospec_call_disable = 1; - nospec_return_disable = 1; - } - if (str && !strncmp(str, "auto", 4)) { - nospec_call_disable = 0; - nospec_return_disable = 1; + nospec_disable = 0; + __clear_facility(82, S390_lowcore.alt_stfle_fac_list); } + if (str && !strncmp(str, "off", 3)) + nospec_disable = 1; + if (str && !strncmp(str, "auto", 4)) + spectre_v2_auto_early(); return 0; } early_param("spectre_v2", spectre_v2_setup_early); @@ -105,15 +127,9 @@ static void __init_or_module __nospec_re } }
-void __init_or_module nospec_call_revert(s32 *start, s32 *end) -{ - if (nospec_call_disable) - __nospec_revert(start, end); -} - -void __init_or_module nospec_return_revert(s32 *start, s32 *end) +void __init_or_module nospec_revert(s32 *start, s32 *end) { - if (nospec_return_disable) + if (nospec_disable) __nospec_revert(start, end); }
@@ -121,8 +137,8 @@ extern s32 __nospec_call_start[], __nosp extern s32 __nospec_return_start[], __nospec_return_end[]; void __init nospec_init_branches(void) { - nospec_call_revert(__nospec_call_start, __nospec_call_end); - nospec_return_revert(__nospec_return_start, __nospec_return_end); + nospec_revert(__nospec_call_start, __nospec_call_end); + nospec_revert(__nospec_return_start, __nospec_return_end); }
#endif /* CONFIG_EXPOLINE */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit bc035599718412cfba9249aa713f90ef13f13ee9 ]
Add a boot message if either of the spectre defenses is active. The message is "Spectre V2 mitigation: execute trampolines." or "Spectre V2 mitigation: limited branch prediction."
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kernel/nospec-branch.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/arch/s390/kernel/nospec-branch.c +++ b/arch/s390/kernel/nospec-branch.c @@ -33,6 +33,16 @@ static int __init nospec_setup_early(cha } early_param("nospec", nospec_setup_early);
+static int __init nospec_report(void) +{ + if (IS_ENABLED(CC_USING_EXPOLINE) && !nospec_disable) + pr_info("Spectre V2 mitigation: execute trampolines.\n"); + if (__test_facility(82, S390_lowcore.alt_stfle_fac_list)) + pr_info("Spectre V2 mitigation: limited branch prediction.\n"); + return 0; +} +arch_initcall(nospec_report); + #ifdef CONFIG_EXPOLINE
int nospec_disable = IS_ENABLED(CONFIG_EXPOLINE_OFF);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit d424986f1d6b16079b3231db0314923f4f8deed1 ]
Set CONFIG_GENERIC_CPU_VULNERABILITIES and provide the two functions cpu_show_spectre_v1 and cpu_show_spectre_v2 to report the spectre mitigations.
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/Kconfig | 1 + arch/s390/kernel/nospec-branch.c | 19 +++++++++++++++++++ 2 files changed, 20 insertions(+)
--- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -121,6 +121,7 @@ config S390 select GENERIC_CLOCKEVENTS select GENERIC_CPU_AUTOPROBE select GENERIC_CPU_DEVICES if !SMP + select GENERIC_CPU_VULNERABILITIES select GENERIC_FIND_FIRST_BIT select GENERIC_SMP_IDLE_THREAD select GENERIC_TIME_VSYSCALL --- a/arch/s390/kernel/nospec-branch.c +++ b/arch/s390/kernel/nospec-branch.c @@ -1,5 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 #include <linux/module.h> +#include <linux/device.h> #include <asm/facility.h> #include <asm/nospec-branch.h>
@@ -43,6 +44,24 @@ static int __init nospec_report(void) } arch_initcall(nospec_report);
+#ifdef CONFIG_SYSFS +ssize_t cpu_show_spectre_v1(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "Mitigation: __user pointer sanitization\n"); +} + +ssize_t cpu_show_spectre_v2(struct device *dev, + struct device_attribute *attr, char *buf) +{ + if (IS_ENABLED(CC_USING_EXPOLINE) && !nospec_disable) + return sprintf(buf, "Mitigation: execute trampolines\n"); + if (__test_facility(82, S390_lowcore.alt_stfle_fac_list)) + return sprintf(buf, "Mitigation: limited branch prediction.\n"); + return sprintf(buf, "Vulnerable\n"); +} +#endif + #ifdef CONFIG_EXPOLINE
int nospec_disable = IS_ENABLED(CONFIG_EXPOLINE_OFF);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit 6a3d1e81a434fc311f224b8be77258bafc18ccc6 ]
With CONFIG_EXPOLINE_AUTO=y the call of spectre_v2_auto_early() via early_initcall is done *after* the early_param functions. This overwrites any settings done with the nobp/no_spectre_v2/spectre_v2 parameters. The code patching for the kernel is done after the evaluation of the early parameters but before the early_initcall is done. The end result is a kernel image that is patched correctly but the kernel modules are not.
Make sure that the nospec auto detection function is called before the early parameters are evaluated and before the code patching is done.
Fixes: 6e179d64126b ("s390: add automatic detection of the spectre defense") Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/include/asm/nospec-branch.h | 1 + arch/s390/kernel/nospec-branch.c | 8 ++------ arch/s390/kernel/setup.c | 3 +++ 3 files changed, 6 insertions(+), 6 deletions(-)
--- a/arch/s390/include/asm/nospec-branch.h +++ b/arch/s390/include/asm/nospec-branch.h @@ -9,6 +9,7 @@ extern int nospec_disable;
void nospec_init_branches(void); +void nospec_auto_detect(void); void nospec_revert(s32 *start, s32 *end);
#endif /* __ASSEMBLY__ */ --- a/arch/s390/kernel/nospec-branch.c +++ b/arch/s390/kernel/nospec-branch.c @@ -73,7 +73,7 @@ static int __init nospectre_v2_setup_ear } early_param("nospectre_v2", nospectre_v2_setup_early);
-static int __init spectre_v2_auto_early(void) +void __init nospec_auto_detect(void) { if (IS_ENABLED(CC_USING_EXPOLINE)) { /* @@ -88,11 +88,7 @@ static int __init spectre_v2_auto_early( * nobp setting decides what is done, this depends on the * CONFIG_KERNEL_NP option and the nobp/nospec parameters. */ - return 0; } -#ifdef CONFIG_EXPOLINE_AUTO -early_initcall(spectre_v2_auto_early); -#endif
static int __init spectre_v2_setup_early(char *str) { @@ -103,7 +99,7 @@ static int __init spectre_v2_setup_early if (str && !strncmp(str, "off", 3)) nospec_disable = 1; if (str && !strncmp(str, "auto", 4)) - spectre_v2_auto_early(); + nospec_auto_detect(); return 0; } early_param("spectre_v2", spectre_v2_setup_early); --- a/arch/s390/kernel/setup.c +++ b/arch/s390/kernel/setup.c @@ -897,6 +897,9 @@ void __init setup_arch(char **cmdline_p) init_mm.end_data = (unsigned long) &_edata; init_mm.brk = (unsigned long) &_end;
+ if (IS_ENABLED(CONFIG_EXPOLINE_AUTO)) + nospec_auto_detect(); + parse_early_param(); #ifdef CONFIG_CRASH_DUMP /* Deactivate elfcorehdr= kernel parameter */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit 6cf09958f32b9667bb3ebadf74367c791112771b ]
The main linker script vmlinux.lds.S for the kernel image merges the expoline code patch tables into two section ".nospec_call_table" and ".nospec_return_table". This is *not* done for the modules, there the sections retain their original names as generated by gcc: ".s390_indirect_call", ".s390_return_mem" and ".s390_return_reg".
The module_finalize code has to check for the compiler generated section names, otherwise no code patching is done. This slows down the module code in case of "spectre_v2=off".
Cc: stable@vger.kernel.org # 4.16 Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches") Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kernel/module.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/s390/kernel/module.c +++ b/arch/s390/kernel/module.c @@ -478,11 +478,11 @@ int module_finalize(const Elf_Ehdr *hdr, apply_alternatives(aseg, aseg + s->sh_size);
if (IS_ENABLED(CONFIG_EXPOLINE) && - (!strcmp(".nospec_call_table", secname))) + (!strncmp(".s390_indirect", secname, 14))) nospec_revert(aseg, aseg + s->sh_size);
if (IS_ENABLED(CONFIG_EXPOLINE) && - (!strcmp(".nospec_return_table", secname))) + (!strncmp(".s390_return", secname, 12))) nospec_revert(aseg, aseg + s->sh_size); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Simek michal.simek@xilinx.com
commit 18ffc0cce4ff947a2acc9b2e06ae5309a6e6fb43 upstream.
The patch: "microblaze: Setup proper dependency for optimized lib functions" (sha1: 7b6ce52be3f86520524711a6f33f3866f9339694) didn't setup all dependencies properly. Optimized lib functions in C are also present for little endian and optimized library functions in assembler are implemented only for big endian version.
Reported-by: kbuild test robot fengguang.wu@intel.com Signed-off-by: Michal Simek michal.simek@xilinx.com Cc: Arnd Bergmann arnd@arndb.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/microblaze/Kconfig.platform | 1 + arch/microblaze/lib/fastcopy.S | 4 ---- 2 files changed, 1 insertion(+), 4 deletions(-)
--- a/arch/microblaze/Kconfig.platform +++ b/arch/microblaze/Kconfig.platform @@ -20,6 +20,7 @@ config OPT_LIB_FUNCTION config OPT_LIB_ASM bool "Optimalized lib function ASM" depends on OPT_LIB_FUNCTION && (XILINX_MICROBLAZE0_USE_BARREL = 1) + depends on CPU_BIG_ENDIAN default n help Allows turn on optimalized library function (memcpy and memmove). --- a/arch/microblaze/lib/fastcopy.S +++ b/arch/microblaze/lib/fastcopy.S @@ -29,10 +29,6 @@ * between mem locations with size of xfer spec'd in bytes */
-#ifdef __MICROBLAZEEL__ -#error Microblaze LE not support ASM optimized lib func. Disable OPT_LIB_ASM. -#endif - #include <linux/linkage.h> .text .globl memcpy
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Klaus Goger klaus.goger@theobroma-systems.com
commit 87eba0716011e528f7841026f2cc65683219d0ad upstream.
vdd_log has no consumer and therefore will not be set to a specific voltage. Still the PWM output pin gets configured and thence the vdd_log output voltage will changed from it's default. Depending on the idle state of the PWM this will slightly over or undervoltage the logic supply of the RK3399 and cause instability with GbE (undervoltage) and PCIe (overvoltage). Since the default value set by a voltage divider is the correct supply voltage and we don't need to change it during runtime we remove the rail from the devicetree completely so the PWM pin will not be configured.
Signed-off-by: Klaus Goger klaus.goger@theobroma-systems.com Signed-off-by: Heiko Stuebner heiko@sntech.de Cc: Jakob Unterwurzacher jakob.unterwurzacher@theobroma-systems.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/boot/dts/rockchip/rk3399-puma.dtsi | 11 ----------- 1 file changed, 11 deletions(-)
--- a/arch/arm64/boot/dts/rockchip/rk3399-puma.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk3399-puma.dtsi @@ -155,17 +155,6 @@ regulator-min-microvolt = <5000000>; regulator-max-microvolt = <5000000>; }; - - vdd_log: vdd-log { - compatible = "pwm-regulator"; - pwms = <&pwm2 0 25000 0>; - regulator-name = "vdd_log"; - regulator-min-microvolt = <800000>; - regulator-max-microvolt = <1400000>; - regulator-always-on; - regulator-boot-on; - status = "okay"; - }; };
&cpu_b0 {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
commit 1f5781725dcbb026438e77091c91a94f678c3522 upstream.
syzbot is reporting NULL pointer dereference at xattr_getsecurity() [1], for cap_inode_getsecurity() is returning sizeof(struct vfs_cap_data) when memory allocation failed. Return -ENOMEM if memory allocation failed.
[1] https://syzkaller.appspot.com/bug?id=a55ba438506fe68649a5f50d2d82d56b365e010...
Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Fixes: 8db6c34f1dbc8e06 ("Introduce v3 namespaced file capabilities") Reported-by: syzbot syzbot+9369930ca44f29e60e2d@syzkaller.appspotmail.com Cc: stable stable@vger.kernel.org # 4.14+ Acked-by: Serge E. Hallyn serge@hallyn.com Acked-by: James Morris james.morris@microsoft.com Signed-off-by: Eric W. Biederman ebiederm@xmission.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- security/commoncap.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/security/commoncap.c +++ b/security/commoncap.c @@ -449,6 +449,8 @@ int cap_inode_getsecurity(struct inode * magic |= VFS_CAP_FLAGS_EFFECTIVE; memcpy(&cap->data, &nscap->data, sizeof(__le32) * 2 * VFS_CAP_U32); cap->magic_etc = cpu_to_le32(magic); + } else { + size = -ENOMEM; } } kfree(tmpbuf);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin K. Petersen martin.petersen@oracle.com
commit 94e5395d2403c8bc2504a7cbe4c4caaacb7b8b84 upstream.
First generation MPT Fusion controllers can not translate WRITE SAME when the attached device is a SATA drive. Disable WRITE SAME support.
Reported-by: Nikola Ciprich nikola.ciprich@linuxbox.cz Cc: stable@vger.kernel.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/message/fusion/mptsas.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/message/fusion/mptsas.c +++ b/drivers/message/fusion/mptsas.c @@ -1995,6 +1995,7 @@ static struct scsi_host_template mptsas_ .cmd_per_lun = 7, .use_clustering = ENABLE_CLUSTERING, .shost_attrs = mptscsih_host_attrs, + .no_write_same = 1, };
static int mptsas_get_linkerrors(struct sas_phy *phy)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@oracle.com
commit 9de4ee40547fd315d4a0ed1dd15a2fa3559ad707 upstream.
This cast is wrong. "cdi->capacity" is an int and "arg" is an unsigned long. The way the check is written now, if one of the high 32 bits is set then we could read outside the info->slots[] array.
This bug is pretty old and it predates git.
Reviewed-by: Christoph Hellwig hch@lst.de Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/cdrom/cdrom.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/cdrom/cdrom.c +++ b/drivers/cdrom/cdrom.c @@ -2374,7 +2374,7 @@ static int cdrom_ioctl_media_changed(str if (!CDROM_CAN(CDC_SELECT_DISC) || arg == CDSL_CURRENT) return media_changed(cdi, 1);
- if ((unsigned int)arg >= cdi->capacity) + if (arg >= cdi->capacity) return -EINVAL;
info = kmalloc(sizeof(*info), GFP_KERNEL);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Robert Kolchmeyer rkolchmeyer@google.com
commit d90a10e2444ba5a351fa695917258ff4c5709fa5 upstream.
fsnotify() acquires a reference to a fsnotify_mark_connector through the SRCU-protected pointer to_tell->i_fsnotify_marks. However, it appears that no precautions are taken in fsnotify_put_mark() to ensure that fsnotify() drops its reference to this fsnotify_mark_connector before assigning a value to its 'destroy_next' field. This can result in fsnotify_put_mark() assigning a value to a connector's 'destroy_next' field right before fsnotify() tries to traverse the linked list referenced by the connector's 'list' field. Since these two fields are members of the same union, this behavior results in a kernel panic.
This issue is resolved by moving the connector's 'destroy_next' field into the object pointer union. This should work since the object pointer access is protected by both a spinlock and the value of the 'flags' field, and the 'flags' field is cleared while holding the spinlock in fsnotify_put_mark() before 'destroy_next' is updated. It shouldn't be possible for another thread to accidentally read from the object pointer after the 'destroy_next' field is updated.
The offending behavior here is extremely unlikely; since fsnotify_put_mark() removes references to a connector (specifically, it ensures that the connector is unreachable from the inode it was formerly attached to) before updating its 'destroy_next' field, a sizeable chunk of code in fsnotify_put_mark() has to execute in the short window between when fsnotify() acquires the connector reference and saves the value of its 'list' field. On the HEAD kernel, I've only been able to reproduce this by inserting a udelay(1) in fsnotify(). However, I've been able to reproduce this issue without inserting a udelay(1) anywhere on older unmodified release kernels, so I believe it's worth fixing at HEAD.
References: https://bugzilla.kernel.org/show_bug.cgi?id=199437 Fixes: 08991e83b7286635167bab40927665a90fb00d81 CC: stable@vger.kernel.org Signed-off-by: Robert Kolchmeyer rkolchmeyer@google.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/fsnotify_backend.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -217,12 +217,10 @@ struct fsnotify_mark_connector { union { /* Object pointer [lock] */ struct inode *inode; struct vfsmount *mnt; - }; - union { - struct hlist_head list; /* Used listing heads to free after srcu period expires */ struct fsnotify_mark_connector *destroy_next; }; + struct hlist_head list; };
/*
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Finn Thain fthain@telegraphics.com.au
commit b64576cbf36afa5fabf3b31f62a1994c429ef855 upstream.
For reasons I don't understand, calling ioremap() then iounmap() on the SWIM MMIO region causes a hang on 68030 (but not on 68040).
~# modprobe swim_mod SWIM floppy driver Version 0.2 (2008-10-30) SWIM device not found ! watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [modprobe:285] Modules linked in: swim_mod(+) Format 00 Vector: 0064 PC: 000075aa Status: 2000 Not tainted ORIG_D0: ffffffff D0: d00c0000 A2: 007c2370 A1: 003f810c A0: 00040000 D5: d0096800 D4: d0097e00 D3: 00000001 D2: 00000003 D1: 00000000 Non-Maskable Interrupt Modules linked in: swim_mod(+) PC: [<000075ba>] __iounmap+0x24/0x10e SR: 2000 SP: 007abc48 a2: 007c2370 d0: d00c0000 d1: 000001a0 d2: 00000019 d3: 00000001 d4: d0097e00 d5: d0096800 a0: 00040000 a1: 003f810c Process modprobe (pid: 285, task=007c2370) Frame format=0 Stack from 007abc7c: ffffffed 00000000 006a4060 004712e0 007abca0 000076ea d0080000 00080000 010bb4b8 007abcd8 010ba542 d0096000 00000000 00000000 00000001 010bb59c 00000000 007abf30 010bb4b8 0047760a 0047763c 00477612 00616540 007abcec 0020a91a 00477600 0047760a 010bb4cc 007abd18 002092f2 0047760a 00333b06 007abd5c 00000000 0047760a 010bb4cc 00404f90 004776b8 00000001 007abd38 00209446 010bb4cc 0047760a 010bb4cc 0020938e 0031f8be 00616540 007abd64 Call Trace: [<000076ea>] iounmap+0x46/0x5a [<00080000>] shrink_page_list+0x7f6/0xe06 [<010ba542>] swim_probe+0xe4/0x496 [swim_mod] [<0020a91a>] platform_drv_probe+0x20/0x5e [<002092f2>] driver_probe_device+0x21c/0x2b8 [<00333b06>] mutex_lock+0x0/0x2e [<00209446>] __driver_attach+0xb8/0xce [<0020938e>] __driver_attach+0x0/0xce [<0031f8be>] klist_next+0x0/0xa0 [<00207562>] bus_for_each_dev+0x74/0xba [<000344c0>] blocking_notifier_call_chain+0x0/0x20 [<00333b06>] mutex_lock+0x0/0x2e [<00208e44>] driver_attach+0x1a/0x1e [<0020938e>] __driver_attach+0x0/0xce [<00207e26>] bus_add_driver+0x188/0x234 [<000344c0>] blocking_notifier_call_chain+0x0/0x20 [<00209894>] driver_register+0x58/0x104 [<000344c0>] blocking_notifier_call_chain+0x0/0x20 [<010bd000>] swim_init+0x0/0x2c [swim_mod] [<0020a7be>] __platform_driver_register+0x38/0x3c [<010bd028>] swim_init+0x28/0x2c [swim_mod] [<000020dc>] do_one_initcall+0x38/0x196 [<000344c0>] blocking_notifier_call_chain+0x0/0x20 [<003331cc>] mutex_unlock+0x0/0x3e [<00333b06>] mutex_lock+0x0/0x2e [<003331cc>] mutex_unlock+0x0/0x3e [<00333b06>] mutex_lock+0x0/0x2e [<003331cc>] mutex_unlock+0x0/0x3e [<00333b06>] mutex_lock+0x0/0x2e [<003331cc>] mutex_unlock+0x0/0x3e [<00333b06>] mutex_lock+0x0/0x2e [<00075008>] __free_pages+0x0/0x38 [<000045c0>] mangle_kernel_stack+0x30/0xda [<000344c0>] blocking_notifier_call_chain+0x0/0x20 [<003331cc>] mutex_unlock+0x0/0x3e [<00333b06>] mutex_lock+0x0/0x2e [<0005ced4>] do_init_module+0x42/0x266 [<010bd000>] swim_init+0x0/0x2c [swim_mod] [<000344c0>] blocking_notifier_call_chain+0x0/0x20 [<0005eda0>] load_module+0x1a30/0x1e70 [<0000465d>] mangle_kernel_stack+0xcd/0xda [<00331c64>] __generic_copy_from_user+0x0/0x46 [<0033256e>] _cond_resched+0x0/0x32 [<00331b9c>] memset+0x0/0x98 [<0033256e>] _cond_resched+0x0/0x32 [<0005f25c>] SyS_init_module+0x7c/0x112 [<00002000>] _start+0x0/0x8 [<00002000>] _start+0x0/0x8 [<00331c82>] __generic_copy_from_user+0x1e/0x46 [<0005f2b2>] SyS_init_module+0xd2/0x112 [<0000465d>] mangle_kernel_stack+0xcd/0xda [<00002b40>] syscall+0x8/0xc [<0000465d>] mangle_kernel_stack+0xcd/0xda [<0008c00c>] pcpu_balance_workfn+0xb2/0x40e Code: 2200 7419 e4a9 e589 2841 d9fc 0000 1000 <2414> 7203 c282 7602 b681 6600 0096 0242 fe00 0482 0000 0000 e9c0 11c3 ed89 2642
There's no need to call ioremap() for the SWIM address range, as it lies within the usual IO device region at 0x5000 0000, which has already been mapped by head.S.
Remove the redundant ioremap() and iounmap() calls to fix the hang.
Cc: Laurent Vivier lvivier@redhat.com Cc: stable@vger.kernel.org # v4.14+ Tested-by: Stan Johnson userm57@yahoo.com Signed-off-by: Finn Thain fthain@telegraphics.com.au Acked-by: Laurent Vivier lvivier@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/block/swim.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-)
--- a/drivers/block/swim.c +++ b/drivers/block/swim.c @@ -911,7 +911,7 @@ static int swim_probe(struct platform_de goto out; }
- swim_base = ioremap(res->start, resource_size(res)); + swim_base = (struct swim __iomem *)res->start; if (!swim_base) { ret = -ENOMEM; goto out_release_io; @@ -923,7 +923,7 @@ static int swim_probe(struct platform_de if (!get_swim_mode(swim_base)) { printk(KERN_INFO "SWIM device not found !\n"); ret = -ENODEV; - goto out_iounmap; + goto out_release_io; }
/* set platform driver data */ @@ -931,7 +931,7 @@ static int swim_probe(struct platform_de swd = kzalloc(sizeof(struct swim_priv), GFP_KERNEL); if (!swd) { ret = -ENOMEM; - goto out_iounmap; + goto out_release_io; } platform_set_drvdata(dev, swd);
@@ -945,8 +945,6 @@ static int swim_probe(struct platform_de
out_kfree: kfree(swd); -out_iounmap: - iounmap(swim_base); out_release_io: release_mem_region(res->start, resource_size(res)); out: @@ -974,8 +972,6 @@ static int swim_remove(struct platform_d for (drive = 0; drive < swd->floppy_count; drive++) floppy_eject(&swd->unit[drive]);
- iounmap(swd->base); - res = platform_get_resource(dev, IORESOURCE_MEM, 0); if (res) release_mem_region(res->start, resource_size(res));
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Finn Thain fthain@telegraphics.com.au
commit 8a500df63d07d8aee44b7ee2c54e462e47ce93ec upstream.
The SWIM chip is compatible with GCR-mode Sony 400K/800K drives but this driver only supports MFM mode. Therefore only Sony FDHD drives are supported. Skip incompatible drives.
Cc: Laurent Vivier lvivier@redhat.com Cc: Jens Axboe axboe@kernel.dk Cc: stable@vger.kernel.org # v4.14+ Tested-by: Stan Johnson userm57@yahoo.com Signed-off-by: Finn Thain fthain@telegraphics.com.au Acked-by: Laurent Vivier lvivier@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/block/swim.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/drivers/block/swim.c +++ b/drivers/block/swim.c @@ -834,10 +834,12 @@ static int swim_floppy_init(struct swim_ /* scan floppy drives */
swim_drive(base, INTERNAL_DRIVE); - if (swim_readbit(base, DRIVE_PRESENT)) + if (swim_readbit(base, DRIVE_PRESENT) && + !swim_readbit(base, ONEMEG_DRIVE)) swim_add_floppy(swd, INTERNAL_DRIVE); swim_drive(base, EXTERNAL_DRIVE); - if (swim_readbit(base, DRIVE_PRESENT)) + if (swim_readbit(base, DRIVE_PRESENT) && + !swim_readbit(base, ONEMEG_DRIVE)) swim_add_floppy(swd, EXTERNAL_DRIVE);
/* register floppy drives */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Finn Thain fthain@telegraphics.com.au
commit 8e2ab5a4efaac77fb93e5b5b109d0b3976fdd3a0 upstream.
The 'eject' shell command may send various different ioctl commands. This leads to error messages on the console even though the FDEJECT ioctl succeeds.
~# eject floppy SWIM floppy_ioctl: unknown cmd 21257 SWIM floppy_ioctl: unknown cmd 1
Don't log an error message for an invalid ioctl, just do as the swim3 driver does and return -ENOTTY.
Cc: Laurent Vivier lvivier@redhat.com Cc: Jens Axboe axboe@kernel.dk Cc: stable@vger.kernel.org # v4.14+ Tested-by: Stan Johnson userm57@yahoo.com Signed-off-by: Finn Thain fthain@telegraphics.com.au Acked-by: Laurent Vivier lvivier@redhat.com Reviewed-by: Geert Uytterhoeven geert@linux-m68k.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/block/swim.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-)
--- a/drivers/block/swim.c +++ b/drivers/block/swim.c @@ -727,14 +727,9 @@ static int floppy_ioctl(struct block_dev if (copy_to_user((void __user *) param, (void *) &floppy_type, sizeof(struct floppy_struct))) return -EFAULT; - break; - - default: - printk(KERN_DEBUG "SWIM floppy_ioctl: unknown cmd %d\n", - cmd); - return -ENOSYS; + return 0; } - return 0; + return -ENOTTY; }
static int floppy_getgeo(struct block_device *bdev, struct hd_geometry *geo)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Finn Thain fthain@telegraphics.com.au
commit c1d6207cc0eef2a7f8551f9c7420d8776268f6e1 upstream.
Cc: Laurent Vivier lvivier@redhat.com Cc: Jens Axboe axboe@kernel.dk Cc: stable@vger.kernel.org # v4.14+ Fixes: 103db8b2dfa5 ("[PATCH] swim: stop sharing request queue across multiple gendisks") Tested-by: Stan Johnson userm57@yahoo.com Signed-off-by: Finn Thain fthain@telegraphics.com.au Acked-by: Laurent Vivier lvivier@redhat.com Reviewed-by: Geert Uytterhoeven geert@linux-m68k.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/block/swim.c | 1 - 1 file changed, 1 deletion(-)
--- a/drivers/block/swim.c +++ b/drivers/block/swim.c @@ -858,7 +858,6 @@ static int swim_floppy_init(struct swim_ &swd->lock); if (!swd->unit[drive].disk->queue) { err = -ENOMEM; - put_disk(swd->unit[drive].disk); goto exit_put_disks; } blk_queue_bounce_limit(swd->unit[drive].disk->queue,
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Finn Thain fthain@telegraphics.com.au
commit 56a1c5ee54f69dd767fb61d301883dc919ddc259 upstream.
The Sony drive status bits use active-low logic. The swim_readbit() function converts that to 'C' logic for readability. Hence, the sense of the names of the status bit macros should not be inverted.
Mostly they are correct. However, the TWOMEG_DRIVE, MFM_MODE and TWOMEG_MEDIA macros have inverted sense (like MkLinux). Fix this inconsistency and make the following patches less confusing.
The same problem affects swim3.c so fix that too.
No functional change.
The FDHD drive status bits are documented in sonydriv.cpp from MAME and in swimiii.h from MkLinux.
Cc: Laurent Vivier lvivier@redhat.com Cc: Benjamin Herrenschmidt benh@kernel.crashing.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Jens Axboe axboe@kernel.dk Cc: stable@vger.kernel.org # v4.14+ Tested-by: Stan Johnson userm57@yahoo.com Signed-off-by: Finn Thain fthain@telegraphics.com.au Acked-by: Laurent Vivier lvivier@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/block/swim.c | 8 ++++---- drivers/block/swim3.c | 6 +++--- 2 files changed, 7 insertions(+), 7 deletions(-)
--- a/drivers/block/swim.c +++ b/drivers/block/swim.c @@ -110,7 +110,7 @@ struct iwm { /* Select values for swim_select and swim_readbit */
#define READ_DATA_0 0x074 -#define TWOMEG_DRIVE 0x075 +#define ONEMEG_DRIVE 0x075 #define SINGLE_SIDED 0x076 #define DRIVE_PRESENT 0x077 #define DISK_IN 0x170 @@ -118,9 +118,9 @@ struct iwm { #define TRACK_ZERO 0x172 #define TACHO 0x173 #define READ_DATA_1 0x174 -#define MFM_MODE 0x175 +#define GCR_MODE 0x175 #define SEEK_COMPLETE 0x176 -#define ONEMEG_MEDIA 0x177 +#define TWOMEG_MEDIA 0x177
/* Bits in handshake register */
@@ -612,7 +612,7 @@ static void setup_medium(struct floppy_s struct floppy_struct *g; fs->disk_in = 1; fs->write_protected = swim_readbit(base, WRITE_PROT); - fs->type = swim_readbit(base, ONEMEG_MEDIA); + fs->type = swim_readbit(base, TWOMEG_MEDIA);
if (swim_track00(base)) printk(KERN_ERR --- a/drivers/block/swim3.c +++ b/drivers/block/swim3.c @@ -148,7 +148,7 @@ struct swim3 { #define MOTOR_ON 2 #define RELAX 3 /* also eject in progress */ #define READ_DATA_0 4 -#define TWOMEG_DRIVE 5 +#define ONEMEG_DRIVE 5 #define SINGLE_SIDED 6 /* drive or diskette is 4MB type? */ #define DRIVE_PRESENT 7 #define DISK_IN 8 @@ -156,9 +156,9 @@ struct swim3 { #define TRACK_ZERO 10 #define TACHO 11 #define READ_DATA_1 12 -#define MFM_MODE 13 +#define GCR_MODE 13 #define SEEK_COMPLETE 14 -#define ONEMEG_MEDIA 15 +#define TWOMEG_MEDIA 15
/* Definitions of values used in writing and formatting */ #define DATA_ESCAPE 0x99
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Finn Thain fthain@telegraphics.com.au
commit b3906535ccc6cd04c42f9b1c7e31d1947b3ebc74 upstream.
The driver supports internal and external FDD units so the floppy_open function must not hard-code the drive location.
Cc: Laurent Vivier lvivier@redhat.com Cc: Jens Axboe axboe@kernel.dk Cc: stable@vger.kernel.org # v4.14+ Tested-by: Stan Johnson userm57@yahoo.com Signed-off-by: Finn Thain fthain@telegraphics.com.au Acked-by: Laurent Vivier lvivier@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/block/swim.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/block/swim.c +++ b/drivers/block/swim.c @@ -646,7 +646,7 @@ static int floppy_open(struct block_devi
swim_write(base, setup, S_IBM_DRIVE | S_FCLK_DIV2); udelay(10); - swim_drive(base, INTERNAL_DRIVE); + swim_drive(base, fs->location); swim_motor(base, ON); swim_action(base, SETMFM); if (fs->ejected)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Finn Thain fthain@telegraphics.com.au
commit 7ae6a2b6cc058005ee3d0d2b9ce27688e51afa4b upstream.
In the floppy_find() function in swim.c is a call to get_disk(swd->unit[drive].disk). The actual parameter to this call can be a NULL pointer when drive == swd->floppy_count. This causes an oops in get_disk().
Data read fault at 0x00000198 in Super Data (pc=0x1be5b6) BAD KERNEL BUSERR Oops: 00000000 Modules linked in: swim_mod ipv6 mac8390 PC: [<001be5b6>] get_disk+0xc/0x76 SR: 2004 SP: 9a078bc1 a2: 0213ed90 d0: 00000000 d1: 00000000 d2: 00000000 d3: 000000ff d4: 00000002 d5: 02983590 a0: 02332e00 a1: 022dfd64 Process dd (pid: 285, task=020ab25b) Frame format=B ssw=074d isc=4a88 isb=6732 daddr=00000198 dobuf=00000000 baddr=001be5bc dibuf=bfffffff ver=f Stack from 022dfca4: 00000000 0203fc00 0213ed90 022dfcc0 02982936 00000000 00200000 022dfd08 0020f85a 00200000 022dfd64 02332e00 004040fc 00000014 001be77e 022dfd64 00334e4a 001be3f8 0800001d 022dfd64 01c04b60 01c04b70 022aba80 029828f8 02332e00 022dfd2c 001be7ac 0203fc00 00200000 022dfd64 02103a00 01c04b60 01c04b60 0200e400 022dfd68 000e191a 00200000 022dfd64 02103a00 0800001d 00000000 00000003 000b89de 00500000 02103a00 01c04b60 02103a08 01c04c2e Call Trace: [<02982936>] floppy_find+0x3e/0x4a [swim_mod] [<00200000>] uart_remove_one_port+0x1a2/0x260 [<0020f85a>] kobj_lookup+0xde/0x132 [<00200000>] uart_remove_one_port+0x1a2/0x260 [<001be77e>] get_gendisk+0x0/0x130 [<00334e4a>] mutex_lock+0x0/0x2e [<001be3f8>] disk_block_events+0x0/0x6c [<029828f8>] floppy_find+0x0/0x4a [swim_mod] [<001be7ac>] get_gendisk+0x2e/0x130 [<00200000>] uart_remove_one_port+0x1a2/0x260 [<000e191a>] __blkdev_get+0x32/0x45a [<00200000>] uart_remove_one_port+0x1a2/0x260 [<000b89de>] complete_walk+0x0/0x8a [<000e1e22>] blkdev_get+0xe0/0x29a [<000e1fdc>] blkdev_open+0x0/0xb0 [<000b89de>] complete_walk+0x0/0x8a [<000e1fdc>] blkdev_open+0x0/0xb0 [<000e01cc>] bd_acquire+0x74/0x8a [<000e205c>] blkdev_open+0x80/0xb0 [<000e1fdc>] blkdev_open+0x0/0xb0 [<000abf24>] do_dentry_open+0x1a4/0x322 [<00020000>] __do_proc_douintvec+0x22/0x27e [<000b89de>] complete_walk+0x0/0x8a [<000baa62>] link_path_walk+0x0/0x48e [<000ba3f8>] inode_permission+0x20/0x54 [<000ac0e4>] vfs_open+0x42/0x78 [<000bc372>] path_openat+0x2b2/0xeaa [<000bc0c0>] path_openat+0x0/0xeaa [<0004463e>] __irq_wake_thread+0x0/0x4e [<0003a45a>] task_tick_fair+0x18/0xc8 [<000bd00a>] do_filp_open+0xa0/0xea [<000abae0>] do_sys_open+0x11a/0x1ee [<00020000>] __do_proc_douintvec+0x22/0x27e [<000abbf4>] SyS_open+0x1e/0x22 [<00020000>] __do_proc_douintvec+0x22/0x27e [<00002b40>] syscall+0x8/0xc [<00020000>] __do_proc_douintvec+0x22/0x27e [<0000c00b>] dyadic+0x1/0x28 Code: 4e5e 4e75 4e56 fffc 2f0b 2f02 266e 0008 <206b> 0198 4a88 6732 2428 002c 661e 486b 0058 4eb9 0032 0b96 588f 4a88 672c 2008 Disabling lock debugging due to kernel taint
Fix the array index bounds check to avoid this.
Cc: Laurent Vivier lvivier@redhat.com Cc: Jens Axboe axboe@kernel.dk Cc: stable@vger.kernel.org # v4.14+ Fixes: 8852ecd97488 ("[PATCH] m68k: mac - Add SWIM floppy support") Tested-by: Stan Johnson userm57@yahoo.com Signed-off-by: Finn Thain fthain@telegraphics.com.au Acked-by: Laurent Vivier lvivier@redhat.com Reviewed-by: Geert Uytterhoeven geert@linux-m68k.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/block/swim.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/block/swim.c +++ b/drivers/block/swim.c @@ -790,7 +790,7 @@ static struct kobject *floppy_find(dev_t struct swim_priv *swd = data; int drive = (*part & 3);
- if (drive > swd->floppy_count) + if (drive >= swd->floppy_count) return NULL;
*part = 0;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Finn Thain fthain@telegraphics.com.au
commit 5a13388d7aa1177b98d7168330ecbeeac52f844d upstream.
Reading to the end of a 720K disk results in an IO error instead of EOF because the block layer thinks the disk has 2880 sectors. (Partly this is a result of inverted logic of the ONEMEG_MEDIA bit that's now fixed.)
Initialize the density and head count in swim_add_floppy() to agree with the device size passed to set_capacity() during drive probe.
Call set_capacity() again upon device open, after refreshing the density and head count values.
Cc: Laurent Vivier lvivier@redhat.com Cc: Jens Axboe axboe@kernel.dk Cc: stable@vger.kernel.org # v4.14+ Tested-by: Stan Johnson userm57@yahoo.com Signed-off-by: Finn Thain fthain@telegraphics.com.au Acked-by: Laurent Vivier lvivier@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/block/swim.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
--- a/drivers/block/swim.c +++ b/drivers/block/swim.c @@ -612,7 +612,6 @@ static void setup_medium(struct floppy_s struct floppy_struct *g; fs->disk_in = 1; fs->write_protected = swim_readbit(base, WRITE_PROT); - fs->type = swim_readbit(base, TWOMEG_MEDIA);
if (swim_track00(base)) printk(KERN_ERR @@ -620,6 +619,9 @@ static void setup_medium(struct floppy_s
swim_track00(base);
+ fs->type = swim_readbit(base, TWOMEG_MEDIA) ? + HD_MEDIA : DD_MEDIA; + fs->head_number = swim_readbit(base, SINGLE_SIDED) ? 1 : 2; get_floppy_geometry(fs, 0, &g); fs->total_secs = g->size; fs->secpercyl = g->head * g->sect; @@ -656,6 +658,8 @@ static int floppy_open(struct block_devi goto out; }
+ set_capacity(fs->disk, fs->total_secs); + if (mode & FMODE_NDELAY) return 0;
@@ -808,10 +812,9 @@ static int swim_add_floppy(struct swim_p
swim_motor(base, OFF);
- if (swim_readbit(base, SINGLE_SIDED)) - fs->head_number = 1; - else - fs->head_number = 2; + fs->type = HD_MEDIA; + fs->head_number = 2; + fs->ref_count = 0; fs->ejected = 1;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Xu peterx@redhat.com
commit 9a0fd675304d410f3a9586e1b333e16f4658d56c upstream.
It's been missing for a while but no one is touching that up. Fix it.
Link: http://lkml.kernel.org/r/20180315060639.9578-1-peterx@redhat.com
CC: Ingo Molnar mingo@kernel.org Cc:stable@vger.kernel.org Fixes: 7b2c86250122d ("tracing: Add NMI tracing in hwlat detector") Signed-off-by: Peter Xu peterx@redhat.com Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/trace/trace_entries.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/trace/trace_entries.h +++ b/kernel/trace/trace_entries.h @@ -356,7 +356,7 @@ FTRACE_ENTRY(hwlat, hwlat_entry, __field( unsigned int, seqnum ) ),
- F_printk("cnt:%u\tts:%010llu.%010lu\tinner:%llu\touter:%llunmi-ts:%llu\tnmi-count:%u\n", + F_printk("cnt:%u\tts:%010llu.%010lu\tinner:%llu\touter:%llu\tnmi-ts:%llu\tnmi-count:%u\n", __entry->seqnum, __entry->tv_sec, __entry->tv_nsec,
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sebastian Ott sebott@linux.ibm.com
commit af2e460ade0b0180d0f3812ca4f4f59cc9597f3e upstream.
Channel path descriptors have been seen as something stable (as long as the chpid is configured). Recent tests have shown that the descriptor can also be altered when the link state of a channel path changes. Thus it is necessary to update the descriptor during handling of resource accessibility events.
Cc: stable@vger.kernel.org Signed-off-by: Sebastian Ott sebott@linux.ibm.com Reviewed-by: Peter Oberparleiter oberpar@linux.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/s390/cio/chsc.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
--- a/drivers/s390/cio/chsc.c +++ b/drivers/s390/cio/chsc.c @@ -451,6 +451,7 @@ static void chsc_process_sei_link_incide
static void chsc_process_sei_res_acc(struct chsc_sei_nt0_area *sei_area) { + struct channel_path *chp; struct chp_link link; struct chp_id chpid; int status; @@ -463,10 +464,17 @@ static void chsc_process_sei_res_acc(str chpid.id = sei_area->rsid; /* allocate a new channel path structure, if needed */ status = chp_get_status(chpid); - if (status < 0) - chp_new(chpid); - else if (!status) + if (!status) return; + + if (status < 0) { + chp_new(chpid); + } else { + chp = chpid_to_chp(chpid); + mutex_lock(&chp->lock); + chp_update_desc(chp); + mutex_unlock(&chp->lock); + } memset(&link, 0, sizeof(struct chp_link)); link.chpid = chpid; if ((sei_area->vf & 0xc0) != 0) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Haberland sth@linux.vnet.ibm.com
commit 5d27a2bf6e14f5c7d1033ad1e993fcd0eba43e83 upstream.
When a new CKD storage volume is defined at the storage server, Linux may be relying on outdated information about that volume, which leads to the following errors:
1. Command Reject Errors for minidisk on z/VM:
dasd-eckd.b3193d: 0.0.XXXX: An error occurred in the DASD device driver, reason=09 dasd(eckd): I/O status report for device 0.0.XXXX: dasd(eckd): in req: 00000000XXXXXXXX CC:00 FC:04 AC:00 SC:17 DS:02 CS:00 RC:0 dasd(eckd): device 0.0.2046: Failing CCW: 00000000XXXXXXXX dasd(eckd): Sense(hex) 0- 7: 80 00 00 00 00 00 00 00 dasd(eckd): Sense(hex) 8-15: 00 00 00 00 00 00 00 00 dasd(eckd): Sense(hex) 16-23: 00 00 00 00 e1 00 0f 00 dasd(eckd): Sense(hex) 24-31: 00 00 40 e2 00 00 00 00 dasd(eckd): 24 Byte: 0 MSG 0, no MSGb to SYSOP
2. Equipment Check errors on LPAR or for dedicated devices on z/VM:
dasd(eckd): I/O status report for device 0.0.XXXX: dasd(eckd): in req: 00000000XXXXXXXX CC:00 FC:04 AC:00 SC:17 DS:0E CS:40 fcxs:01 schxs:00 RC:0 dasd(eckd): device 0.0.9713: Failing TCW: 00000000XXXXXXXX dasd(eckd): Sense(hex) 0- 7: 10 00 00 00 13 58 4d 0f dasd(eckd): Sense(hex) 8-15: 67 00 00 00 00 00 00 04 dasd(eckd): Sense(hex) 16-23: e5 18 05 33 97 01 0f 0f dasd(eckd): Sense(hex) 24-31: 00 00 40 e2 00 04 58 0d dasd(eckd): 24 Byte: 0 MSG f, no MSGb to SYSOP
Fix this problem by using the up-to-date information provided during online processing via the device specific SNEQ to detect the case of outdated LCU data. If there is a difference, perform a re-read of that data.
Cc: stable@vger.kernel.org Reviewed-by: Jan Hoeppner hoeppner@linux.ibm.com Signed-off-by: Stefan Haberland sth@linux.vnet.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/s390/block/dasd_alias.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-)
--- a/drivers/s390/block/dasd_alias.c +++ b/drivers/s390/block/dasd_alias.c @@ -592,13 +592,22 @@ static int _schedule_lcu_update(struct a int dasd_alias_add_device(struct dasd_device *device) { struct dasd_eckd_private *private = device->private; - struct alias_lcu *lcu; + __u8 uaddr = private->uid.real_unit_addr; + struct alias_lcu *lcu = private->lcu; unsigned long flags; int rc;
- lcu = private->lcu; rc = 0; spin_lock_irqsave(&lcu->lock, flags); + /* + * Check if device and lcu type differ. If so, the uac data may be + * outdated and needs to be updated. + */ + if (private->uid.type != lcu->uac->unit[uaddr].ua_type) { + lcu->flags |= UPDATE_PENDING; + DBF_DEV_EVENT(DBF_WARNING, device, "%s", + "uid type mismatch - trigger rescan"); + } if (!(lcu->flags & UPDATE_PENDING)) { rc = _add_device_to_lcu(lcu, device, device); if (rc)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Heiko Carstens heiko.carstens@de.ibm.com
commit 783c3b53b9506db3e05daacfe34e0287eebb09d8 upstream.
Implement s390 specific arch_uretprobe_is_alive() to avoid SIGSEGVs observed with uretprobes in combination with setjmp/longjmp.
See commit 2dea1d9c38e4 ("powerpc/uprobes: Implement arch_uretprobe_is_alive()") for more details.
With this implemented all test cases referenced in the above commit pass.
Reported-by: Ziqian SUN zsun@redhat.com Cc: stable@vger.kernel.org # v4.3+ Signed-off-by: Heiko Carstens heiko.carstens@de.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/s390/kernel/uprobes.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/arch/s390/kernel/uprobes.c +++ b/arch/s390/kernel/uprobes.c @@ -150,6 +150,15 @@ unsigned long arch_uretprobe_hijack_retu return orig; }
+bool arch_uretprobe_is_alive(struct return_instance *ret, enum rp_check ctx, + struct pt_regs *regs) +{ + if (ctx == RP_CHECK_CHAIN_CALL) + return user_stack_pointer(regs) <= ret->stack; + else + return user_stack_pointer(regs) < ret->stack; +} + /* Instruction Emulation */
static void adjust_psw_addr(psw_t *psw, unsigned long len)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede hdegoede@redhat.com
commit 53fa1f6e8a5958da698a31edf366ffe90596b490 upstream.
Commit 5928c281524f (ACPI / video: Default lcd_only to true on Win8-ready and newer machines) made only_lcd default to true on all machines where acpi_osi_is_win8() returns true, including laptops.
The purpose of this is to avoid the bogus / non-working acpi backlight interface which many newer BIOS-es define on desktop machines.
But this is causing a regression on some laptops, specifically on the Dell XPS 13 2013 model, which does not have the LCD flag set for its fully functional ACPI backlight interface.
Rather then DMI quirking our way out of this, this commits changes the logic for setting only_lcd to true, to only do this on machines with a desktop (or server) dmi chassis-type.
Note that we cannot simply only check the chassis-type and not register the backlight interface based on that as there are some laptops and tablets which have their chassis-type set to "3" aka desktop. Hopefully the combination of checking the LCD flag, but only on devices with a desktop(ish) chassis-type will avoid the needs for DMI quirks for this, or at least limit the amount of DMI quirks which we need to a minimum.
Fixes: 5928c281524f (ACPI / video: Default lcd_only to true on Win8-ready and newer machines) Reported-and-tested-by: James Hogan jhogan@kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com Cc: 4.15+ stable@vger.kernel.org # 4.15+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/acpi/acpi_video.c | 27 +++++++++++++++++++++++++-- 1 file changed, 25 insertions(+), 2 deletions(-)
--- a/drivers/acpi/acpi_video.c +++ b/drivers/acpi/acpi_video.c @@ -2123,6 +2123,25 @@ static int __init intel_opregion_present return opregion; }
+static bool dmi_is_desktop(void) +{ + const char *chassis_type; + + chassis_type = dmi_get_system_info(DMI_CHASSIS_TYPE); + if (!chassis_type) + return false; + + if (!strcmp(chassis_type, "3") || /* 3: Desktop */ + !strcmp(chassis_type, "4") || /* 4: Low Profile Desktop */ + !strcmp(chassis_type, "5") || /* 5: Pizza Box */ + !strcmp(chassis_type, "6") || /* 6: Mini Tower */ + !strcmp(chassis_type, "7") || /* 7: Tower */ + !strcmp(chassis_type, "11")) /* 11: Main Server Chassis */ + return true; + + return false; +} + int acpi_video_register(void) { int ret = 0; @@ -2143,8 +2162,12 @@ int acpi_video_register(void) * win8 ready (where we also prefer the native backlight driver, so * normally the acpi_video code should not register there anyways). */ - if (only_lcd == -1) - only_lcd = acpi_osi_is_win8(); + if (only_lcd == -1) { + if (dmi_is_desktop() && acpi_osi_is_win8()) + only_lcd = true; + else + only_lcd = false; + }
dmi_check_system(video_dmi_table);
On Fri, Apr 27, 2018 at 03:57:53PM +0200, Greg Kroah-Hartman wrote:
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Summary ------------------------------------------------------------------------
kernel: 4.14.38-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.14.y git commit: a0fef4b4352d42e1d747f8c9dc2af45ea0b6b977 git describe: v4.14.37-81-ga0fef4b4352d Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.37-81...
No regressions (compared to build v4.14.37)
Boards, architectures and test suites: -------------------------------------
dragonboard-410c * boot - pass: 20, * kselftest - skip: 27, pass: 41, * libhugetlbfs - skip: 1, pass: 90, * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - skip: 17, pass: 64, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - skip: 6, pass: 57, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - skip: 1, pass: 21, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 14, * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - skip: 134, pass: 1016, * ltp-timers-tests - pass: 13,
hi6220-hikey - arm64 * boot - pass: 20, * kselftest - skip: 23, pass: 44, * libhugetlbfs - skip: 1, pass: 90, * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - skip: 17, pass: 64, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - skip: 6, pass: 57, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - skip: 1, pass: 21, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - skip: 4, pass: 10, * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - skip: 135, pass: 1015, * ltp-timers-tests - pass: 13,
juno-r2 - arm64 * boot - pass: 20, * kselftest - skip: 25, pass: 43, * libhugetlbfs - skip: 1, pass: 90, * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - skip: 17, pass: 64, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - skip: 6, pass: 57, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 22, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - skip: 4, pass: 10, * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - skip: 134, pass: 1016, * ltp-timers-tests - pass: 13,
qemu_arm * boot - pass: 10, fail: 5 * libhugetlbfs - pass: 1, * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - skip: 17, pass: 62, fail: 2 * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - skip: 5, pass: 58, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-hugetlb-tests - skip: 1, pass: 21, * ltp-ipc-tests - pass: 9, * ltp-nptl-tests - pass: 2,
qemu_arm64 * boot - pass: 19, fail: 1 * kselftest - skip: 29, pass: 41, * libhugetlbfs - skip: 1, pass: 90, * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - skip: 17, pass: 64, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - skip: 6, pass: 57, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 22, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - skip: 156, pass: 992, fail: 2 * ltp-timers-tests - pass: 13,
qemu_x86_64 * boot - pass: 22, * kselftest - skip: 30, pass: 50, * kselftest-vsyscall-mode-native - skip: 30, pass: 50, * kselftest-vsyscall-mode-none - skip: 30, pass: 50, * libhugetlbfs - skip: 1, pass: 90, * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - skip: 17, pass: 64, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - skip: 6, pass: 57, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 22, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - skip: 1, pass: 13, * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - skip: 153, pass: 997, * ltp-timers-tests - pass: 13,
x15 - arm * boot - pass: 17, * kselftest - skip: 26, pass: 37, fail: 2 * libhugetlbfs - skip: 1, pass: 87, * ltp-cap_bounds-tests - pass: 2, * ltp-fs-tests - skip: 5, pass: 58, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - skip: 2, pass: 20, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - skip: 1, pass: 13, * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - skip: 75, pass: 1075, * ltp-timers-tests - pass: 13,
x86_64 * boot - pass: 22, * kselftest - skip: 22, pass: 54, * kselftest-vsyscall-mode-native - skip: 25, pass: 53, fail: 1 * kselftest-vsyscall-mode-none - skip: 22, pass: 54, * libhugetlbfs - skip: 1, pass: 90, * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - skip: 17, pass: 64, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - skip: 5, pass: 58, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 22, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - skip: 5, pass: 9, * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - skip: 116, pass: 1034, * ltp-timers-tests - pass: 13,
On 04/27/2018 06:57 AM, Greg Kroah-Hartman wrote:
For v4.14.37-80-g73477f7:
Build results: total: 146 pass: 146 fail: 0 Qemu test results: total: 141 pass: 141 fail: 0
Details are available at http://kerneltests.org/builders/.
Guenter
linux-stable-mirror@lists.linaro.org