This is the start of the stable review cycle for the 4.4.191 release. There are 77 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri 06 Sep 2019 05:50:23 PM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.191-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.4.191-rc1
Greg Kroah-Hartman gregkh@linuxfoundation.org x86/ptrace: fix up botched merge of spectrev1 fix
Johannes Berg johannes.berg@intel.com mac80211: fix possible sta leak
Hodaszi, Robert Robert.Hodaszi@digi.com Revert "cfg80211: fix processing world regdomain when non modular"
Nadav Amit namit@vmware.com VMCI: Release resource if the work is already queued
Ding Xiang dingxiang@cmss.chinamobile.com stm class: Fix a double free of stm_source_device
Ulf Hansson ulf.hansson@linaro.org mmc: core: Fix init of SD cards reporting an invalid VDD range
Eugen Hristev eugen.hristev@microchip.com mmc: sdhci-of-at91: add quirk for broken HS200
Sebastian Mayr me@sam.st uprobes/x86: Fix detection of 32-bit user mode
Ricardo Neri ricardo.neri-calderon@linux.intel.com ptrace,x86: Make user_64bit_mode() available to 32-bit builds
Kai-Heng Feng kai.heng.feng@canonical.com USB: storage: ums-realtek: Whitelist auto-delink support
Kai-Heng Feng kai.heng.feng@canonical.com USB: storage: ums-realtek: Update module parameter description for auto_delink_en
Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com usb: host: ohci: fix a race condition between shutdown and irq
Oliver Neukum oneukum@suse.com USB: cdc-wdm: fix race between write and disconnect due to flag abuse
Henk van der Laan opensource@henkvdlaan.com usb-storage: Add new JMS567 revision to unusual_devs
Bandan Das bsd@redhat.com x86/apic: Include the LDR when clearing out APIC registers
Bandan Das bsd@redhat.com x86/apic: Do not initialize LDR and DFR for bigsmp
Sean Christopherson sean.j.christopherson@intel.com KVM: x86: Don't update RIP or do single-step on faulting emulation
Takashi Iwai tiwai@suse.de ALSA: seq: Fix potential concurrent access to the deleted pool
Eric Dumazet edumazet@google.com tcp: make sure EPOLLOUT wont be missed
Hui Peng benquike@gmail.com ALSA: usb-audio: Fix an OOB bug in parse_audio_mixer_unit
Hui Peng benquike@gmail.com ALSA: usb-audio: Fix a stack buffer overflow bug in check_input_term
Tim Froidcoeur tim.froidcoeur@tessares.net tcp: fix tcp_rtx_queue_tail in case of empty retransmit queue
Stefan Wahren wahrenst@gmx.net watchdog: bcm2835_wdt: Fix module autoload
Adrian Vladu avladu@cloudbasesolutions.com tools: hv: fix KVP and VSS daemons exit code
Hans Ulli Kroll ulli.kroll@googlemail.com usb: host: fotg2: restart hcd after port reset
Benjamin Herrenschmidt benh@kernel.crashing.org usb: gadget: composite: Clear "suspended" on reset/disconnect
Arnd Bergmann arnd@arndb.de dmaengine: ste_dma40: fix unneeded variable warning
Adrian Hunter adrian.hunter@intel.com scsi: ufs: Fix NULL pointer dereference in ufshcd_config_vreg_hpm()
Tom Lendacky thomas.lendacky@amd.com x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h
Chen Yu yu.c.chen@intel.com x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume
Sasha Levin sashal@kernel.org Revert "perf test 6: Fix missing kvm module load for s390"
Dirk Morris dmorris@metaloft.com netfilter: conntrack: Use consistent ct id hash calculation
Florian Westphal fw@strlen.de netfilter: ctnetlink: don't use conntrack/expect object addresses as id
Eric Dumazet edumazet@google.com inet: switch IP ID generator to siphash
Jason A. Donenfeld Jason@zx2c4.com siphash: implement HalfSipHash1-3 for hash tables
Jason A. Donenfeld Jason@zx2c4.com siphash: add cryptographically secure PRF
Jason Wang jasowang@redhat.com vhost: scsi: add weight support
Jason Wang jasowang@redhat.com vhost_net: fix possible infinite loop
Jason Wang jasowang@redhat.com vhost: introduce vhost_exceeds_weight()
Jason Wang jasowang@redhat.com vhost_net: introduce vhost_exceeds_weight()
Paolo Abeni pabeni@redhat.com vhost_net: use packet weight for rx handler, too
haibinzhang(张海斌) haibinzhang@tencent.com vhost-net: set packet weight of tx polling to 2 * vq size
Alexander Kochetkov al.kochet@gmail.com net: arc_emac: fix koops caused by sk_buff free
Bob Peterson rpeterso@redhat.com GFS2: don't set rgrp gl_object until it's inserted into rgrp tree
Daniel Bristot de Oliveira bristot@redhat.com cgroup: Disable IRQs while holding css_set_lock
Mikulas Patocka mpatocka@redhat.com dm table: fix invalid memory accesses with too high sector number
ZhangXiaoxu zhangxiaoxu5@huawei.com dm space map metadata: fix missing store of apply_bops() return value
ZhangXiaoxu zhangxiaoxu5@huawei.com dm btree: fix order of block initialization in btree_split_beneath
John Hubbard jhubbard@nvidia.com x86/boot: Fix boot regression caused by bootparam sanitizing
John Hubbard jhubbard@nvidia.com x86/boot: Save fields explicitly, zero out everything else
Thomas Gleixner tglx@linutronix.de x86/apic: Handle missing global clockevent gracefully
Sean Christopherson sean.j.christopherson@intel.com x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386
Oleg Nesterov oleg@redhat.com userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx
Mikulas Patocka mpatocka@redhat.com Revert "dm bufio: fix deadlock with loop device"
Aaron Armstrong Skomra skomra@gmail.com HID: wacom: correct misreported EKR ring values
Naresh Kamboju <naresh.kamboju () linaro ! org> selftests: kvm: Adding config fragments
Jens Axboe axboe@kernel.dk libata: add SG safety checks in SFF pio transfers
Jiangfeng Xiao xiaojiangfeng@huawei.com net: hisilicon: Fix dma_map_single failed on arm64
Jiangfeng Xiao xiaojiangfeng@huawei.com net: hisilicon: fix hip04-xmit never return TX_BUSY
Jiangfeng Xiao xiaojiangfeng@huawei.com net: hisilicon: make hip04_tx_reclaim non-reentrant
Christophe JAILLET christophe.jaillet@wanadoo.fr net: cxgb3_main: Fix a resource leak in a error path in 'init_one()'
Trond Myklebust trond.myklebust@hammerspace.com NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()
Wang Xiayang xywang.sjtu@sjtu.edu.cn can: peak_usb: force the string buffer NULL-terminated
Wang Xiayang xywang.sjtu@sjtu.edu.cn can: sja1000: force the string buffer NULL-terminated
Jiri Olsa jolsa@kernel.org perf bench numa: Fix cpu0 binding
Juliana Rodrigueiro juliana.rodrigueiro@intra2net.com isdn: hfcsusb: Fix mISDN driver crash caused by transfer buffer on the stack
Jia-Ju Bai baijiaju1990@gmail.com isdn: mISDN: hfcsusb: Fix possible null-pointer dereferences in start_isoc_chain()
Bob Ham bob.ham@puri.sm net: usb: qmi_wwan: Add the BroadMobi BM818 card
Peter Ujfalusi peter.ujfalusi@ti.com ASoC: ti: davinci-mcasp: Correct slot_width posed constraint
Navid Emamdoost navid.emamdoost@gmail.com st_nci_hci_connectivity_event_received: null check the allocation
Navid Emamdoost navid.emamdoost@gmail.com st21nfca_connectivity_event_received: null check the allocation
Ricard Wanderlof ricard.wanderlof@axis.com ASoC: Fail card instantiation if DAI format setup fails
Rasmus Villemoes rasmus.villemoes@prevas.dk can: dev: call netif_carrier_off() in register_candev()
Thomas Falcon tlfalcon@linux.ibm.com bonding: Force slave speed check after link state recovery for 802.3ad
Wenwen Wang wenwen@cs.uga.edu netfilter: ebtables: fix a memory leak bug in compat
Thomas Bogendoerfer tbogendoerfer@suse.de MIPS: kernel: only use i8253 clocksource with periodic clockevent
Ilya Trukhanov lahvuun@gmail.com HID: Add 044f:b320 ThrustMaster, Inc. 2 in 1 DT
-------------
Diffstat:
Documentation/kernel-parameters.txt | 7 + Documentation/siphash.txt | 175 +++++++ MAINTAINERS | 7 + Makefile | 4 +- arch/mips/kernel/i8253.c | 3 +- arch/x86/include/asm/bootparam_utils.h | 60 ++- arch/x86/include/asm/msr-index.h | 1 + arch/x86/include/asm/msr.h | 10 + arch/x86/include/asm/nospec-branch.h | 2 +- arch/x86/include/asm/ptrace.h | 6 +- arch/x86/include/asm/suspend_32.h | 1 + arch/x86/include/asm/suspend_64.h | 1 + arch/x86/kernel/apic/apic.c | 72 ++- arch/x86/kernel/apic/bigsmp_32.c | 24 +- arch/x86/kernel/cpu/amd.c | 66 +++ arch/x86/kernel/ptrace.c | 3 +- arch/x86/kernel/uprobes.c | 17 +- arch/x86/kvm/x86.c | 9 +- arch/x86/power/cpu.c | 152 ++++++ drivers/ata/libata-sff.c | 6 + drivers/dma/ste_dma40.c | 4 +- drivers/hid/hid-tmff.c | 12 + drivers/hid/wacom_wac.c | 2 +- drivers/hwtracing/stm/core.c | 1 - drivers/isdn/hardware/mISDN/hfcsusb.c | 13 +- drivers/md/dm-bufio.c | 4 +- drivers/md/dm-table.c | 5 +- drivers/md/persistent-data/dm-btree.c | 31 +- drivers/md/persistent-data/dm-space-map-metadata.c | 2 +- drivers/misc/vmw_vmci/vmci_doorbell.c | 6 +- drivers/mmc/core/sd.c | 6 + drivers/mmc/host/sdhci-of-at91.c | 3 + drivers/net/bonding/bond_main.c | 9 + drivers/net/can/dev.c | 2 + drivers/net/can/sja1000/peak_pcmcia.c | 2 +- drivers/net/can/usb/peak_usb/pcan_usb_core.c | 2 +- drivers/net/ethernet/arc/emac_main.c | 9 +- drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c | 5 +- drivers/net/ethernet/hisilicon/hip04_eth.c | 28 +- drivers/net/usb/qmi_wwan.c | 1 + drivers/nfc/st-nci/se.c | 2 + drivers/nfc/st21nfca/se.c | 2 + drivers/scsi/ufs/ufshcd.c | 3 + drivers/usb/class/cdc-wdm.c | 16 +- drivers/usb/gadget/composite.c | 1 + drivers/usb/host/fotg210-hcd.c | 4 + drivers/usb/host/ohci-hcd.c | 15 +- drivers/usb/storage/realtek_cr.c | 15 +- drivers/usb/storage/unusual_devs.h | 2 +- drivers/vhost/net.c | 31 +- drivers/vhost/scsi.c | 15 +- drivers/vhost/vhost.c | 20 +- drivers/vhost/vhost.h | 6 +- drivers/watchdog/bcm2835_wdt.c | 1 + fs/gfs2/rgrp.c | 13 +- fs/nfs/nfs4_fs.h | 3 +- fs/nfs/nfs4client.c | 5 +- fs/nfs/nfs4state.c | 27 +- fs/userfaultfd.c | 25 +- include/linux/siphash.h | 145 ++++++ include/net/netfilter/nf_conntrack.h | 2 + include/net/netns/ipv4.h | 2 + include/net/tcp.h | 4 + kernel/cgroup.c | 122 ++--- lib/Kconfig.debug | 10 + lib/Makefile | 3 +- lib/siphash.c | 551 +++++++++++++++++++++ lib/test_siphash.c | 223 +++++++++ net/bridge/netfilter/ebtables.c | 4 +- net/core/stream.c | 16 +- net/ipv4/route.c | 12 +- net/ipv6/output_core.c | 30 +- net/mac80211/cfg.c | 9 +- net/netfilter/nf_conntrack_core.c | 35 ++ net/netfilter/nf_conntrack_netlink.c | 34 +- net/wireless/reg.c | 2 +- sound/core/seq/seq_clientmgr.c | 3 +- sound/core/seq/seq_fifo.c | 17 + sound/core/seq/seq_fifo.h | 2 + sound/soc/davinci/davinci-mcasp.c | 43 +- sound/soc/soc-core.c | 7 +- sound/usb/mixer.c | 30 +- tools/hv/hv_kvp_daemon.c | 2 + tools/hv/hv_vss_daemon.c | 2 + tools/perf/bench/numa.c | 6 +- tools/perf/tests/parse-events.c | 27 - tools/testing/selftests/kvm/config | 3 + 87 files changed, 2015 insertions(+), 310 deletions(-)
[ Upstream commit 65f11c72780fa9d598df88def045ccb6a885cf80 ]
Enable force feedback for the Thrustmaster Dual Trigger 2 in 1 Rumble Force gamepad. Compared to other Thrustmaster devices, left and right rumble motors here are swapped.
Signed-off-by: Ilya Trukhanov lahvuun@gmail.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-tmff.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
diff --git a/drivers/hid/hid-tmff.c b/drivers/hid/hid-tmff.c index b83376077d722..cfa0cb22c9b3c 100644 --- a/drivers/hid/hid-tmff.c +++ b/drivers/hid/hid-tmff.c @@ -34,6 +34,8 @@
#include "hid-ids.h"
+#define THRUSTMASTER_DEVICE_ID_2_IN_1_DT 0xb320 + static const signed short ff_rumble[] = { FF_RUMBLE, -1 @@ -88,6 +90,7 @@ static int tmff_play(struct input_dev *dev, void *data, struct hid_field *ff_field = tmff->ff_field; int x, y; int left, right; /* Rumbling */ + int motor_swap;
switch (effect->type) { case FF_CONSTANT: @@ -112,6 +115,13 @@ static int tmff_play(struct input_dev *dev, void *data, ff_field->logical_minimum, ff_field->logical_maximum);
+ /* 2-in-1 strong motor is left */ + if (hid->product == THRUSTMASTER_DEVICE_ID_2_IN_1_DT) { + motor_swap = left; + left = right; + right = motor_swap; + } + dbg_hid("(left,right)=(%08x, %08x)\n", left, right); ff_field->value[0] = left; ff_field->value[1] = right; @@ -238,6 +248,8 @@ static const struct hid_device_id tm_devices[] = { .driver_data = (unsigned long)ff_rumble }, { HID_USB_DEVICE(USB_VENDOR_ID_THRUSTMASTER, 0xb304), /* FireStorm Dual Power 2 (and 3) */ .driver_data = (unsigned long)ff_rumble }, + { HID_USB_DEVICE(USB_VENDOR_ID_THRUSTMASTER, THRUSTMASTER_DEVICE_ID_2_IN_1_DT), /* Dual Trigger 2-in-1 */ + .driver_data = (unsigned long)ff_rumble }, { HID_USB_DEVICE(USB_VENDOR_ID_THRUSTMASTER, 0xb323), /* Dual Trigger 3-in-1 (PC Mode) */ .driver_data = (unsigned long)ff_rumble }, { HID_USB_DEVICE(USB_VENDOR_ID_THRUSTMASTER, 0xb324), /* Dual Trigger 3-in-1 (PS3 Mode) */
[ Upstream commit a07e3324538a989b7cdbf2c679be6a7f9df2544f ]
i8253 clocksource needs a free running timer. This could only be used, if i8253 clockevent is set up as periodic.
Signed-off-by: Thomas Bogendoerfer tbogendoerfer@suse.de Signed-off-by: Paul Burton paul.burton@mips.com Cc: Ralf Baechle ralf@linux-mips.org Cc: James Hogan jhogan@kernel.org Cc: linux-mips@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/mips/kernel/i8253.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/mips/kernel/i8253.c b/arch/mips/kernel/i8253.c index c5bc344fc745c..73039746ae364 100644 --- a/arch/mips/kernel/i8253.c +++ b/arch/mips/kernel/i8253.c @@ -31,7 +31,8 @@ void __init setup_pit_timer(void)
static int __init init_pit_clocksource(void) { - if (num_possible_cpus() > 1) /* PIT does not scale! */ + if (num_possible_cpus() > 1 || /* PIT does not scale! */ + !clockevent_state_periodic(&i8253_clockevent)) return 0;
return clocksource_i8253_init();
[ Upstream commit 15a78ba1844a8e052c1226f930133de4cef4e7ad ]
In compat_do_replace(), a temporary buffer is allocated through vmalloc() to hold entries copied from the user space. The buffer address is firstly saved to 'newinfo->entries', and later on assigned to 'entries_tmp'. Then the entries in this temporary buffer is copied to the internal kernel structure through compat_copy_entries(). If this copy process fails, compat_do_replace() should be terminated. However, the allocated temporary buffer is not freed on this path, leading to a memory leak.
To fix the bug, free the buffer before returning from compat_do_replace().
Signed-off-by: Wenwen Wang wenwen@cs.uga.edu Reviewed-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/bridge/netfilter/ebtables.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c index 1a87cf78fadc4..d9471e3ef2161 100644 --- a/net/bridge/netfilter/ebtables.c +++ b/net/bridge/netfilter/ebtables.c @@ -2280,8 +2280,10 @@ static int compat_do_replace(struct net *net, void __user *user, state.buf_kern_len = size64;
ret = compat_copy_entries(entries_tmp, tmp.entries_size, &state); - if (WARN_ON(ret < 0)) + if (WARN_ON(ret < 0)) { + vfree(entries_tmp); goto out_unlock; + }
vfree(entries_tmp); tmp.entries_size = size64;
[ Upstream commit 12185dfe44360f814ac4ead9d22ad2af7511b2e9 ]
The following scenario was encountered during testing of logical partition mobility on pseries partitions with bonded ibmvnic adapters in LACP mode.
1. Driver receives a signal that the device has been swapped, and it needs to reset to initialize the new device.
2. Driver reports loss of carrier and begins initialization.
3. Bonding driver receives NETDEV_CHANGE notifier and checks the slave's current speed and duplex settings. Because these are unknown at the time, the bond sets its link state to BOND_LINK_FAIL and handles the speed update, clearing AD_PORT_LACP_ENABLE.
4. Driver finishes recovery and reports that the carrier is on.
5. Bond receives a new notification and checks the speed again. The speeds are valid but miimon has not altered the link state yet. AD_PORT_LACP_ENABLE remains off.
Because the slave's link state is still BOND_LINK_FAIL, no further port checks are made when it recovers. Though the slave devices are operational and have valid speed and duplex settings, the bond will not send LACPDU's. The simplest fix I can see is to force another speed check in bond_miimon_commit. This way the bond will update AD_PORT_LACP_ENABLE if needed when transitioning from BOND_LINK_FAIL to BOND_LINK_UP.
CC: Jarod Wilson jarod@redhat.com CC: Jay Vosburgh j.vosburgh@gmail.com CC: Veaceslav Falico vfalico@gmail.com CC: Andy Gospodarek andy@greyhouse.net Signed-off-by: Thomas Falcon tlfalcon@linux.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_main.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 168f2331194ff..fd6aff9f0052e 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -2081,6 +2081,15 @@ static void bond_miimon_commit(struct bonding *bond) bond_for_each_slave(bond, slave, iter) { switch (slave->new_link) { case BOND_LINK_NOCHANGE: + /* For 802.3ad mode, check current slave speed and + * duplex again in case its port was disabled after + * invalid speed/duplex reporting but recovered before + * link monitoring could make a decision on the actual + * link status + */ + if (BOND_MODE(bond) == BOND_MODE_8023AD && + slave->link == BOND_LINK_UP) + bond_3ad_adapter_speed_duplex_changed(slave); continue;
case BOND_LINK_UP:
[ Upstream commit c63845609c4700488e5eacd6ab4d06d5d420e5ef ]
CONFIG_CAN_LEDS is deprecated. When trying to use the generic netdev trigger as suggested, there's a small inconsistency with the link property: The LED is on initially, stays on when the device is brought up, and then turns off (as expected) when the device is brought down.
Make sure the LED always reflects the state of the CAN device.
Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk Acked-by: Willem de Bruijn willemb@google.com Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/can/dev.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c index 8b7c6425b681d..9dd968ee792e0 100644 --- a/drivers/net/can/dev.c +++ b/drivers/net/can/dev.c @@ -1065,6 +1065,8 @@ static struct rtnl_link_ops can_link_ops __read_mostly = { int register_candev(struct net_device *dev) { dev->rtnl_link_ops = &can_link_ops; + netif_carrier_off(dev); + return register_netdev(dev); } EXPORT_SYMBOL_GPL(register_candev);
[ Upstream commit 40aa5383e393d72f6aa3943a4e7b1aae25a1e43b ]
If the DAI format setup fails, there is no valid communication format between CPU and CODEC, so fail card instantiation, rather than continue with a card that will most likely not function properly.
Signed-off-by: Ricard Wanderlof ricardw@axis.com Link: https://lore.kernel.org/r/alpine.DEB.2.20.1907241132350.6338@lnxricardw1.se.... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/soc-core.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c index b927f9c81d922..8d10a24d38e06 100644 --- a/sound/soc/soc-core.c +++ b/sound/soc/soc-core.c @@ -1357,8 +1357,11 @@ static int soc_probe_link_dais(struct snd_soc_card *card, int num, int order) } }
- if (dai_link->dai_fmt) - snd_soc_runtime_set_dai_fmt(rtd, dai_link->dai_fmt); + if (dai_link->dai_fmt) { + ret = snd_soc_runtime_set_dai_fmt(rtd, dai_link->dai_fmt); + if (ret) + return ret; + }
ret = soc_post_component_init(rtd, dai_link->name); if (ret)
On Wed, Sep 04, 2019 at 07:52:53PM +0200, Greg Kroah-Hartman wrote:
[ Upstream commit 40aa5383e393d72f6aa3943a4e7b1aae25a1e43b ]
If the DAI format setup fails, there is no valid communication format between CPU and CODEC, so fail card instantiation, rather than continue with a card that will most likely not function properly.
I nacked this patch when Sasha posted it - it only improves diagnostics and might make systems that worked by accident break since it turns things into a hard failure, it won't make anything that didn't work previously work.
On Wed, Sep 04, 2019 at 07:09:52PM +0100, Mark Brown wrote:
On Wed, Sep 04, 2019 at 07:52:53PM +0200, Greg Kroah-Hartman wrote:
[ Upstream commit 40aa5383e393d72f6aa3943a4e7b1aae25a1e43b ]
If the DAI format setup fails, there is no valid communication format between CPU and CODEC, so fail card instantiation, rather than continue with a card that will most likely not function properly.
I nacked this patch when Sasha posted it - it only improves diagnostics and might make systems that worked by accident break since it turns things into a hard failure, it won't make anything that didn't work previously work.
Now dropped.
On Thu, Sep 05, 2019 at 08:56:48PM +0200, Greg Kroah-Hartman wrote:
On Wed, Sep 04, 2019 at 07:09:52PM +0100, Mark Brown wrote:
I nacked this patch when Sasha posted it - it only improves diagnostics and might make systems that worked by accident break since it turns things into a hard failure, it won't make anything that didn't work previously work.
Now dropped.
Thanks!
[ Upstream commit 9891d06836e67324c9e9c4675ed90fc8b8110034 ]
devm_kzalloc may fail and return null. So the null check is needed.
Signed-off-by: Navid Emamdoost navid.emamdoost@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nfc/st21nfca/se.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/nfc/st21nfca/se.c b/drivers/nfc/st21nfca/se.c index c79d99b24c961..f1b96b5255e08 100644 --- a/drivers/nfc/st21nfca/se.c +++ b/drivers/nfc/st21nfca/se.c @@ -327,6 +327,8 @@ int st21nfca_connectivity_event_received(struct nfc_hci_dev *hdev, u8 host,
transaction = (struct nfc_evt_transaction *)devm_kzalloc(dev, skb->len - 2, GFP_KERNEL); + if (!transaction) + return -ENOMEM;
transaction->aid_len = skb->data[1]; memcpy(transaction->aid, &skb->data[2],
[ Upstream commit 3008e06fdf0973770370f97d5f1fba3701d8281d ]
devm_kzalloc may fail and return NULL. So the null check is needed.
Signed-off-by: Navid Emamdoost navid.emamdoost@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nfc/st-nci/se.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/nfc/st-nci/se.c b/drivers/nfc/st-nci/se.c index dbab722a06546..6f9d9b90ac645 100644 --- a/drivers/nfc/st-nci/se.c +++ b/drivers/nfc/st-nci/se.c @@ -346,6 +346,8 @@ static int st_nci_hci_connectivity_event_received(struct nci_dev *ndev,
transaction = (struct nfc_evt_transaction *)devm_kzalloc(dev, skb->len - 2, GFP_KERNEL); + if (!transaction) + return -ENOMEM;
transaction->aid_len = skb->data[1]; memcpy(transaction->aid, &skb->data[2], transaction->aid_len);
[ Upstream commit 1e112c35e3c96db7c8ca6ddaa96574f00c06e7db ]
The slot_width is a property for the bus while the constraint for SNDRV_PCM_HW_PARAM_SAMPLE_BITS is for the in memory format.
Applying slot_width constraint to sample_bits works most of the time, but it will blacklist valid formats in some cases.
With slot_width 24 we can support S24_3LE and S24_LE formats as they both look the same on the bus, but a a 24 constraint on sample_bits would not allow S24_LE as it is stored in 32bits in memory.
Implement a simple hw_rule function to allow all formats which require less or equal number of bits on the bus as slot_width (if configured).
Signed-off-by: Peter Ujfalusi peter.ujfalusi@ti.com Link: https://lore.kernel.org/r/20190726064244.3762-2-peter.ujfalusi@ti.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/davinci/davinci-mcasp.c | 43 ++++++++++++++++++++++++------- 1 file changed, 34 insertions(+), 9 deletions(-)
diff --git a/sound/soc/davinci/davinci-mcasp.c b/sound/soc/davinci/davinci-mcasp.c index 512ec25c9ead1..2f7be6cee98e9 100644 --- a/sound/soc/davinci/davinci-mcasp.c +++ b/sound/soc/davinci/davinci-mcasp.c @@ -1128,6 +1128,28 @@ static int davinci_mcasp_trigger(struct snd_pcm_substream *substream, return ret; }
+static int davinci_mcasp_hw_rule_slot_width(struct snd_pcm_hw_params *params, + struct snd_pcm_hw_rule *rule) +{ + struct davinci_mcasp_ruledata *rd = rule->private; + struct snd_mask *fmt = hw_param_mask(params, SNDRV_PCM_HW_PARAM_FORMAT); + struct snd_mask nfmt; + int i, slot_width; + + snd_mask_none(&nfmt); + slot_width = rd->mcasp->slot_width; + + for (i = 0; i <= SNDRV_PCM_FORMAT_LAST; i++) { + if (snd_mask_test(fmt, i)) { + if (snd_pcm_format_width(i) <= slot_width) { + snd_mask_set(&nfmt, i); + } + } + } + + return snd_mask_refine(fmt, &nfmt); +} + static const unsigned int davinci_mcasp_dai_rates[] = { 8000, 11025, 16000, 22050, 32000, 44100, 48000, 64000, 88200, 96000, 176400, 192000, @@ -1219,7 +1241,7 @@ static int davinci_mcasp_startup(struct snd_pcm_substream *substream, struct davinci_mcasp_ruledata *ruledata = &mcasp->ruledata[substream->stream]; u32 max_channels = 0; - int i, dir; + int i, dir, ret; int tdm_slots = mcasp->tdm_slots;
if (mcasp->tdm_mask[substream->stream]) @@ -1244,6 +1266,7 @@ static int davinci_mcasp_startup(struct snd_pcm_substream *substream, max_channels++; } ruledata->serializers = max_channels; + ruledata->mcasp = mcasp; max_channels *= tdm_slots; /* * If the already active stream has less channels than the calculated @@ -1269,20 +1292,22 @@ static int davinci_mcasp_startup(struct snd_pcm_substream *substream, 0, SNDRV_PCM_HW_PARAM_CHANNELS, &mcasp->chconstr[substream->stream]);
- if (mcasp->slot_width) - snd_pcm_hw_constraint_minmax(substream->runtime, - SNDRV_PCM_HW_PARAM_SAMPLE_BITS, - 8, mcasp->slot_width); + if (mcasp->slot_width) { + /* Only allow formats require <= slot_width bits on the bus */ + ret = snd_pcm_hw_rule_add(substream->runtime, 0, + SNDRV_PCM_HW_PARAM_FORMAT, + davinci_mcasp_hw_rule_slot_width, + ruledata, + SNDRV_PCM_HW_PARAM_FORMAT, -1); + if (ret) + return ret; + }
/* * If we rely on implicit BCLK divider setting we should * set constraints based on what we can provide. */ if (mcasp->bclk_master && mcasp->bclk_div == 0 && mcasp->sysclk_freq) { - int ret; - - ruledata->mcasp = mcasp; - ret = snd_pcm_hw_rule_add(substream->runtime, 0, SNDRV_PCM_HW_PARAM_RATE, davinci_mcasp_hw_rule_rate,
[ Upstream commit 9a07406b00cdc6ec689dc142540739575c717f3c ]
The BroadMobi BM818 M.2 card uses the QMI protocol
Signed-off-by: Bob Ham bob.ham@puri.sm Signed-off-by: Angus Ainslie (Purism) angus@akkea.ca Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index ee6fefe92af43..4391430e25273 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -719,6 +719,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x2001, 0x7e35, 4)}, /* D-Link DWM-222 */ {QMI_FIXED_INTF(0x2020, 0x2031, 4)}, /* Olicard 600 */ {QMI_FIXED_INTF(0x2020, 0x2033, 4)}, /* BroadMobi BM806U */ + {QMI_FIXED_INTF(0x2020, 0x2060, 4)}, /* BroadMobi BM818 */ {QMI_FIXED_INTF(0x0f3d, 0x68a2, 8)}, /* Sierra Wireless MC7700 */ {QMI_FIXED_INTF(0x114f, 0x68a2, 8)}, /* Sierra Wireless MC7750 */ {QMI_FIXED_INTF(0x1199, 0x68a2, 8)}, /* Sierra Wireless MC7710 in QMI mode */
[ Upstream commit a0d57a552b836206ad7705a1060e6e1ce5a38203 ]
In start_isoc_chain(), usb_alloc_urb() on line 1392 may fail and return NULL. At this time, fifo->iso[i].urb is assigned to NULL.
Then, fifo->iso[i].urb is used at some places, such as: LINE 1405: fill_isoc_urb(fifo->iso[i].urb, ...) urb->number_of_packets = num_packets; urb->transfer_flags = URB_ISO_ASAP; urb->actual_length = 0; urb->interval = interval; LINE 1416: fifo->iso[i].urb->... LINE 1419: fifo->iso[i].urb->...
Thus, possible null-pointer dereferences may occur.
To fix these bugs, "continue" is added to avoid using fifo->iso[i].urb when it is NULL.
These bugs are found by a static analysis tool STCheck written by us.
Signed-off-by: Jia-Ju Bai baijiaju1990@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/isdn/hardware/mISDN/hfcsusb.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/isdn/hardware/mISDN/hfcsusb.c b/drivers/isdn/hardware/mISDN/hfcsusb.c index c60c7998af173..6f19530ba2a93 100644 --- a/drivers/isdn/hardware/mISDN/hfcsusb.c +++ b/drivers/isdn/hardware/mISDN/hfcsusb.c @@ -1402,6 +1402,7 @@ start_isoc_chain(struct usb_fifo *fifo, int num_packets_per_urb, printk(KERN_DEBUG "%s: %s: alloc urb for fifo %i failed", hw->name, __func__, fifo->fifonum); + continue; } fifo->iso[i].owner_fifo = (struct usb_fifo *) fifo; fifo->iso[i].indx = i;
[ Upstream commit d8a1de3d5bb881507602bc02e004904828f88711 ]
Since linux 4.9 it is not possible to use buffers on the stack for DMA transfers.
During usb probe the driver crashes with "transfer buffer is on stack" message.
This fix k-allocates a buffer to be used on "read_reg_atomic", which is a macro that calls "usb_control_msg" under the hood.
Kernel 4.19 backtrace:
usb_hcd_submit_urb+0x3e5/0x900 ? sched_clock+0x9/0x10 ? log_store+0x203/0x270 ? get_random_u32+0x6f/0x90 ? cache_alloc_refill+0x784/0x8a0 usb_submit_urb+0x3b4/0x550 usb_start_wait_urb+0x4e/0xd0 usb_control_msg+0xb8/0x120 hfcsusb_probe+0x6bc/0xb40 [hfcsusb] usb_probe_interface+0xc2/0x260 really_probe+0x176/0x280 driver_probe_device+0x49/0x130 __driver_attach+0xa9/0xb0 ? driver_probe_device+0x130/0x130 bus_for_each_dev+0x5a/0x90 driver_attach+0x14/0x20 ? driver_probe_device+0x130/0x130 bus_add_driver+0x157/0x1e0 driver_register+0x51/0xe0 usb_register_driver+0x5d/0x120 ? 0xf81ed000 hfcsusb_drv_init+0x17/0x1000 [hfcsusb] do_one_initcall+0x44/0x190 ? free_unref_page_commit+0x6a/0xd0 do_init_module+0x46/0x1c0 load_module+0x1dc1/0x2400 sys_init_module+0xed/0x120 do_fast_syscall_32+0x7a/0x200 entry_SYSENTER_32+0x6b/0xbe
Signed-off-by: Juliana Rodrigueiro juliana.rodrigueiro@intra2net.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/isdn/hardware/mISDN/hfcsusb.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/isdn/hardware/mISDN/hfcsusb.c b/drivers/isdn/hardware/mISDN/hfcsusb.c index 6f19530ba2a93..726fba452f5f6 100644 --- a/drivers/isdn/hardware/mISDN/hfcsusb.c +++ b/drivers/isdn/hardware/mISDN/hfcsusb.c @@ -1701,13 +1701,23 @@ hfcsusb_stop_endpoint(struct hfcsusb *hw, int channel) static int setup_hfcsusb(struct hfcsusb *hw) { + void *dmabuf = kmalloc(sizeof(u_char), GFP_KERNEL); u_char b; + int ret;
if (debug & DBG_HFC_CALL_TRACE) printk(KERN_DEBUG "%s: %s\n", hw->name, __func__);
+ if (!dmabuf) + return -ENOMEM; + + ret = read_reg_atomic(hw, HFCUSB_CHIP_ID, dmabuf); + + memcpy(&b, dmabuf, sizeof(u_char)); + kfree(dmabuf); + /* check the chip id */ - if (read_reg_atomic(hw, HFCUSB_CHIP_ID, &b) != 1) { + if (ret != 1) { printk(KERN_DEBUG "%s: %s: cannot read chip id\n", hw->name, __func__); return 1;
[ Upstream commit 6bbfe4e602691b90ac866712bd4c43c51e546a60 ]
Michael reported an issue with perf bench numa failing with binding to cpu0 with '-0' option.
# perf bench numa mem -p 3 -t 1 -P 512 -s 100 -zZcm0 --thp 1 -M 1 -ddd # Running 'numa/mem' benchmark:
# Running main, "perf bench numa numa-mem -p 3 -t 1 -P 512 -s 100 -zZcm0 --thp 1 -M 1 -ddd" binding to node 0, mask: 0000000000000001 => -1 perf: bench/numa.c:356: bind_to_memnode: Assertion `!(ret)' failed. Aborted (core dumped)
This happens when the cpu0 is not part of node0, which is the benchmark assumption and we can see that's not the case for some powerpc servers.
Using correct node for cpu0 binding.
Reported-by: Michael Petlan mpetlan@redhat.com Signed-off-by: Jiri Olsa jolsa@kernel.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Satheesh Rajendran sathnaga@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/20190801142642.28004-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/bench/numa.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c index df41deed0320e..3bfba81d19118 100644 --- a/tools/perf/bench/numa.c +++ b/tools/perf/bench/numa.c @@ -370,8 +370,10 @@ static u8 *alloc_data(ssize_t bytes0, int map_flags,
/* Allocate and initialize all memory on CPU#0: */ if (init_cpu0) { - orig_mask = bind_to_node(0); - bind_to_memnode(0); + int node = numa_node_of_cpu(0); + + orig_mask = bind_to_node(node); + bind_to_memnode(node); }
bytes = bytes0 + HPSIZE;
[ Upstream commit cd28aa2e056cd1ea79fc5f24eed0ce868c6cab5c ]
strncpy() does not ensure NULL-termination when the input string size equals to the destination buffer size IFNAMSIZ. The output string 'name' is passed to dev_info which relies on NULL-termination.
Use strlcpy() instead.
This issue is identified by a Coccinelle script.
Signed-off-by: Wang Xiayang xywang.sjtu@sjtu.edu.cn Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/can/sja1000/peak_pcmcia.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/can/sja1000/peak_pcmcia.c b/drivers/net/can/sja1000/peak_pcmcia.c index dd56133cc4616..fc9f8b01ecae2 100644 --- a/drivers/net/can/sja1000/peak_pcmcia.c +++ b/drivers/net/can/sja1000/peak_pcmcia.c @@ -487,7 +487,7 @@ static void pcan_free_channels(struct pcan_pccard *card) if (!netdev) continue;
- strncpy(name, netdev->name, IFNAMSIZ); + strlcpy(name, netdev->name, IFNAMSIZ);
unregister_sja1000dev(netdev);
[ Upstream commit e787f19373b8a5fa24087800ed78314fd17b984a ]
strncpy() does not ensure NULL-termination when the input string size equals to the destination buffer size IFNAMSIZ. The output string is passed to dev_info() which relies on the NULL-termination.
Use strlcpy() instead.
This issue is identified by a Coccinelle script.
Signed-off-by: Wang Xiayang xywang.sjtu@sjtu.edu.cn Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/can/usb/peak_usb/pcan_usb_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_core.c b/drivers/net/can/usb/peak_usb/pcan_usb_core.c index e13bc27b42911..b1d68f49b3989 100644 --- a/drivers/net/can/usb/peak_usb/pcan_usb_core.c +++ b/drivers/net/can/usb/peak_usb/pcan_usb_core.c @@ -881,7 +881,7 @@ static void peak_usb_disconnect(struct usb_interface *intf)
dev_prev_siblings = dev->prev_siblings; dev->state &= ~PCAN_USB_STATE_CONNECTED; - strncpy(name, netdev->name, IFNAMSIZ); + strlcpy(name, netdev->name, IFNAMSIZ);
unregister_netdev(netdev);
[ Upstream commit c77e22834ae9a11891cb613bd9a551be1b94f2bc ]
John Hubbard reports seeing the following stack trace:
nfs4_do_reclaim rcu_read_lock /* we are now in_atomic() and must not sleep */ nfs4_purge_state_owners nfs4_free_state_owner nfs4_destroy_seqid_counter rpc_destroy_wait_queue cancel_delayed_work_sync __cancel_work_timer __flush_work start_flush_work might_sleep: (kernel/workqueue.c:2975: BUG)
The solution is to separate out the freeing of the state owners from nfs4_purge_state_owners(), and perform that outside the atomic context.
Reported-by: John Hubbard jhubbard@nvidia.com Fixes: 0aaaf5c424c7f ("NFS: Cache state owners after files are closed") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs4_fs.h | 3 ++- fs/nfs/nfs4client.c | 5 ++++- fs/nfs/nfs4state.c | 27 ++++++++++++++++++++++----- 3 files changed, 28 insertions(+), 7 deletions(-)
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h index 4afdee420d253..9f15696f55b9f 100644 --- a/fs/nfs/nfs4_fs.h +++ b/fs/nfs/nfs4_fs.h @@ -416,7 +416,8 @@ static inline void nfs4_schedule_session_recovery(struct nfs4_session *session,
extern struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *, struct rpc_cred *, gfp_t); extern void nfs4_put_state_owner(struct nfs4_state_owner *); -extern void nfs4_purge_state_owners(struct nfs_server *); +extern void nfs4_purge_state_owners(struct nfs_server *, struct list_head *); +extern void nfs4_free_state_owners(struct list_head *head); extern struct nfs4_state * nfs4_get_open_state(struct inode *, struct nfs4_state_owner *); extern void nfs4_put_open_state(struct nfs4_state *); extern void nfs4_close_state(struct nfs4_state *, fmode_t); diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c index ae91d1e450be7..dac20f31f01f8 100644 --- a/fs/nfs/nfs4client.c +++ b/fs/nfs/nfs4client.c @@ -685,9 +685,12 @@ found:
static void nfs4_destroy_server(struct nfs_server *server) { + LIST_HEAD(freeme); + nfs_server_return_all_delegations(server); unset_pnfs_layoutdriver(server); - nfs4_purge_state_owners(server); + nfs4_purge_state_owners(server, &freeme); + nfs4_free_state_owners(&freeme); }
/* diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index 5be61affeefd8..ef3ed2b1fd278 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -611,24 +611,39 @@ void nfs4_put_state_owner(struct nfs4_state_owner *sp) /** * nfs4_purge_state_owners - Release all cached state owners * @server: nfs_server with cached state owners to release + * @head: resulting list of state owners * * Called at umount time. Remaining state owners will be on * the LRU with ref count of zero. + * Note that the state owners are not freed, but are added + * to the list @head, which can later be used as an argument + * to nfs4_free_state_owners. */ -void nfs4_purge_state_owners(struct nfs_server *server) +void nfs4_purge_state_owners(struct nfs_server *server, struct list_head *head) { struct nfs_client *clp = server->nfs_client; struct nfs4_state_owner *sp, *tmp; - LIST_HEAD(doomed);
spin_lock(&clp->cl_lock); list_for_each_entry_safe(sp, tmp, &server->state_owners_lru, so_lru) { - list_move(&sp->so_lru, &doomed); + list_move(&sp->so_lru, head); nfs4_remove_state_owner_locked(sp); } spin_unlock(&clp->cl_lock); +}
- list_for_each_entry_safe(sp, tmp, &doomed, so_lru) { +/** + * nfs4_purge_state_owners - Release all cached state owners + * @head: resulting list of state owners + * + * Frees a list of state owners that was generated by + * nfs4_purge_state_owners + */ +void nfs4_free_state_owners(struct list_head *head) +{ + struct nfs4_state_owner *sp, *tmp; + + list_for_each_entry_safe(sp, tmp, head, so_lru) { list_del(&sp->so_lru); nfs4_free_state_owner(sp); } @@ -1724,12 +1739,13 @@ static int nfs4_do_reclaim(struct nfs_client *clp, const struct nfs4_state_recov struct nfs4_state_owner *sp; struct nfs_server *server; struct rb_node *pos; + LIST_HEAD(freeme); int status = 0;
restart: rcu_read_lock(); list_for_each_entry_rcu(server, &clp->cl_superblocks, client_link) { - nfs4_purge_state_owners(server); + nfs4_purge_state_owners(server, &freeme); spin_lock(&clp->cl_lock); for (pos = rb_first(&server->state_owners); pos != NULL; @@ -1758,6 +1774,7 @@ restart: spin_unlock(&clp->cl_lock); } rcu_read_unlock(); + nfs4_free_state_owners(&freeme); return 0; }
[ Upstream commit debea2cd3193ac868289e8893c3a719c265b0612 ]
A call to 'kfree_skb()' is missing in the error handling path of 'init_one()'. This is already present in 'remove_one()' but is missing here.
Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c index 3dd4c39640dc4..bee615cddbdd8 100644 --- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c +++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c @@ -3260,7 +3260,7 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent) if (!adapter->regs) { dev_err(&pdev->dev, "cannot map device registers\n"); err = -ENOMEM; - goto out_free_adapter; + goto out_free_adapter_nofail; }
adapter->pdev = pdev; @@ -3378,6 +3378,9 @@ out_free_dev: if (adapter->port[i]) free_netdev(adapter->port[i]);
+out_free_adapter_nofail: + kfree_skb(adapter->nofail_skb); + out_free_adapter: kfree(adapter);
[ Upstream commit 1a2c070ae805910a853b4a14818481ed2e17c727 ]
If hip04_tx_reclaim is interrupted while it is running and then __napi_schedule continues to execute hip04_rx_poll->hip04_tx_reclaim, reentrancy occurs and oops is generated. So you need to mask the interrupt during the hip04_tx_reclaim run.
The kernel oops exception stack is as follows:
Unable to handle kernel NULL pointer dereference at virtual address 00000050 pgd = c0003000 [00000050] *pgd=80000000a04003, *pmd=00000000 Internal error: Oops: 206 [#1] SMP ARM Modules linked in: hip04_eth mtdblock mtd_blkdevs mtd ohci_platform ehci_platform ohci_hcd ehci_hcd vfat fat sd_mod usb_storage scsi_mod usbcore usb_common CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 4.4.185 #1 Hardware name: Hisilicon A15 task: c0a250e0 task.stack: c0a00000 PC is at hip04_tx_reclaim+0xe0/0x17c [hip04_eth] LR is at hip04_tx_reclaim+0x30/0x17c [hip04_eth] pc : [<bf30c3a4>] lr : [<bf30c2f4>] psr: 600e0313 sp : c0a01d88 ip : 00000000 fp : c0601f9c r10: 00000000 r9 : c3482380 r8 : 00000001 r7 : 00000000 r6 : 000000e1 r5 : c3482000 r4 : 0000000c r3 : f2209800 r2 : 00000000 r1 : 00000000 r0 : 00000000 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel Control: 32c5387d Table: 03d28c80 DAC: 55555555 Process swapper/0 (pid: 0, stack limit = 0xc0a00190) Stack: (0xc0a01d88 to 0xc0a02000) [<bf30c3a4>] (hip04_tx_reclaim [hip04_eth]) from [<bf30d2e0>] (hip04_rx_poll+0x88/0x368 [hip04_eth]) [<bf30d2e0>] (hip04_rx_poll [hip04_eth]) from [<c04c2d9c>] (net_rx_action+0x114/0x34c) [<c04c2d9c>] (net_rx_action) from [<c021eed8>] (__do_softirq+0x218/0x318) [<c021eed8>] (__do_softirq) from [<c021f284>] (irq_exit+0x88/0xac) [<c021f284>] (irq_exit) from [<c0240090>] (msa_irq_exit+0x11c/0x1d4) [<c0240090>] (msa_irq_exit) from [<c02677e0>] (__handle_domain_irq+0x110/0x148) [<c02677e0>] (__handle_domain_irq) from [<c0201588>] (gic_handle_irq+0xd4/0x118) [<c0201588>] (gic_handle_irq) from [<c0551700>] (__irq_svc+0x40/0x58) Exception stack(0xc0a01f30 to 0xc0a01f78) 1f20: c0ae8b40 00000000 00000000 00000000 1f40: 00000002 ffffe000 c0601f9c 00000000 ffffffff c0a2257c c0a22440 c0831a38 1f60: c0a01ec4 c0a01f80 c0203714 c0203718 600e0213 ffffffff [<c0551700>] (__irq_svc) from [<c0203718>] (arch_cpu_idle+0x20/0x3c) [<c0203718>] (arch_cpu_idle) from [<c025bfd8>] (cpu_startup_entry+0x244/0x29c) [<c025bfd8>] (cpu_startup_entry) from [<c054b0d8>] (rest_init+0xc8/0x10c) [<c054b0d8>] (rest_init) from [<c0800c58>] (start_kernel+0x468/0x514) Code: a40599e5 016086e2 018088e2 7660efe6 (503090e5) ---[ end trace 1db21d6d09c49d74 ]--- Kernel panic - not syncing: Fatal exception in interrupt CPU3: stopping CPU: 3 PID: 0 Comm: swapper/3 Tainted: G D O 4.4.185 #1
Signed-off-by: Jiangfeng Xiao xiaojiangfeng@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hip04_eth.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index 60c727b0b7ab2..fdf8a477bec9c 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -497,6 +497,9 @@ static int hip04_rx_poll(struct napi_struct *napi, int budget) u16 len; u32 err;
+ /* clean up tx descriptors */ + tx_remaining = hip04_tx_reclaim(ndev, false); + while (cnt && !last) { buf = priv->rx_buf[priv->rx_head]; skb = build_skb(buf, priv->rx_buf_size); @@ -554,8 +557,7 @@ static int hip04_rx_poll(struct napi_struct *napi, int budget) } napi_complete(napi); done: - /* clean up tx descriptors and start a new timer if necessary */ - tx_remaining = hip04_tx_reclaim(ndev, false); + /* start a new timer if necessary */ if (rx < budget && tx_remaining) hip04_start_tx_timer(priv);
[ Upstream commit f2243b82785942be519016067ee6c55a063bbfe2 ]
TX_DESC_NUM is 256, in tx_count, the maximum value of mod(TX_DESC_NUM - 1) is 254, the variable "count" in the hip04_mac_start_xmit function is never equal to (TX_DESC_NUM - 1), so hip04_mac_start_xmit never return NETDEV_TX_BUSY.
tx_count is modified to mod(TX_DESC_NUM) so that the maximum value of tx_count can reach (TX_DESC_NUM - 1), then hip04_mac_start_xmit can reurn NETDEV_TX_BUSY.
Signed-off-by: Jiangfeng Xiao xiaojiangfeng@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hip04_eth.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index fdf8a477bec9c..a88d233df4e82 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -185,7 +185,7 @@ struct hip04_priv {
static inline unsigned int tx_count(unsigned int head, unsigned int tail) { - return (head - tail) % (TX_DESC_NUM - 1); + return (head - tail) % TX_DESC_NUM; }
static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex)
[ Upstream commit 96a50c0d907ac8f5c3d6b051031a19eb8a2b53e3 ]
On the arm64 platform, executing "ifconfig eth0 up" will fail, returning "ifconfig: SIOCSIFFLAGS: Input/output error."
ndev->dev is not initialized, dma_map_single->get_dma_ops-> dummy_dma_ops->__dummy_map_page will return DMA_ERROR_CODE directly, so when we use dma_map_single, the first parameter is to use the device of platform_device.
Signed-off-by: Jiangfeng Xiao xiaojiangfeng@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hip04_eth.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index a88d233df4e82..def831c89d354 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -157,6 +157,7 @@ struct hip04_priv { unsigned int reg_inten;
struct napi_struct napi; + struct device *dev; struct net_device *ndev;
struct tx_desc *tx_desc; @@ -387,7 +388,7 @@ static int hip04_tx_reclaim(struct net_device *ndev, bool force) }
if (priv->tx_phys[tx_tail]) { - dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail], + dma_unmap_single(priv->dev, priv->tx_phys[tx_tail], priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE); priv->tx_phys[tx_tail] = 0; @@ -437,8 +438,8 @@ static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev) return NETDEV_TX_BUSY; }
- phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE); - if (dma_mapping_error(&ndev->dev, phys)) { + phys = dma_map_single(priv->dev, skb->data, skb->len, DMA_TO_DEVICE); + if (dma_mapping_error(priv->dev, phys)) { dev_kfree_skb(skb); return NETDEV_TX_OK; } @@ -506,7 +507,7 @@ static int hip04_rx_poll(struct napi_struct *napi, int budget) if (unlikely(!skb)) net_dbg_ratelimited("build_skb failed\n");
- dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head], + dma_unmap_single(priv->dev, priv->rx_phys[priv->rx_head], RX_BUF_SIZE, DMA_FROM_DEVICE); priv->rx_phys[priv->rx_head] = 0;
@@ -534,9 +535,9 @@ static int hip04_rx_poll(struct napi_struct *napi, int budget) buf = netdev_alloc_frag(priv->rx_buf_size); if (!buf) goto done; - phys = dma_map_single(&ndev->dev, buf, + phys = dma_map_single(priv->dev, buf, RX_BUF_SIZE, DMA_FROM_DEVICE); - if (dma_mapping_error(&ndev->dev, phys)) + if (dma_mapping_error(priv->dev, phys)) goto done; priv->rx_buf[priv->rx_head] = buf; priv->rx_phys[priv->rx_head] = phys; @@ -639,9 +640,9 @@ static int hip04_mac_open(struct net_device *ndev) for (i = 0; i < RX_DESC_NUM; i++) { dma_addr_t phys;
- phys = dma_map_single(&ndev->dev, priv->rx_buf[i], + phys = dma_map_single(priv->dev, priv->rx_buf[i], RX_BUF_SIZE, DMA_FROM_DEVICE); - if (dma_mapping_error(&ndev->dev, phys)) + if (dma_mapping_error(priv->dev, phys)) return -EIO;
priv->rx_phys[i] = phys; @@ -675,7 +676,7 @@ static int hip04_mac_stop(struct net_device *ndev)
for (i = 0; i < RX_DESC_NUM; i++) { if (priv->rx_phys[i]) { - dma_unmap_single(&ndev->dev, priv->rx_phys[i], + dma_unmap_single(priv->dev, priv->rx_phys[i], RX_BUF_SIZE, DMA_FROM_DEVICE); priv->rx_phys[i] = 0; } @@ -826,6 +827,7 @@ static int hip04_mac_probe(struct platform_device *pdev) return -ENOMEM;
priv = netdev_priv(ndev); + priv->dev = d; priv->ndev = ndev; platform_set_drvdata(pdev, ndev);
[ Upstream commit 752ead44491e8c91e14d7079625c5916b30921c5 ]
Abort processing of a command if we run out of mapped data in the SG list. This should never happen, but a previous bug caused it to be possible. Play it safe and attempt to abort nicely if we don't have more SG segments left.
Reviewed-by: Kees Cook keescook@chromium.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ata/libata-sff.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c index 18de4c4570682..1d8901fc0bfa9 100644 --- a/drivers/ata/libata-sff.c +++ b/drivers/ata/libata-sff.c @@ -703,6 +703,10 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) unsigned int offset; unsigned char *buf;
+ if (!qc->cursg) { + qc->curbytes = qc->nbytes; + return; + } if (qc->curbytes == qc->nbytes - qc->sect_size) ap->hsm_task_state = HSM_ST_LAST;
@@ -742,6 +746,8 @@ static void ata_pio_sector(struct ata_queued_cmd *qc)
if (qc->cursg_ofs == qc->cursg->length) { qc->cursg = sg_next(qc->cursg); + if (!qc->cursg) + ap->hsm_task_state = HSM_ST_LAST; qc->cursg_ofs = 0; } }
[ Upstream commit c096397c78f766db972f923433031f2dec01cae0 ]
selftests kvm test cases need pre-required kernel configs for the test to get pass.
Signed-off-by: Naresh Kamboju naresh.kamboju@linaro.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/kvm/config | 3 +++ 1 file changed, 3 insertions(+) create mode 100644 tools/testing/selftests/kvm/config
diff --git a/tools/testing/selftests/kvm/config b/tools/testing/selftests/kvm/config new file mode 100644 index 0000000000000..63ed533f73d6e --- /dev/null +++ b/tools/testing/selftests/kvm/config @@ -0,0 +1,3 @@ +CONFIG_KVM=y +CONFIG_KVM_INTEL=y +CONFIG_KVM_AMD=y
From: Aaron Armstrong Skomra skomra@gmail.com
commit fcf887e7caaa813eea821d11bf2b7619a37df37a upstream.
The EKR ring claims a range of 0 to 71 but actually reports values 1 to 72. The ring is used in relative mode so this change should not affect users.
Signed-off-by: Aaron Armstrong Skomra aaron.skomra@wacom.com Fixes: 72b236d60218f ("HID: wacom: Add support for Express Key Remote.") Cc: stable@vger.kernel.org # v4.3+ Reviewed-by: Ping Cheng ping.cheng@wacom.com Reviewed-by: Jason Gerecke jason.gerecke@wacom.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hid/wacom_wac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/hid/wacom_wac.c +++ b/drivers/hid/wacom_wac.c @@ -674,7 +674,7 @@ static int wacom_remote_irq(struct wacom input_report_key(input, BTN_BASE2, (data[11] & 0x02));
if (data[12] & 0x80) - input_report_abs(input, ABS_WHEEL, (data[12] & 0x7f)); + input_report_abs(input, ABS_WHEEL, (data[12] & 0x7f) - 1); else input_report_abs(input, ABS_WHEEL, 0);
From: Mikulas Patocka mpatocka@redhat.com
commit cf3591ef832915892f2499b7e54b51d4c578b28c upstream.
Revert the commit bd293d071ffe65e645b4d8104f9d8fe15ea13862. The proper fix has been made available with commit d0a255e795ab ("loop: set PF_MEMALLOC_NOIO for the worker thread").
Note that the fix offered by commit bd293d071ffe doesn't really prevent the deadlock from occuring - if we look at the stacktrace reported by Junxiao Bi, we see that it hangs in bit_wait_io and not on the mutex - i.e. it has already successfully taken the mutex. Changing the mutex from mutex_lock to mutex_trylock won't help with deadlocks that happen afterwards.
PID: 474 TASK: ffff8813e11f4600 CPU: 10 COMMAND: "kswapd0" #0 [ffff8813dedfb938] __schedule at ffffffff8173f405 #1 [ffff8813dedfb990] schedule at ffffffff8173fa27 #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186 #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8 #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81 #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio] #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio] #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio] #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778 #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f #13 [ffff8813dedfbec0] kthread at ffffffff810a8428 #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org Fixes: bd293d071ffe ("dm bufio: fix deadlock with loop device") Depends-on: d0a255e795ab ("loop: set PF_MEMALLOC_NOIO for the worker thread") Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm-bufio.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/md/dm-bufio.c +++ b/drivers/md/dm-bufio.c @@ -1561,7 +1561,9 @@ dm_bufio_shrink_scan(struct shrinker *sh unsigned long freed;
c = container_of(shrink, struct dm_bufio_client, shrinker); - if (!dm_bufio_trylock(c)) + if (sc->gfp_mask & __GFP_FS) + dm_bufio_lock(c); + else if (!dm_bufio_trylock(c)) return SHRINK_STOP;
freed = __scan(c, sc->nr_to_scan, sc->gfp_mask);
From: Oleg Nesterov oleg@redhat.com
commit 46d0b24c5ee10a15dfb25e20642f5a5ed59c5003 upstream.
userfaultfd_release() should clear vm_flags/vm_userfaultfd_ctx even if mm->core_state != NULL.
Otherwise a page fault can see userfaultfd_missing() == T and use an already freed userfaultfd_ctx.
Link: http://lkml.kernel.org/r/20190820160237.GB4983@redhat.com Fixes: 04f5866e41fb ("coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping") Signed-off-by: Oleg Nesterov oleg@redhat.com Reported-by: Kefeng Wang wangkefeng.wang@huawei.com Reviewed-by: Andrea Arcangeli aarcange@redhat.com Tested-by: Kefeng Wang wangkefeng.wang@huawei.com Cc: Peter Xu peterx@redhat.com Cc: Mike Rapoport rppt@linux.ibm.com Cc: Jann Horn jannh@google.com Cc: Jason Gunthorpe jgg@mellanox.com Cc: Michal Hocko mhocko@suse.com Cc: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/userfaultfd.c | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-)
--- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -431,6 +431,7 @@ static int userfaultfd_release(struct in /* len == 0 means wake all */ struct userfaultfd_wake_range range = { .len = 0, }; unsigned long new_flags; + bool still_valid;
ACCESS_ONCE(ctx->released) = true;
@@ -446,8 +447,7 @@ static int userfaultfd_release(struct in * taking the mmap_sem for writing. */ down_write(&mm->mmap_sem); - if (!mmget_still_valid(mm)) - goto skip_mm; + still_valid = mmget_still_valid(mm); prev = NULL; for (vma = mm->mmap; vma; vma = vma->vm_next) { cond_resched(); @@ -458,19 +458,20 @@ static int userfaultfd_release(struct in continue; } new_flags = vma->vm_flags & ~(VM_UFFD_MISSING | VM_UFFD_WP); - prev = vma_merge(mm, prev, vma->vm_start, vma->vm_end, - new_flags, vma->anon_vma, - vma->vm_file, vma->vm_pgoff, - vma_policy(vma), - NULL_VM_UFFD_CTX); - if (prev) - vma = prev; - else - prev = vma; + if (still_valid) { + prev = vma_merge(mm, prev, vma->vm_start, vma->vm_end, + new_flags, vma->anon_vma, + vma->vm_file, vma->vm_pgoff, + vma_policy(vma), + NULL_VM_UFFD_CTX); + if (prev) + vma = prev; + else + prev = vma; + } vma->vm_flags = new_flags; vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; } -skip_mm: up_write(&mm->mmap_sem); mmput(mm); wakeup:
From: Sean Christopherson sean.j.christopherson@intel.com
commit b63f20a778c88b6a04458ed6ffc69da953d3a109 upstream.
Use 'lea' instead of 'add' when adjusting %rsp in CALL_NOSPEC so as to avoid clobbering flags.
KVM's emulator makes indirect calls into a jump table of sorts, where the destination of the CALL_NOSPEC is a small blob of code that performs fast emulation by executing the target instruction with fixed operands.
adcb_al_dl: 0x000339f8 <+0>: adc %dl,%al 0x000339fa <+2>: ret
A major motiviation for doing fast emulation is to leverage the CPU to handle consumption and manipulation of arithmetic flags, i.e. RFLAGS is both an input and output to the target of CALL_NOSPEC. Clobbering flags results in all sorts of incorrect emulation, e.g. Jcc instructions often take the wrong path. Sans the nops...
asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n" 0x0003595a <+58>: mov 0xc0(%ebx),%eax 0x00035960 <+64>: mov 0x60(%ebx),%edx 0x00035963 <+67>: mov 0x90(%ebx),%ecx 0x00035969 <+73>: push %edi 0x0003596a <+74>: popf 0x0003596b <+75>: call *%esi 0x000359a0 <+128>: pushf 0x000359a1 <+129>: pop %edi 0x000359a2 <+130>: mov %eax,0xc0(%ebx) 0x000359b1 <+145>: mov %edx,0x60(%ebx)
ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK); 0x000359a8 <+136>: mov -0x10(%ebp),%eax 0x000359ab <+139>: and $0x8d5,%edi 0x000359b4 <+148>: and $0xfffff72a,%eax 0x000359b9 <+153>: or %eax,%edi 0x000359bd <+157>: mov %edi,0x4(%ebx)
For the most part this has gone unnoticed as emulation of guest code that can trigger fast emulation is effectively limited to MMIO when running on modern hardware, and MMIO is rarely, if ever, accessed by instructions that affect or consume flags.
Breakage is almost instantaneous when running with unrestricted guest disabled, in which case KVM must emulate all instructions when the guest has invalid state, e.g. when the guest is in Big Real Mode during early BIOS.
Fixes: 776b043848fd2 ("x86/retpoline: Add initial retpoline support") Fixes: 1a29b5b7f347a ("KVM: x86: Make indirect calls in emulator speculation safe") Signed-off-by: Sean Christopherson sean.j.christopherson@intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190822211122.27579-1-sean.j.christopherson@intel... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/nospec-branch.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -151,7 +151,7 @@ " lfence;\n" \ " jmp 902b;\n" \ " .align 16\n" \ - "903: addl $4, %%esp;\n" \ + "903: lea 4(%%esp), %%esp;\n" \ " pushl %[thunk_target];\n" \ " ret;\n" \ " .align 16\n" \
From: Thomas Gleixner tglx@linutronix.de
commit f897e60a12f0b9146357780d317879bce2a877dc upstream.
Some newer machines do not advertise legacy timers. The kernel can handle that situation if the TSC and the CPU frequency are enumerated by CPUID or MSRs and the CPU supports TSC deadline timer. If the CPU does not support TSC deadline timer the local APIC timer frequency has to be known as well.
Some Ryzens machines do not advertize legacy timers, but there is no reliable way to determine the bus frequency which feeds the local APIC timer when the machine allows overclocking of that frequency.
As there is no legacy timer the local APIC timer calibration crashes due to a NULL pointer dereference when accessing the not installed global clock event device.
Switch the calibration loop to a non interrupt based one, which polls either TSC (if frequency is known) or jiffies. The latter requires a global clockevent. As the machines which do not have a global clockevent installed have a known TSC frequency this is a non issue. For older machines where TSC frequency is not known, there is no known case where the legacy timers do not exist as that would have been reported long ago.
Reported-by: Daniel Drake drake@endlessm.com Reported-by: Jiri Slaby jslaby@suse.cz Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Daniel Drake drake@endlessm.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908091443030.21433@nanos.tec.linu... Link: http://bugzilla.opensuse.org/show_bug.cgi?id=1142926#c12 Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/apic/apic.c | 68 ++++++++++++++++++++++++++++++++++---------- 1 file changed, 53 insertions(+), 15 deletions(-)
--- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -593,7 +593,7 @@ static __initdata unsigned long lapic_ca static __initdata unsigned long lapic_cal_j1, lapic_cal_j2;
/* - * Temporary interrupt handler. + * Temporary interrupt handler and polled calibration function. */ static void __init lapic_cal_handler(struct clock_event_device *dev) { @@ -677,7 +677,8 @@ calibrate_by_pmtimer(long deltapm, long static int __init calibrate_APIC_clock(void) { struct clock_event_device *levt = this_cpu_ptr(&lapic_events); - void (*real_handler)(struct clock_event_device *dev); + u64 tsc_perj = 0, tsc_start = 0; + unsigned long jif_start; unsigned long deltaj; long delta, deltatsc; int pm_referenced = 0; @@ -706,29 +707,65 @@ static int __init calibrate_APIC_clock(v apic_printk(APIC_VERBOSE, "Using local APIC timer interrupts.\n" "calibrating APIC timer ...\n");
+ /* + * There are platforms w/o global clockevent devices. Instead of + * making the calibration conditional on that, use a polling based + * approach everywhere. + */ local_irq_disable();
- /* Replace the global interrupt handler */ - real_handler = global_clock_event->event_handler; - global_clock_event->event_handler = lapic_cal_handler; - /* * Setup the APIC counter to maximum. There is no way the lapic * can underflow in the 100ms detection time frame */ __setup_APIC_LVTT(0xffffffff, 0, 0);
- /* Let the interrupts run */ + /* + * Methods to terminate the calibration loop: + * 1) Global clockevent if available (jiffies) + * 2) TSC if available and frequency is known + */ + jif_start = READ_ONCE(jiffies); + + if (tsc_khz) { + tsc_start = rdtsc(); + tsc_perj = div_u64((u64)tsc_khz * 1000, HZ); + } + + /* + * Enable interrupts so the tick can fire, if a global + * clockevent device is available + */ local_irq_enable();
- while (lapic_cal_loops <= LAPIC_CAL_LOOPS) - cpu_relax(); + while (lapic_cal_loops <= LAPIC_CAL_LOOPS) { + /* Wait for a tick to elapse */ + while (1) { + if (tsc_khz) { + u64 tsc_now = rdtsc(); + if ((tsc_now - tsc_start) >= tsc_perj) { + tsc_start += tsc_perj; + break; + } + } else { + unsigned long jif_now = READ_ONCE(jiffies); + + if (time_after(jif_now, jif_start)) { + jif_start = jif_now; + break; + } + } + cpu_relax(); + } + + /* Invoke the calibration routine */ + local_irq_disable(); + lapic_cal_handler(NULL); + local_irq_enable(); + }
local_irq_disable();
- /* Restore the real event handler */ - global_clock_event->event_handler = real_handler; - /* Build delta t1-t2 as apic timer counts down */ delta = lapic_cal_t1 - lapic_cal_t2; apic_printk(APIC_VERBOSE, "... lapic delta = %ld\n", delta); @@ -778,10 +815,11 @@ static int __init calibrate_APIC_clock(v levt->features &= ~CLOCK_EVT_FEAT_DUMMY;
/* - * PM timer calibration failed or not turned on - * so lets try APIC timer based calibration + * PM timer calibration failed or not turned on so lets try APIC + * timer based calibration, if a global clockevent device is + * available. */ - if (!pm_referenced) { + if (!pm_referenced && global_clock_event) { apic_printk(APIC_VERBOSE, "... verify APIC timer\n");
/*
From: John Hubbard jhubbard@nvidia.com
commit a90118c445cc7f07781de26a9684d4ec58bfcfd1 upstream.
Recent gcc compilers (gcc 9.1) generate warnings about an out of bounds memset, if the memset goes accross several fields of a struct. This generated a couple of warnings on x86_64 builds in sanitize_boot_params().
Fix this by explicitly saving the fields in struct boot_params that are intended to be preserved, and zeroing all the rest.
[ tglx: Tagged for stable as it breaks the warning free build there as well ]
Suggested-by: Thomas Gleixner tglx@linutronix.de Suggested-by: H. Peter Anvin hpa@zytor.com Signed-off-by: John Hubbard jhubbard@nvidia.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190731054627.5627-2-jhubbard@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/bootparam_utils.h | 59 +++++++++++++++++++++++++-------- 1 file changed, 46 insertions(+), 13 deletions(-)
--- a/arch/x86/include/asm/bootparam_utils.h +++ b/arch/x86/include/asm/bootparam_utils.h @@ -17,6 +17,20 @@ * Note: efi_info is commonly left uninitialized, but that field has a * private magic, so it is better to leave it unchanged. */ + +#define sizeof_mbr(type, member) ({ sizeof(((type *)0)->member); }) + +#define BOOT_PARAM_PRESERVE(struct_member) \ + { \ + .start = offsetof(struct boot_params, struct_member), \ + .len = sizeof_mbr(struct boot_params, struct_member), \ + } + +struct boot_params_to_save { + unsigned int start; + unsigned int len; +}; + static void sanitize_boot_params(struct boot_params *boot_params) { /* @@ -35,19 +49,38 @@ static void sanitize_boot_params(struct */ if (boot_params->sentinel) { /* fields in boot_params are left uninitialized, clear them */ - memset(&boot_params->ext_ramdisk_image, 0, - (char *)&boot_params->efi_info - - (char *)&boot_params->ext_ramdisk_image); - memset(&boot_params->kbd_status, 0, - (char *)&boot_params->hdr - - (char *)&boot_params->kbd_status); - memset(&boot_params->_pad7[0], 0, - (char *)&boot_params->edd_mbr_sig_buffer[0] - - (char *)&boot_params->_pad7[0]); - memset(&boot_params->_pad8[0], 0, - (char *)&boot_params->eddbuf[0] - - (char *)&boot_params->_pad8[0]); - memset(&boot_params->_pad9[0], 0, sizeof(boot_params->_pad9)); + static struct boot_params scratch; + char *bp_base = (char *)boot_params; + char *save_base = (char *)&scratch; + int i; + + const struct boot_params_to_save to_save[] = { + BOOT_PARAM_PRESERVE(screen_info), + BOOT_PARAM_PRESERVE(apm_bios_info), + BOOT_PARAM_PRESERVE(tboot_addr), + BOOT_PARAM_PRESERVE(ist_info), + BOOT_PARAM_PRESERVE(hd0_info), + BOOT_PARAM_PRESERVE(hd1_info), + BOOT_PARAM_PRESERVE(sys_desc_table), + BOOT_PARAM_PRESERVE(olpc_ofw_header), + BOOT_PARAM_PRESERVE(efi_info), + BOOT_PARAM_PRESERVE(alt_mem_k), + BOOT_PARAM_PRESERVE(scratch), + BOOT_PARAM_PRESERVE(e820_entries), + BOOT_PARAM_PRESERVE(eddbuf_entries), + BOOT_PARAM_PRESERVE(edd_mbr_sig_buf_entries), + BOOT_PARAM_PRESERVE(edd_mbr_sig_buffer), + BOOT_PARAM_PRESERVE(eddbuf), + }; + + memset(&scratch, 0, sizeof(scratch)); + + for (i = 0; i < ARRAY_SIZE(to_save); i++) { + memcpy(save_base + to_save[i].start, + bp_base + to_save[i].start, to_save[i].len); + } + + memcpy(boot_params, save_base, sizeof(*boot_params)); } }
From: John Hubbard jhubbard@nvidia.com
commit 7846f58fba964af7cb8cf77d4d13c33254725211 upstream.
commit a90118c445cc ("x86/boot: Save fields explicitly, zero out everything else") had two errors:
* It preserved boot_params.acpi_rsdp_addr, and * It failed to preserve boot_params.hdr
Therefore, zero out acpi_rsdp_addr, and preserve hdr.
Fixes: a90118c445cc ("x86/boot: Save fields explicitly, zero out everything else") Reported-by: Neil MacLeod neil@nmacleod.com Suggested-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: John Hubbard jhubbard@nvidia.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Neil MacLeod neil@nmacleod.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190821192513.20126-1-jhubbard@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/bootparam_utils.h | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/include/asm/bootparam_utils.h +++ b/arch/x86/include/asm/bootparam_utils.h @@ -70,6 +70,7 @@ static void sanitize_boot_params(struct BOOT_PARAM_PRESERVE(eddbuf_entries), BOOT_PARAM_PRESERVE(edd_mbr_sig_buf_entries), BOOT_PARAM_PRESERVE(edd_mbr_sig_buffer), + BOOT_PARAM_PRESERVE(hdr), BOOT_PARAM_PRESERVE(eddbuf), };
From: ZhangXiaoxu zhangxiaoxu5@huawei.com
commit e4f9d6013820d1eba1432d51dd1c5795759aa77f upstream.
When btree_split_beneath() splits a node to two new children, it will allocate two blocks: left and right. If right block's allocation failed, the left block will be unlocked and marked dirty. If this happened, the left block'ss content is zero, because it wasn't initialized with the btree struct before the attempot to allocate the right block. Upon return, when flushing the left block to disk, the validator will fail when check this block. Then a BUG_ON is raised.
Fix this by completely initializing the left block before allocating and initializing the right block.
Fixes: 4dcb8b57df359 ("dm btree: fix leak of bufio-backed block in btree_split_beneath error path") Cc: stable@vger.kernel.org Signed-off-by: ZhangXiaoxu zhangxiaoxu5@huawei.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/persistent-data/dm-btree.c | 31 ++++++++++++++++--------------- 1 file changed, 16 insertions(+), 15 deletions(-)
--- a/drivers/md/persistent-data/dm-btree.c +++ b/drivers/md/persistent-data/dm-btree.c @@ -616,39 +616,40 @@ static int btree_split_beneath(struct sh
new_parent = shadow_current(s);
+ pn = dm_block_data(new_parent); + size = le32_to_cpu(pn->header.flags) & INTERNAL_NODE ? + sizeof(__le64) : s->info->value_type.size; + + /* create & init the left block */ r = new_block(s->info, &left); if (r < 0) return r;
+ ln = dm_block_data(left); + nr_left = le32_to_cpu(pn->header.nr_entries) / 2; + + ln->header.flags = pn->header.flags; + ln->header.nr_entries = cpu_to_le32(nr_left); + ln->header.max_entries = pn->header.max_entries; + ln->header.value_size = pn->header.value_size; + memcpy(ln->keys, pn->keys, nr_left * sizeof(pn->keys[0])); + memcpy(value_ptr(ln, 0), value_ptr(pn, 0), nr_left * size); + + /* create & init the right block */ r = new_block(s->info, &right); if (r < 0) { unlock_block(s->info, left); return r; }
- pn = dm_block_data(new_parent); - ln = dm_block_data(left); rn = dm_block_data(right); - - nr_left = le32_to_cpu(pn->header.nr_entries) / 2; nr_right = le32_to_cpu(pn->header.nr_entries) - nr_left;
- ln->header.flags = pn->header.flags; - ln->header.nr_entries = cpu_to_le32(nr_left); - ln->header.max_entries = pn->header.max_entries; - ln->header.value_size = pn->header.value_size; - rn->header.flags = pn->header.flags; rn->header.nr_entries = cpu_to_le32(nr_right); rn->header.max_entries = pn->header.max_entries; rn->header.value_size = pn->header.value_size; - - memcpy(ln->keys, pn->keys, nr_left * sizeof(pn->keys[0])); memcpy(rn->keys, pn->keys + nr_left, nr_right * sizeof(pn->keys[0])); - - size = le32_to_cpu(pn->header.flags) & INTERNAL_NODE ? - sizeof(__le64) : s->info->value_type.size; - memcpy(value_ptr(ln, 0), value_ptr(pn, 0), nr_left * size); memcpy(value_ptr(rn, 0), value_ptr(pn, nr_left), nr_right * size);
From: ZhangXiaoxu zhangxiaoxu5@huawei.com
commit ae148243d3f0816b37477106c05a2ec7d5f32614 upstream.
In commit 6096d91af0b6 ("dm space map metadata: fix occasional leak of a metadata block on resize"), we refactor the commit logic to a new function 'apply_bops'. But when that logic was replaced in out() the return value was not stored. This may lead out() returning a wrong value to the caller.
Fixes: 6096d91af0b6 ("dm space map metadata: fix occasional leak of a metadata block on resize") Cc: stable@vger.kernel.org Signed-off-by: ZhangXiaoxu zhangxiaoxu5@huawei.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/persistent-data/dm-space-map-metadata.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/md/persistent-data/dm-space-map-metadata.c +++ b/drivers/md/persistent-data/dm-space-map-metadata.c @@ -248,7 +248,7 @@ static int out(struct sm_metadata *smm) }
if (smm->recursion_count == 1) - apply_bops(smm); + r = apply_bops(smm);
smm->recursion_count--;
From: Mikulas Patocka mpatocka@redhat.com
commit 1cfd5d3399e87167b7f9157ef99daa0e959f395d upstream.
If the sector number is too high, dm_table_find_target() should return a pointer to a zeroed dm_target structure (the caller should test it with dm_target_is_valid).
However, for some table sizes, the code in dm_table_find_target() that performs btree lookup will access out of bound memory structures.
Fix this bug by testing the sector number at the beginning of dm_table_find_target(). Also, add an "inline" keyword to the function dm_table_get_size() because this is a hot path.
Fixes: 512875bd9661 ("dm: table detect io beyond device") Cc: stable@vger.kernel.org Reported-by: Zhang Tao kontais@zoho.com Signed-off-by: Mikulas Patocka mpatocka@redhat.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm-table.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/md/dm-table.c +++ b/drivers/md/dm-table.c @@ -1167,7 +1167,7 @@ void dm_table_event(struct dm_table *t) } EXPORT_SYMBOL(dm_table_event);
-sector_t dm_table_get_size(struct dm_table *t) +inline sector_t dm_table_get_size(struct dm_table *t) { return t->num_targets ? (t->highs[t->num_targets - 1] + 1) : 0; } @@ -1192,6 +1192,9 @@ struct dm_target *dm_table_find_target(s unsigned int l, n = 0, k = 0; sector_t *node;
+ if (unlikely(sector >= dm_table_get_size(t))) + return &t->targets[t->num_targets]; + for (l = 0; l < t->depth; l++) { n = get_child(n, k); node = get_node(t, l, n);
From: Daniel Bristot de Oliveira bristot@redhat.com
commit 82d6489d0fed2ec8a8c48c19e8d8a04ac8e5bb26 upstream.
While testing the deadline scheduler + cgroup setup I hit this warning.
[ 132.612935] ------------[ cut here ]------------ [ 132.612951] WARNING: CPU: 5 PID: 0 at kernel/softirq.c:150 __local_bh_enable_ip+0x6b/0x80 [ 132.612952] Modules linked in: (a ton of modules...) [ 132.612981] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.7.0-rc2 #2 [ 132.612981] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014 [ 132.612982] 0000000000000086 45c8bb5effdd088b ffff88013fd43da0 ffffffff813d229e [ 132.612984] 0000000000000000 0000000000000000 ffff88013fd43de0 ffffffff810a652b [ 132.612985] 00000096811387b5 0000000000000200 ffff8800bab29d80 ffff880034c54c00 [ 132.612986] Call Trace: [ 132.612987] <IRQ> [<ffffffff813d229e>] dump_stack+0x63/0x85 [ 132.612994] [<ffffffff810a652b>] __warn+0xcb/0xf0 [ 132.612997] [<ffffffff810e76a0>] ? push_dl_task.part.32+0x170/0x170 [ 132.612999] [<ffffffff810a665d>] warn_slowpath_null+0x1d/0x20 [ 132.613000] [<ffffffff810aba5b>] __local_bh_enable_ip+0x6b/0x80 [ 132.613008] [<ffffffff817d6c8a>] _raw_write_unlock_bh+0x1a/0x20 [ 132.613010] [<ffffffff817d6c9e>] _raw_spin_unlock_bh+0xe/0x10 [ 132.613015] [<ffffffff811388ac>] put_css_set+0x5c/0x60 [ 132.613016] [<ffffffff8113dc7f>] cgroup_free+0x7f/0xa0 [ 132.613017] [<ffffffff810a3912>] __put_task_struct+0x42/0x140 [ 132.613018] [<ffffffff810e776a>] dl_task_timer+0xca/0x250 [ 132.613027] [<ffffffff810e76a0>] ? push_dl_task.part.32+0x170/0x170 [ 132.613030] [<ffffffff8111371e>] __hrtimer_run_queues+0xee/0x270 [ 132.613031] [<ffffffff81113ec8>] hrtimer_interrupt+0xa8/0x190 [ 132.613034] [<ffffffff81051a58>] local_apic_timer_interrupt+0x38/0x60 [ 132.613035] [<ffffffff817d9b0d>] smp_apic_timer_interrupt+0x3d/0x50 [ 132.613037] [<ffffffff817d7c5c>] apic_timer_interrupt+0x8c/0xa0 [ 132.613038] <EOI> [<ffffffff81063466>] ? native_safe_halt+0x6/0x10 [ 132.613043] [<ffffffff81037a4e>] default_idle+0x1e/0xd0 [ 132.613044] [<ffffffff810381cf>] arch_cpu_idle+0xf/0x20 [ 132.613046] [<ffffffff810e8fda>] default_idle_call+0x2a/0x40 [ 132.613047] [<ffffffff810e92d7>] cpu_startup_entry+0x2e7/0x340 [ 132.613048] [<ffffffff81050235>] start_secondary+0x155/0x190 [ 132.613049] ---[ end trace f91934d162ce9977 ]---
The warn is the spin_(lock|unlock)_bh(&css_set_lock) in the interrupt context. Converting the spin_lock_bh to spin_lock_irq(save) to avoid this problem - and other problems of sharing a spinlock with an interrupt.
Cc: Tejun Heo tj@kernel.org Cc: Li Zefan lizefan@huawei.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: Juri Lelli juri.lelli@arm.com Cc: Steven Rostedt rostedt@goodmis.org Cc: cgroups@vger.kernel.org Cc: stable@vger.kernel.org # 4.5+ Cc: linux-kernel@vger.kernel.org Reviewed-by: Rik van Riel riel@redhat.com Reviewed-by: "Luis Claudio R. Goncalves" lgoncalv@redhat.com Signed-off-by: Daniel Bristot de Oliveira bristot@redhat.com Acked-by: Zefan Li lizefan@huawei.com Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Zubin Mithra zsm@chromium.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/cgroup.c | 122 +++++++++++++++++++++++++++++--------------------------- 1 file changed, 64 insertions(+), 58 deletions(-)
--- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -784,6 +784,8 @@ static void put_css_set_locked(struct cs
static void put_css_set(struct css_set *cset) { + unsigned long flags; + /* * Ensure that the refcount doesn't hit zero while any readers * can see it. Similar to atomic_dec_and_lock(), but for an @@ -792,9 +794,9 @@ static void put_css_set(struct css_set * if (atomic_add_unless(&cset->refcount, -1, 1)) return;
- spin_lock_bh(&css_set_lock); + spin_lock_irqsave(&css_set_lock, flags); put_css_set_locked(cset); - spin_unlock_bh(&css_set_lock); + spin_unlock_irqrestore(&css_set_lock, flags); }
/* @@ -1017,11 +1019,11 @@ static struct css_set *find_css_set(stru
/* First see if we already have a cgroup group that matches * the desired set */ - spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock); cset = find_existing_css_set(old_cset, cgrp, template); if (cset) get_css_set(cset); - spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock);
if (cset) return cset; @@ -1049,7 +1051,7 @@ static struct css_set *find_css_set(stru * find_existing_css_set() */ memcpy(cset->subsys, template, sizeof(cset->subsys));
- spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock); /* Add reference counts and links from the new css_set. */ list_for_each_entry(link, &old_cset->cgrp_links, cgrp_link) { struct cgroup *c = link->cgrp; @@ -1075,7 +1077,7 @@ static struct css_set *find_css_set(stru css_get(css); }
- spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock);
return cset; } @@ -1139,7 +1141,7 @@ static void cgroup_destroy_root(struct c * Release all the links from cset_links to this hierarchy's * root cgroup */ - spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock);
list_for_each_entry_safe(link, tmp_link, &cgrp->cset_links, cset_link) { list_del(&link->cset_link); @@ -1147,7 +1149,7 @@ static void cgroup_destroy_root(struct c kfree(link); }
- spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock);
if (!list_empty(&root->root_list)) { list_del(&root->root_list); @@ -1551,11 +1553,11 @@ static int rebind_subsystems(struct cgro ss->root = dst_root; css->cgroup = dcgrp;
- spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock); hash_for_each(css_set_table, i, cset, hlist) list_move_tail(&cset->e_cset_node[ss->id], &dcgrp->e_csets[ss->id]); - spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock);
src_root->subsys_mask &= ~(1 << ssid); scgrp->subtree_control &= ~(1 << ssid); @@ -1832,7 +1834,7 @@ static void cgroup_enable_task_cg_lists( { struct task_struct *p, *g;
- spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock);
if (use_task_css_set_links) goto out_unlock; @@ -1857,8 +1859,12 @@ static void cgroup_enable_task_cg_lists( * entry won't be deleted though the process has exited. * Do it while holding siglock so that we don't end up * racing against cgroup_exit(). + * + * Interrupts were already disabled while acquiring + * the css_set_lock, so we do not need to disable it + * again when acquiring the sighand->siglock here. */ - spin_lock_irq(&p->sighand->siglock); + spin_lock(&p->sighand->siglock); if (!(p->flags & PF_EXITING)) { struct css_set *cset = task_css_set(p);
@@ -1867,11 +1873,11 @@ static void cgroup_enable_task_cg_lists( list_add_tail(&p->cg_list, &cset->tasks); get_css_set(cset); } - spin_unlock_irq(&p->sighand->siglock); + spin_unlock(&p->sighand->siglock); } while_each_thread(g, p); read_unlock(&tasklist_lock); out_unlock: - spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock); }
static void init_cgroup_housekeeping(struct cgroup *cgrp) @@ -1976,13 +1982,13 @@ static int cgroup_setup_root(struct cgro * Link the root cgroup in this hierarchy into all the css_set * objects. */ - spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock); hash_for_each(css_set_table, i, cset, hlist) { link_css_set(&tmp_links, cset, root_cgrp); if (css_set_populated(cset)) cgroup_update_populated(root_cgrp, true); } - spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock);
BUG_ON(!list_empty(&root_cgrp->self.children)); BUG_ON(atomic_read(&root->nr_cgrps) != 1); @@ -2215,7 +2221,7 @@ char *task_cgroup_path(struct task_struc char *path = NULL;
mutex_lock(&cgroup_mutex); - spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock);
root = idr_get_next(&cgroup_hierarchy_idr, &hierarchy_id);
@@ -2228,7 +2234,7 @@ char *task_cgroup_path(struct task_struc path = buf; }
- spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock); mutex_unlock(&cgroup_mutex); return path; } @@ -2403,7 +2409,7 @@ static int cgroup_taskset_migrate(struct * the new cgroup. There are no failure cases after here, so this * is the commit point. */ - spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock); list_for_each_entry(cset, &tset->src_csets, mg_node) { list_for_each_entry_safe(task, tmp_task, &cset->mg_tasks, cg_list) { struct css_set *from_cset = task_css_set(task); @@ -2414,7 +2420,7 @@ static int cgroup_taskset_migrate(struct put_css_set_locked(from_cset); } } - spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock);
/* * Migration is committed, all target tasks are now on dst_csets. @@ -2443,13 +2449,13 @@ out_cancel_attach: } } out_release_tset: - spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock); list_splice_init(&tset->dst_csets, &tset->src_csets); list_for_each_entry_safe(cset, tmp_cset, &tset->src_csets, mg_node) { list_splice_tail_init(&cset->mg_tasks, &cset->tasks); list_del_init(&cset->mg_node); } - spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock); return ret; }
@@ -2466,14 +2472,14 @@ static void cgroup_migrate_finish(struct
lockdep_assert_held(&cgroup_mutex);
- spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock); list_for_each_entry_safe(cset, tmp_cset, preloaded_csets, mg_preload_node) { cset->mg_src_cgrp = NULL; cset->mg_dst_cset = NULL; list_del_init(&cset->mg_preload_node); put_css_set_locked(cset); } - spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock); }
/** @@ -2623,7 +2629,7 @@ static int cgroup_migrate(struct task_st * already PF_EXITING could be freed from underneath us unless we * take an rcu_read_lock. */ - spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock); rcu_read_lock(); task = leader; do { @@ -2632,7 +2638,7 @@ static int cgroup_migrate(struct task_st break; } while_each_thread(leader, task); rcu_read_unlock(); - spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock);
return cgroup_taskset_migrate(&tset, cgrp); } @@ -2653,7 +2659,7 @@ static int cgroup_attach_task(struct cgr int ret;
/* look up all src csets */ - spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock); rcu_read_lock(); task = leader; do { @@ -2663,7 +2669,7 @@ static int cgroup_attach_task(struct cgr break; } while_each_thread(leader, task); rcu_read_unlock(); - spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock);
/* prepare dst csets and commit */ ret = cgroup_migrate_prepare_dst(dst_cgrp, &preloaded_csets); @@ -2696,9 +2702,9 @@ static int cgroup_procs_write_permission struct cgroup *cgrp; struct inode *inode;
- spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock); cgrp = task_cgroup_from_root(task, &cgrp_dfl_root); - spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock);
while (!cgroup_is_descendant(dst_cgrp, cgrp)) cgrp = cgroup_parent(cgrp); @@ -2800,9 +2806,9 @@ int cgroup_attach_task_all(struct task_s if (root == &cgrp_dfl_root) continue;
- spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock); from_cgrp = task_cgroup_from_root(from, root); - spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock);
retval = cgroup_attach_task(from_cgrp, tsk, false); if (retval) @@ -2927,7 +2933,7 @@ static int cgroup_update_dfl_csses(struc percpu_down_write(&cgroup_threadgroup_rwsem);
/* look up all csses currently attached to @cgrp's subtree */ - spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock); css_for_each_descendant_pre(css, cgroup_css(cgrp, NULL)) { struct cgrp_cset_link *link;
@@ -2939,14 +2945,14 @@ static int cgroup_update_dfl_csses(struc cgroup_migrate_add_src(link->cset, cgrp, &preloaded_csets); } - spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock);
/* NULL dst indicates self on default hierarchy */ ret = cgroup_migrate_prepare_dst(NULL, &preloaded_csets); if (ret) goto out_finish;
- spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock); list_for_each_entry(src_cset, &preloaded_csets, mg_preload_node) { struct task_struct *task, *ntask;
@@ -2958,7 +2964,7 @@ static int cgroup_update_dfl_csses(struc list_for_each_entry_safe(task, ntask, &src_cset->tasks, cg_list) cgroup_taskset_add(task, &tset); } - spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock);
ret = cgroup_taskset_migrate(&tset, cgrp); out_finish: @@ -3641,10 +3647,10 @@ static int cgroup_task_count(const struc int count = 0; struct cgrp_cset_link *link;
- spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock); list_for_each_entry(link, &cgrp->cset_links, cset_link) count += atomic_read(&link->cset->refcount); - spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock); return count; }
@@ -3982,7 +3988,7 @@ void css_task_iter_start(struct cgroup_s
memset(it, 0, sizeof(*it));
- spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock);
it->ss = css->ss;
@@ -3995,7 +4001,7 @@ void css_task_iter_start(struct cgroup_s
css_task_iter_advance_css_set(it);
- spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock); }
/** @@ -4013,7 +4019,7 @@ struct task_struct *css_task_iter_next(s it->cur_task = NULL; }
- spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock);
if (it->task_pos) { it->cur_task = list_entry(it->task_pos, struct task_struct, @@ -4022,7 +4028,7 @@ struct task_struct *css_task_iter_next(s css_task_iter_advance(it); }
- spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock);
return it->cur_task; } @@ -4036,10 +4042,10 @@ struct task_struct *css_task_iter_next(s void css_task_iter_end(struct css_task_iter *it) { if (it->cur_cset) { - spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock); list_del(&it->iters_node); put_css_set_locked(it->cur_cset); - spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock); }
if (it->cur_task) @@ -4068,10 +4074,10 @@ int cgroup_transfer_tasks(struct cgroup mutex_lock(&cgroup_mutex);
/* all tasks in @from are being moved, all csets are source */ - spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock); list_for_each_entry(link, &from->cset_links, cset_link) cgroup_migrate_add_src(link->cset, to, &preloaded_csets); - spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock);
ret = cgroup_migrate_prepare_dst(to, &preloaded_csets); if (ret) @@ -5180,10 +5186,10 @@ static int cgroup_destroy_locked(struct */ cgrp->self.flags &= ~CSS_ONLINE;
- spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock); list_for_each_entry(link, &cgrp->cset_links, cset_link) link->cset->dead = true; - spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock);
/* initiate massacre of all css's */ for_each_css(css, ssid, cgrp) @@ -5436,7 +5442,7 @@ int proc_cgroup_show(struct seq_file *m, goto out;
mutex_lock(&cgroup_mutex); - spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock);
for_each_root(root) { struct cgroup_subsys *ss; @@ -5488,7 +5494,7 @@ int proc_cgroup_show(struct seq_file *m,
retval = 0; out_unlock: - spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock); mutex_unlock(&cgroup_mutex); kfree(buf); out: @@ -5649,13 +5655,13 @@ void cgroup_post_fork(struct task_struct if (use_task_css_set_links) { struct css_set *cset;
- spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock); cset = task_css_set(current); if (list_empty(&child->cg_list)) { get_css_set(cset); css_set_move_task(child, NULL, cset, false); } - spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock); }
/* @@ -5699,9 +5705,9 @@ void cgroup_exit(struct task_struct *tsk cset = task_css_set(tsk);
if (!list_empty(&tsk->cg_list)) { - spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock); css_set_move_task(tsk, cset, NULL, false); - spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock); } else { get_css_set(cset); } @@ -5914,7 +5920,7 @@ static int current_css_set_cg_links_read if (!name_buf) return -ENOMEM;
- spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock); rcu_read_lock(); cset = rcu_dereference(current->cgroups); list_for_each_entry(link, &cset->cgrp_links, cgrp_link) { @@ -5925,7 +5931,7 @@ static int current_css_set_cg_links_read c->root->hierarchy_id, name_buf); } rcu_read_unlock(); - spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock); kfree(name_buf); return 0; } @@ -5936,7 +5942,7 @@ static int cgroup_css_links_read(struct struct cgroup_subsys_state *css = seq_css(seq); struct cgrp_cset_link *link;
- spin_lock_bh(&css_set_lock); + spin_lock_irq(&css_set_lock); list_for_each_entry(link, &css->cgroup->cset_links, cset_link) { struct css_set *cset = link->cset; struct task_struct *task; @@ -5959,7 +5965,7 @@ static int cgroup_css_links_read(struct overflow: seq_puts(seq, " ...\n"); } - spin_unlock_bh(&css_set_lock); + spin_unlock_irq(&css_set_lock); return 0; }
commit 36e4ad0316c017d5b271378ed9a1c9a4b77fab5f upstream.
Before this patch, function read_rindex_entry would set a rgrp glock's gl_object pointer to itself before inserting the rgrp into the rgrp rbtree. The problem is: if another process was also reading the rgrp in, and had already inserted its newly created rgrp, then the second call to read_rindex_entry would overwrite that value, then return a bad return code to the caller. Later, other functions would reference the now-freed rgrp memory by way of gl_object. In some cases, that could result in gfs2_rgrp_brelse being called twice for the same rgrp: once for the failed attempt and once for the "real" rgrp release. Eventually the kernel would panic. There are also a number of other things that could go wrong when a kernel module is accessing freed storage. For example, this could result in rgrp corruption because the fake rgrp would point to a fake bitmap in memory too, causing gfs2_inplace_reserve to search some random memory for free blocks, and find some, since we were never setting rgd->rd_bits to NULL before freeing it.
This patch fixes the problem by not setting gl_object until we have successfully inserted the rgrp into the rbtree. Also, it sets rd_bits to NULL as it frees them, which will ensure any accidental access to the wrong rgrp will result in a kernel panic rather than file system corruption, which is preferred.
Signed-off-by: Bob Peterson rpeterso@redhat.com [bwh: Backported to 4.4: adjust context] Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/gfs2/rgrp.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c index ef24894edecc1..9c159e6ad1164 100644 --- a/fs/gfs2/rgrp.c +++ b/fs/gfs2/rgrp.c @@ -739,6 +739,7 @@ void gfs2_clear_rgrpd(struct gfs2_sbd *sdp)
gfs2_free_clones(rgd); kfree(rgd->rd_bits); + rgd->rd_bits = NULL; return_all_reservations(rgd); kmem_cache_free(gfs2_rgrpd_cachep, rgd); } @@ -933,10 +934,6 @@ static int read_rindex_entry(struct gfs2_inode *ip) if (error) goto fail;
- rgd->rd_gl->gl_object = rgd; - rgd->rd_gl->gl_vm.start = (rgd->rd_addr * bsize) & PAGE_CACHE_MASK; - rgd->rd_gl->gl_vm.end = PAGE_CACHE_ALIGN((rgd->rd_addr + - rgd->rd_length) * bsize) - 1; rgd->rd_rgl = (struct gfs2_rgrp_lvb *)rgd->rd_gl->gl_lksb.sb_lvbptr; rgd->rd_flags &= ~(GFS2_RDF_UPTODATE | GFS2_RDF_PREFERRED); if (rgd->rd_data > sdp->sd_max_rg_data) @@ -944,14 +941,20 @@ static int read_rindex_entry(struct gfs2_inode *ip) spin_lock(&sdp->sd_rindex_spin); error = rgd_insert(rgd); spin_unlock(&sdp->sd_rindex_spin); - if (!error) + if (!error) { + rgd->rd_gl->gl_object = rgd; + rgd->rd_gl->gl_vm.start = (rgd->rd_addr * bsize) & PAGE_MASK; + rgd->rd_gl->gl_vm.end = PAGE_ALIGN((rgd->rd_addr + + rgd->rd_length) * bsize) - 1; return 0; + }
error = 0; /* someone else read in the rgrp; free it and ignore it */ gfs2_glock_put(rgd->rd_gl);
fail: kfree(rgd->rd_bits); + rgd->rd_bits = NULL; kmem_cache_free(gfs2_rgrpd_cachep, rgd); return error; }
commit c278c253f3d992c6994d08aa0efb2b6806ca396f upstream.
There is a race between arc_emac_tx() and arc_emac_tx_clean(). sk_buff got freed by arc_emac_tx_clean() while arc_emac_tx() submitting sk_buff.
In order to free sk_buff arc_emac_tx_clean() checks: if ((info & FOR_EMAC) || !txbd->data) break; ... dev_kfree_skb_irq(skb);
If condition false, arc_emac_tx_clean() free sk_buff.
In order to submit txbd, arc_emac_tx() do: priv->tx_buff[*txbd_curr].skb = skb; ... priv->txbd[*txbd_curr].data = cpu_to_le32(addr); ... ... <== arc_emac_tx_clean() check condition here ... <== (info & FOR_EMAC) is false ... <== !txbd->data is false ... *info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);
In order to reproduce the situation, run device: # iperf -s run on host: # iperf -t 600 -c <device-ip-addr>
[ 28.396284] ------------[ cut here ]------------ [ 28.400912] kernel BUG at .../net/core/skbuff.c:1355! [ 28.414019] Internal error: Oops - BUG: 0 [#1] SMP ARM [ 28.419150] Modules linked in: [ 28.422219] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G B 4.4.0+ #120 [ 28.429516] Hardware name: Rockchip (Device Tree) [ 28.434216] task: c0665070 ti: c0660000 task.ti: c0660000 [ 28.439622] PC is at skb_put+0x10/0x54 [ 28.443381] LR is at arc_emac_poll+0x260/0x474 [ 28.447821] pc : [<c03af580>] lr : [<c028fec4>] psr: a0070113 [ 28.447821] sp : c0661e58 ip : eea68502 fp : ef377000 [ 28.459280] r10: 0000012c r9 : f08b2000 r8 : eeb57100 [ 28.464498] r7 : 00000000 r6 : ef376594 r5 : 00000077 r4 : ef376000 [ 28.471015] r3 : 0030488b r2 : ef13e880 r1 : 000005ee r0 : eeb57100 [ 28.477534] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 28.484658] Control: 10c5387d Table: 8eaf004a DAC: 00000051 [ 28.490396] Process swapper/0 (pid: 0, stack limit = 0xc0660210) [ 28.496393] Stack: (0xc0661e58 to 0xc0662000) [ 28.500745] 1e40: 00000002 00000000 [ 28.508913] 1e60: 00000000 ef376520 00000028 f08b23b8 00000000 ef376520 ef7b6900 c028fc64 [ 28.517082] 1e80: 2f158000 c0661ea8 c0661eb0 0000012c c065e900 c03bdeac ffff95e9 c0662100 [ 28.525250] 1ea0: c0663924 00000028 c0661ea8 c0661ea8 c0661eb0 c0661eb0 0000001e c0660000 [ 28.533417] 1ec0: 40000003 00000008 c0695a00 0000000a c066208c 00000100 c0661ee0 c0027410 [ 28.541584] 1ee0: ef0fb700 2f158000 00200000 ffff95e8 00000004 c0662100 c0662080 00000003 [ 28.549751] 1f00: 00000000 00000000 00000000 c065b45c 0000001e ef005000 c0647a30 00000000 [ 28.557919] 1f20: 00000000 c0027798 00000000 c005cf40 f0802100 c0662ffc c0661f60 f0803100 [ 28.566088] 1f40: c0661fb8 c00093bc c000ffb4 60070013 ffffffff c0661f94 c0661fb8 c00137d4 [ 28.574267] 1f60: 00000001 00000000 00000000 c001ffa0 00000000 c0660000 00000000 c065a364 [ 28.582441] 1f80: c0661fb8 c0647a30 00000000 00000000 00000000 c0661fb0 c000ffb0 c000ffb4 [ 28.590608] 1fa0: 60070013 ffffffff 00000051 00000000 00000000 c005496c c0662400 c061bc40 [ 28.598776] 1fc0: ffffffff ffffffff 00000000 c061b680 00000000 c0647a30 00000000 c0695294 [ 28.606943] 1fe0: c0662488 c0647a2c c066619c 6000406a 413fc090 6000807c 00000000 00000000 [ 28.615127] [<c03af580>] (skb_put) from [<ef376520>] (0xef376520) [ 28.621218] Code: e5902054 e590c090 e3520000 0a000000 (e7f001f2) [ 28.627307] ---[ end trace 4824734e2243fdb6 ]---
[ 34.377068] Internal error: Oops: 17 [#1] SMP ARM [ 34.382854] Modules linked in: [ 34.385947] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.4.0+ #120 [ 34.392219] Hardware name: Rockchip (Device Tree) [ 34.396937] task: ef02d040 ti: ef05c000 task.ti: ef05c000 [ 34.402376] PC is at __dev_kfree_skb_irq+0x4/0x80 [ 34.407121] LR is at arc_emac_poll+0x130/0x474 [ 34.411583] pc : [<c03bb640>] lr : [<c028fd94>] psr: 60030013 [ 34.411583] sp : ef05de68 ip : 0008e83c fp : ef377000 [ 34.423062] r10: c001bec4 r9 : 00000000 r8 : f08b24c8 [ 34.428296] r7 : f08b2400 r6 : 00000075 r5 : 00000019 r4 : ef376000 [ 34.434827] r3 : 00060000 r2 : 00000042 r1 : 00000001 r0 : 00000000 [ 34.441365] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 34.448507] Control: 10c5387d Table: 8f25c04a DAC: 00000051 [ 34.454262] Process ksoftirqd/0 (pid: 3, stack limit = 0xef05c210) [ 34.460449] Stack: (0xef05de68 to 0xef05e000) [ 34.464827] de60: ef376000 c028fd94 00000000 c0669480 c0669480 ef376520 [ 34.473022] de80: 00000028 00000001 00002ae4 ef376520 ef7b6900 c028fc64 2f158000 ef05dec0 [ 34.481215] dea0: ef05dec8 0000012c c065e900 c03bdeac ffff983f c0662100 c0663924 00000028 [ 34.489409] dec0: ef05dec0 ef05dec0 ef05dec8 ef05dec8 ef7b6000 ef05c000 40000003 00000008 [ 34.497600] dee0: c0695a00 0000000a c066208c 00000100 ef05def8 c0027410 ef7b6000 40000000 [ 34.505795] df00: 04208040 ffff983e 00000004 c0662100 c0662080 00000003 ef05c000 ef027340 [ 34.513985] df20: ef05c000 c0666c2c 00000000 00000001 00000002 00000000 00000000 c0027568 [ 34.522176] df40: ef027340 c003ef48 ef027300 00000000 ef027340 c003edd4 00000000 00000000 [ 34.530367] df60: 00000000 c003c37c ffffff7f 00000001 00000000 ef027340 00000000 00030003 [ 34.538559] df80: ef05df80 ef05df80 00000000 00000000 ef05df90 ef05df90 ef05dfac ef027300 [ 34.546750] dfa0: c003c2a4 00000000 00000000 c000f578 00000000 00000000 00000000 00000000 [ 34.554939] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 34.563129] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 ffffffff dfff7fff [ 34.571360] [<c03bb640>] (__dev_kfree_skb_irq) from [<c028fd94>] (arc_emac_poll+0x130/0x474) [ 34.579840] [<c028fd94>] (arc_emac_poll) from [<c03bdeac>] (net_rx_action+0xdc/0x28c) [ 34.587712] [<c03bdeac>] (net_rx_action) from [<c0027410>] (__do_softirq+0xcc/0x1f8) [ 34.595482] [<c0027410>] (__do_softirq) from [<c0027568>] (run_ksoftirqd+0x2c/0x50) [ 34.603168] [<c0027568>] (run_ksoftirqd) from [<c003ef48>] (smpboot_thread_fn+0x174/0x18c) [ 34.611466] [<c003ef48>] (smpboot_thread_fn) from [<c003c37c>] (kthread+0xd8/0xec) [ 34.619075] [<c003c37c>] (kthread) from [<c000f578>] (ret_from_fork+0x14/0x3c) [ 34.626317] Code: e8bd8010 e3a00000 e12fff1e e92d4010 (e59030a4) [ 34.632572] ---[ end trace cca5a3d86a82249a ]---
Signed-off-by: Alexander Kochetkov al.kochet@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/arc/emac_main.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/arc/emac_main.c b/drivers/net/ethernet/arc/emac_main.c index 9cc5daed13edd..b0285ac203f09 100644 --- a/drivers/net/ethernet/arc/emac_main.c +++ b/drivers/net/ethernet/arc/emac_main.c @@ -163,7 +163,7 @@ static void arc_emac_tx_clean(struct net_device *ndev) struct sk_buff *skb = tx_buff->skb; unsigned int info = le32_to_cpu(txbd->info);
- if ((info & FOR_EMAC) || !txbd->data) + if ((info & FOR_EMAC) || !txbd->data || !skb) break;
if (unlikely(info & (DROP | DEFR | LTCL | UFLO))) { @@ -191,6 +191,7 @@ static void arc_emac_tx_clean(struct net_device *ndev)
txbd->data = 0; txbd->info = 0; + tx_buff->skb = NULL;
*txbd_dirty = (*txbd_dirty + 1) % TX_BD_NUM; } @@ -619,7 +620,6 @@ static int arc_emac_tx(struct sk_buff *skb, struct net_device *ndev) dma_unmap_addr_set(&priv->tx_buff[*txbd_curr], addr, addr); dma_unmap_len_set(&priv->tx_buff[*txbd_curr], len, len);
- priv->tx_buff[*txbd_curr].skb = skb; priv->txbd[*txbd_curr].data = cpu_to_le32(addr);
/* Make sure pointer to data buffer is set */ @@ -629,6 +629,11 @@ static int arc_emac_tx(struct sk_buff *skb, struct net_device *ndev)
*info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);
+ /* Make sure info word is set */ + wmb(); + + priv->tx_buff[*txbd_curr].skb = skb; + /* Increment index to point to the next BD */ *txbd_curr = (*txbd_curr + 1) % TX_BD_NUM;
commit a2ac99905f1ea8b15997a6ec39af69aa28a3653b upstream.
handle_tx will delay rx for tens or even hundreds of milliseconds when tx busy polling udp packets with small length(e.g. 1byte udp payload), because setting VHOST_NET_WEIGHT takes into account only sent-bytes but no single packet length.
Ping-Latencies shown below were tested between two Virtual Machines using netperf (UDP_STREAM, len=1), and then another machine pinged the client:
vq size=256 Packet-Weight Ping-Latencies(millisecond) min avg max Origin 3.319 18.489 57.303 64 1.643 2.021 2.552 128 1.825 2.600 3.224 256 1.997 2.710 4.295 512 1.860 3.171 4.631 1024 2.002 4.173 9.056 2048 2.257 5.650 9.688 4096 2.093 8.508 15.943
vq size=512 Packet-Weight Ping-Latencies(millisecond) min avg max Origin 6.537 29.177 66.245 64 2.798 3.614 4.403 128 2.861 3.820 4.775 256 3.008 4.018 4.807 512 3.254 4.523 5.824 1024 3.079 5.335 7.747 2048 3.944 8.201 12.762 4096 4.158 11.057 19.985
Seems pretty consistent, a small dip at 2 VQ sizes. Ring size is a hint from device about a burst size it can tolerate. Based on benchmarks, set the weight to 2 * vq size.
To evaluate this change, another tests were done using netperf(RR, TX) between two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and vq size was tweaked through qemu. Results shown below does not show obvious changes.
vq size=256 TCP_RR vq size=512 TCP_RR size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize% 1/ 1/ -7%/ -2% 1/ 1/ 0%/ -2% 1/ 4/ +1%/ 0% 1/ 4/ +1%/ 0% 1/ 8/ +1%/ -2% 1/ 8/ 0%/ +1% 64/ 1/ -6%/ 0% 64/ 1/ +7%/ +3% 64/ 4/ 0%/ +2% 64/ 4/ -1%/ +1% 64/ 8/ 0%/ 0% 64/ 8/ -1%/ -2% 256/ 1/ -3%/ -4% 256/ 1/ -4%/ -2% 256/ 4/ +3%/ +4% 256/ 4/ +1%/ +2% 256/ 8/ +2%/ 0% 256/ 8/ +1%/ -1%
vq size=256 UDP_RR vq size=512 UDP_RR size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize% 1/ 1/ -5%/ +1% 1/ 1/ -3%/ -2% 1/ 4/ +4%/ +1% 1/ 4/ -2%/ +2% 1/ 8/ -1%/ -1% 1/ 8/ -1%/ 0% 64/ 1/ -2%/ -3% 64/ 1/ +1%/ +1% 64/ 4/ -5%/ -1% 64/ 4/ +2%/ 0% 64/ 8/ 0%/ -1% 64/ 8/ -2%/ +1% 256/ 1/ +7%/ +1% 256/ 1/ -7%/ 0% 256/ 4/ +1%/ +1% 256/ 4/ -3%/ -4% 256/ 8/ +2%/ +2% 256/ 8/ +1%/ +1%
vq size=256 TCP_STREAM vq size=512 TCP_STREAM size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize% 64/ 1/ 0%/ -3% 64/ 1/ 0%/ 0% 64/ 4/ +3%/ -1% 64/ 4/ -2%/ +4% 64/ 8/ +9%/ -4% 64/ 8/ -1%/ +2% 256/ 1/ +1%/ -4% 256/ 1/ +1%/ +1% 256/ 4/ -1%/ -1% 256/ 4/ -3%/ 0% 256/ 8/ +7%/ +5% 256/ 8/ -3%/ 0% 512/ 1/ +1%/ 0% 512/ 1/ -1%/ -1% 512/ 4/ +1%/ -1% 512/ 4/ 0%/ 0% 512/ 8/ +7%/ -5% 512/ 8/ +6%/ -1% 1024/ 1/ 0%/ -1% 1024/ 1/ 0%/ +1% 1024/ 4/ +3%/ 0% 1024/ 4/ +1%/ 0% 1024/ 8/ +8%/ +5% 1024/ 8/ -1%/ 0% 2048/ 1/ +2%/ +2% 2048/ 1/ -1%/ 0% 2048/ 4/ +1%/ 0% 2048/ 4/ 0%/ -1% 2048/ 8/ -2%/ 0% 2048/ 8/ 5%/ -1% 4096/ 1/ -2%/ 0% 4096/ 1/ -2%/ 0% 4096/ 4/ +2%/ 0% 4096/ 4/ 0%/ 0% 4096/ 8/ +9%/ -2% 4096/ 8/ -5%/ -1%
Acked-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Haibin Zhang haibinzhang@tencent.com Signed-off-by: Yunfang Tai yunfangtai@tencent.com Signed-off-by: Lidong Chen lidongchen@tencent.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vhost/net.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index f463171352245..b8496f713bc62 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -39,6 +39,10 @@ MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;" * Using this limit prevents one virtqueue from starving others. */ #define VHOST_NET_WEIGHT 0x80000
+/* Max number of packets transferred before requeueing the job. + * Using this limit prevents one virtqueue from starving rx. */ +#define VHOST_NET_PKT_WEIGHT(vq) ((vq)->num * 2) + /* MAX number of TX used buffers for outstanding zerocopy */ #define VHOST_MAX_PEND 128 #define VHOST_GOODCOPY_LEN 256 @@ -308,6 +312,7 @@ static void handle_tx(struct vhost_net *net) struct socket *sock; struct vhost_net_ubuf_ref *uninitialized_var(ubufs); bool zcopy, zcopy_used; + int sent_pkts = 0;
mutex_lock(&vq->mutex); sock = vq->private_data; @@ -408,7 +413,8 @@ static void handle_tx(struct vhost_net *net) vhost_zerocopy_signal_used(net, vq); total_len += len; vhost_net_tx_packet(net); - if (unlikely(total_len >= VHOST_NET_WEIGHT)) { + if (unlikely(total_len >= VHOST_NET_WEIGHT) || + unlikely(++sent_pkts >= VHOST_NET_PKT_WEIGHT(vq))) { vhost_poll_queue(&vq->poll); break; }
commit db688c24eada63b1efe6d0d7d835e5c3bdd71fd3 upstream.
Similar to commit a2ac99905f1e ("vhost-net: set packet weight of tx polling to 2 * vq size"), we need a packet-based limit for handler_rx, too - elsewhere, under rx flood with small packets, tx can be delayed for a very long time, even without busypolling.
The pkt limit applied to handle_rx must be the same applied by handle_tx, or we will get unfair scheduling between rx and tx. Tying such limit to the queue length makes it less effective for large queue length values and can introduce large process scheduler latencies, so a constant valued is used - likewise the existing bytes limit.
The selected limit has been validated with PVP[1] performance test with different queue sizes:
queue size 256 512 1024
baseline 366 354 362 weight 128 715 723 670 weight 256 740 745 733 weight 512 600 460 583 weight 1024 423 427 418
A packet weight of 256 gives peek performances in under all the tested scenarios.
No measurable regression in unidirectional performance tests has been detected.
[1] https://developers.redhat.com/blog/2017/06/05/measuring-and-comparing-open-v...
Signed-off-by: Paolo Abeni pabeni@redhat.com Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vhost/net.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index b8496f713bc62..c1b5bccab293f 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -40,8 +40,10 @@ MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;" #define VHOST_NET_WEIGHT 0x80000
/* Max number of packets transferred before requeueing the job. - * Using this limit prevents one virtqueue from starving rx. */ -#define VHOST_NET_PKT_WEIGHT(vq) ((vq)->num * 2) + * Using this limit prevents one virtqueue from starving others with small + * pkts. + */ +#define VHOST_NET_PKT_WEIGHT 256
/* MAX number of TX used buffers for outstanding zerocopy */ #define VHOST_MAX_PEND 128 @@ -414,7 +416,7 @@ static void handle_tx(struct vhost_net *net) total_len += len; vhost_net_tx_packet(net); if (unlikely(total_len >= VHOST_NET_WEIGHT) || - unlikely(++sent_pkts >= VHOST_NET_PKT_WEIGHT(vq))) { + unlikely(++sent_pkts >= VHOST_NET_PKT_WEIGHT)) { vhost_poll_queue(&vq->poll); break; } @@ -545,6 +547,7 @@ static void handle_rx(struct vhost_net *net) struct socket *sock; struct iov_iter fixup; __virtio16 num_buffers; + int recv_pkts = 0;
mutex_lock(&vq->mutex); sock = vq->private_data; @@ -637,7 +640,8 @@ static void handle_rx(struct vhost_net *net) if (unlikely(vq_log)) vhost_log_write(vq, vq_log, log, vhost_len); total_len += vhost_len; - if (unlikely(total_len >= VHOST_NET_WEIGHT)) { + if (unlikely(total_len >= VHOST_NET_WEIGHT) || + unlikely(++recv_pkts >= VHOST_NET_PKT_WEIGHT)) { vhost_poll_queue(&vq->poll); break; }
commit 272f35cba53d088085e5952fd81d7a133ab90789 upstream.
Signed-off-by: Jason Wang jasowang@redhat.com Signed-off-by: David S. Miller davem@davemloft.net [bwh: Backported to 4.4: adjust context] Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vhost/net.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index c1b5bccab293f..38c3120f92be4 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -293,6 +293,12 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success) rcu_read_unlock_bh(); }
+static bool vhost_exceeds_weight(int pkts, int total_len) +{ + return total_len >= VHOST_NET_WEIGHT || + pkts >= VHOST_NET_PKT_WEIGHT; +} + /* Expects to be always run from workqueue - which acts as * read-size critical section for our kind of RCU. */ static void handle_tx(struct vhost_net *net) @@ -415,8 +421,7 @@ static void handle_tx(struct vhost_net *net) vhost_zerocopy_signal_used(net, vq); total_len += len; vhost_net_tx_packet(net); - if (unlikely(total_len >= VHOST_NET_WEIGHT) || - unlikely(++sent_pkts >= VHOST_NET_PKT_WEIGHT)) { + if (unlikely(vhost_exceeds_weight(++sent_pkts, total_len))) { vhost_poll_queue(&vq->poll); break; } @@ -640,8 +645,7 @@ static void handle_rx(struct vhost_net *net) if (unlikely(vq_log)) vhost_log_write(vq, vq_log, log, vhost_len); total_len += vhost_len; - if (unlikely(total_len >= VHOST_NET_WEIGHT) || - unlikely(++recv_pkts >= VHOST_NET_PKT_WEIGHT)) { + if (unlikely(vhost_exceeds_weight(++recv_pkts, total_len))) { vhost_poll_queue(&vq->poll); break; }
commit e82b9b0727ff6d665fff2d326162b460dded554d upstream.
We used to have vhost_exceeds_weight() for vhost-net to:
- prevent vhost kthread from hogging the cpu - balance the time spent between TX and RX
This function could be useful for vsock and scsi as well. So move it to vhost.c. Device must specify a weight which counts the number of requests, or it can also specific a byte_weight which counts the number of bytes that has been processed.
Signed-off-by: Jason Wang jasowang@redhat.com Reviewed-by: Stefan Hajnoczi stefanha@redhat.com Signed-off-by: Michael S. Tsirkin mst@redhat.com [bwh: Backported to 4.4: - Drop changes to vhost_vsock - In vhost_net, both Tx modes are handled in one loop in handle_tx() - Adjust context] Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vhost/net.c | 18 +++++------------- drivers/vhost/scsi.c | 9 ++++++++- drivers/vhost/vhost.c | 20 +++++++++++++++++++- drivers/vhost/vhost.h | 6 +++++- 4 files changed, 37 insertions(+), 16 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 38c3120f92be4..20062531f1eaa 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -293,12 +293,6 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success) rcu_read_unlock_bh(); }
-static bool vhost_exceeds_weight(int pkts, int total_len) -{ - return total_len >= VHOST_NET_WEIGHT || - pkts >= VHOST_NET_PKT_WEIGHT; -} - /* Expects to be always run from workqueue - which acts as * read-size critical section for our kind of RCU. */ static void handle_tx(struct vhost_net *net) @@ -421,10 +415,9 @@ static void handle_tx(struct vhost_net *net) vhost_zerocopy_signal_used(net, vq); total_len += len; vhost_net_tx_packet(net); - if (unlikely(vhost_exceeds_weight(++sent_pkts, total_len))) { - vhost_poll_queue(&vq->poll); + if (unlikely(vhost_exceeds_weight(vq, ++sent_pkts, + total_len))) break; - } } out: mutex_unlock(&vq->mutex); @@ -645,10 +638,8 @@ static void handle_rx(struct vhost_net *net) if (unlikely(vq_log)) vhost_log_write(vq, vq_log, log, vhost_len); total_len += vhost_len; - if (unlikely(vhost_exceeds_weight(++recv_pkts, total_len))) { - vhost_poll_queue(&vq->poll); + if (unlikely(vhost_exceeds_weight(vq, ++recv_pkts, total_len))) break; - } } out: mutex_unlock(&vq->mutex); @@ -718,7 +709,8 @@ static int vhost_net_open(struct inode *inode, struct file *f) n->vqs[i].vhost_hlen = 0; n->vqs[i].sock_hlen = 0; } - vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX); + vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX, + VHOST_NET_WEIGHT, VHOST_NET_PKT_WEIGHT);
vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev); vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev); diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index 8fc62a03637a7..47e659eacf17e 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -58,6 +58,12 @@ #define VHOST_SCSI_PREALLOC_UPAGES 2048 #define VHOST_SCSI_PREALLOC_PROT_SGLS 512
+/* Max number of requests before requeueing the job. + * Using this limit prevents one virtqueue from starving others with + * request. + */ +#define VHOST_SCSI_WEIGHT 256 + struct vhost_scsi_inflight { /* Wait for the flush operation to finish */ struct completion comp; @@ -1443,7 +1449,8 @@ static int vhost_scsi_open(struct inode *inode, struct file *f) vqs[i] = &vs->vqs[i].vq; vs->vqs[i].vq.handle_kick = vhost_scsi_handle_kick; } - vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ); + vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, + VHOST_SCSI_WEIGHT, 0);
vhost_scsi_init_inflight(vs, NULL);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 2ed0a356d1d33..0f653f314876e 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -370,8 +370,24 @@ static void vhost_dev_free_iovecs(struct vhost_dev *dev) vhost_vq_free_iovecs(dev->vqs[i]); }
+bool vhost_exceeds_weight(struct vhost_virtqueue *vq, + int pkts, int total_len) +{ + struct vhost_dev *dev = vq->dev; + + if ((dev->byte_weight && total_len >= dev->byte_weight) || + pkts >= dev->weight) { + vhost_poll_queue(&vq->poll); + return true; + } + + return false; +} +EXPORT_SYMBOL_GPL(vhost_exceeds_weight); + void vhost_dev_init(struct vhost_dev *dev, - struct vhost_virtqueue **vqs, int nvqs) + struct vhost_virtqueue **vqs, int nvqs, + int weight, int byte_weight) { struct vhost_virtqueue *vq; int i; @@ -386,6 +402,8 @@ void vhost_dev_init(struct vhost_dev *dev, spin_lock_init(&dev->work_lock); INIT_LIST_HEAD(&dev->work_list); dev->worker = NULL; + dev->weight = weight; + dev->byte_weight = byte_weight;
for (i = 0; i < dev->nvqs; ++i) { vq = dev->vqs[i]; diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index d3f767448a72c..5ac4869705696 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -127,9 +127,13 @@ struct vhost_dev { spinlock_t work_lock; struct list_head work_list; struct task_struct *worker; + int weight; + int byte_weight; };
-void vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs, int nvqs); +bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, int total_len); +void vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs, + int nvqs, int weight, int byte_weight); long vhost_dev_set_owner(struct vhost_dev *dev); bool vhost_dev_has_owner(struct vhost_dev *dev); long vhost_dev_check_owner(struct vhost_dev *);
commit e2412c07f8f3040593dfb88207865a3cd58680c0 upstream.
When the rx buffer is too small for a packet, we will discard the vq descriptor and retry it for the next packet:
while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk, &busyloop_intr))) { ... /* On overrun, truncate and discard */ if (unlikely(headcount > UIO_MAXIOV)) { iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1); err = sock->ops->recvmsg(sock, &msg, 1, MSG_DONTWAIT | MSG_TRUNC); pr_debug("Discarded rx packet: len %zd\n", sock_len); continue; } ... }
This makes it possible to trigger a infinite while..continue loop through the co-opreation of two VMs like:
1) Malicious VM1 allocate 1 byte rx buffer and try to slow down the vhost process as much as possible e.g using indirect descriptors or other. 2) Malicious VM2 generate packets to VM1 as fast as possible
Fixing this by checking against weight at the end of RX and TX loop. This also eliminate other similar cases when:
- userspace is consuming the packets in the meanwhile - theoretical TOCTOU attack if guest moving avail index back and forth to hit the continue after vhost find guest just add new buffers
This addresses CVE-2019-3900.
Fixes: d8316f3991d20 ("vhost: fix total length when packets are too short") Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server") Signed-off-by: Jason Wang jasowang@redhat.com Reviewed-by: Stefan Hajnoczi stefanha@redhat.com Signed-off-by: Michael S. Tsirkin mst@redhat.com [bwh: Backported to 4.4: - Both Tx modes are handled in one loop in handle_tx() - Adjust context] Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vhost/net.c | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 20062531f1eaa..1459dc9fd7010 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -326,7 +326,7 @@ static void handle_tx(struct vhost_net *net) hdr_size = nvq->vhost_hlen; zcopy = nvq->ubufs;
- for (;;) { + do { /* Release DMAs done buffers first */ if (zcopy) vhost_zerocopy_signal_used(net, vq); @@ -415,10 +415,7 @@ static void handle_tx(struct vhost_net *net) vhost_zerocopy_signal_used(net, vq); total_len += len; vhost_net_tx_packet(net); - if (unlikely(vhost_exceeds_weight(vq, ++sent_pkts, - total_len))) - break; - } + } while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len))); out: mutex_unlock(&vq->mutex); } @@ -560,7 +557,10 @@ static void handle_rx(struct vhost_net *net) vq->log : NULL; mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF);
- while ((sock_len = peek_head_len(sock->sk))) { + do { + sock_len = peek_head_len(sock->sk); + if (!sock_len) + break; sock_len += sock_hlen; vhost_len = sock_len + vhost_hlen; headcount = get_rx_bufs(vq, vq->heads, vhost_len, @@ -638,9 +638,8 @@ static void handle_rx(struct vhost_net *net) if (unlikely(vq_log)) vhost_log_write(vq, vq_log, log, vhost_len); total_len += vhost_len; - if (unlikely(vhost_exceeds_weight(vq, ++recv_pkts, total_len))) - break; - } + } while (likely(!vhost_exceeds_weight(vq, ++recv_pkts, total_len))); + out: mutex_unlock(&vq->mutex); } @@ -710,7 +709,7 @@ static int vhost_net_open(struct inode *inode, struct file *f) n->vqs[i].sock_hlen = 0; } vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX, - VHOST_NET_WEIGHT, VHOST_NET_PKT_WEIGHT); + VHOST_NET_PKT_WEIGHT, VHOST_NET_WEIGHT);
vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev); vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
commit c1ea02f15ab5efb3e93fc3144d895410bf79fcf2 upstream.
This patch will check the weight and exit the loop if we exceeds the weight. This is useful for preventing scsi kthread from hogging cpu which is guest triggerable.
This addresses CVE-2019-3900.
Cc: Paolo Bonzini pbonzini@redhat.com Cc: Stefan Hajnoczi stefanha@redhat.com Fixes: 057cbf49a1f0 ("tcm_vhost: Initial merge for vhost level target fabric driver") Signed-off-by: Jason Wang jasowang@redhat.com Reviewed-by: Stefan Hajnoczi stefanha@redhat.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Reviewed-by: Stefan Hajnoczi stefanha@redhat.com [bwh: Backported to 4.4: - Drop changes in vhost_scsi_ctl_handle_vq() - Adjust context] Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vhost/scsi.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index 47e659eacf17e..269cfdd2958de 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -861,7 +861,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) u64 tag; u32 exp_data_len, data_direction; unsigned out, in; - int head, ret, prot_bytes; + int head, ret, prot_bytes, c = 0; size_t req_size, rsp_size = sizeof(struct virtio_scsi_cmd_resp); size_t out_size, in_size; u16 lun; @@ -880,7 +880,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
vhost_disable_notify(&vs->dev, vq);
- for (;;) { + do { head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov), &out, &in, NULL, NULL); @@ -1096,7 +1096,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) */ INIT_WORK(&cmd->work, vhost_scsi_submission_work); queue_work(vhost_scsi_workqueue, &cmd->work); - } + } while (likely(!vhost_exceeds_weight(vq, ++c, 0))); out: mutex_unlock(&vq->mutex); }
commit 2c956a60778cbb6a27e0c7a8a52a91378c90e1d1 upstream.
SipHash is a 64-bit keyed hash function that is actually a cryptographically secure PRF, like HMAC. Except SipHash is super fast, and is meant to be used as a hashtable keyed lookup function, or as a general PRF for short input use cases, such as sequence numbers or RNG chaining.
For the first usage:
There are a variety of attacks known as "hashtable poisoning" in which an attacker forms some data such that the hash of that data will be the same, and then preceeds to fill up all entries of a hashbucket. This is a realistic and well-known denial-of-service vector. Currently hashtables use jhash, which is fast but not secure, and some kind of rotating key scheme (or none at all, which isn't good). SipHash is meant as a replacement for jhash in these cases.
There are a modicum of places in the kernel that are vulnerable to hashtable poisoning attacks, either via userspace vectors or network vectors, and there's not a reliable mechanism inside the kernel at the moment to fix it. The first step toward fixing these issues is actually getting a secure primitive into the kernel for developers to use. Then we can, bit by bit, port things over to it as deemed appropriate.
While SipHash is extremely fast for a cryptographically secure function, it is likely a bit slower than the insecure jhash, and so replacements will be evaluated on a case-by-case basis based on whether or not the difference in speed is negligible and whether or not the current jhash usage poses a real security risk.
For the second usage:
A few places in the kernel are using MD5 or SHA1 for creating secure sequence numbers, syn cookies, port numbers, or fast random numbers. SipHash is a faster and more fitting, and more secure replacement for MD5 in those situations. Replacing MD5 and SHA1 with SipHash for these uses is obvious and straight-forward, and so is submitted along with this patch series. There shouldn't be much of a debate over its efficacy.
Dozens of languages are already using this internally for their hash tables and PRFs. Some of the BSDs already use this in their kernels. SipHash is a widely known high-speed solution to a widely known set of problems, and it's time we catch-up.
Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Reviewed-by: Jean-Philippe Aumasson jeanphilippe.aumasson@gmail.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Eric Biggers ebiggers3@gmail.com Cc: David Laight David.Laight@aculab.com Cc: Eric Dumazet eric.dumazet@gmail.com Signed-off-by: David S. Miller davem@davemloft.net [bwh: Backported to 4.4 as dependency of commits df453700e8d8 "inet: switch IP ID generator to siphash" and 3c79107631db "netfilter: ctnetlink: don't use conntrack/expect object addresses as id": - Adjust context] Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/siphash.txt | 100 ++++++++++++++++ MAINTAINERS | 7 ++ include/linux/siphash.h | 85 ++++++++++++++ lib/Kconfig.debug | 10 ++ lib/Makefile | 3 +- lib/siphash.c | 232 ++++++++++++++++++++++++++++++++++++++ lib/test_siphash.c | 131 +++++++++++++++++++++ 7 files changed, 567 insertions(+), 1 deletion(-) create mode 100644 Documentation/siphash.txt create mode 100644 include/linux/siphash.h create mode 100644 lib/siphash.c create mode 100644 lib/test_siphash.c
diff --git a/Documentation/siphash.txt b/Documentation/siphash.txt new file mode 100644 index 0000000000000..e8e6ddbbaab47 --- /dev/null +++ b/Documentation/siphash.txt @@ -0,0 +1,100 @@ + SipHash - a short input PRF +----------------------------------------------- +Written by Jason A. Donenfeld jason@zx2c4.com + +SipHash is a cryptographically secure PRF -- a keyed hash function -- that +performs very well for short inputs, hence the name. It was designed by +cryptographers Daniel J. Bernstein and Jean-Philippe Aumasson. It is intended +as a replacement for some uses of: `jhash`, `md5_transform`, `sha_transform`, +and so forth. + +SipHash takes a secret key filled with randomly generated numbers and either +an input buffer or several input integers. It spits out an integer that is +indistinguishable from random. You may then use that integer as part of secure +sequence numbers, secure cookies, or mask it off for use in a hash table. + +1. Generating a key + +Keys should always be generated from a cryptographically secure source of +random numbers, either using get_random_bytes or get_random_once: + +siphash_key_t key; +get_random_bytes(&key, sizeof(key)); + +If you're not deriving your key from here, you're doing it wrong. + +2. Using the functions + +There are two variants of the function, one that takes a list of integers, and +one that takes a buffer: + +u64 siphash(const void *data, size_t len, const siphash_key_t *key); + +And: + +u64 siphash_1u64(u64, const siphash_key_t *key); +u64 siphash_2u64(u64, u64, const siphash_key_t *key); +u64 siphash_3u64(u64, u64, u64, const siphash_key_t *key); +u64 siphash_4u64(u64, u64, u64, u64, const siphash_key_t *key); +u64 siphash_1u32(u32, const siphash_key_t *key); +u64 siphash_2u32(u32, u32, const siphash_key_t *key); +u64 siphash_3u32(u32, u32, u32, const siphash_key_t *key); +u64 siphash_4u32(u32, u32, u32, u32, const siphash_key_t *key); + +If you pass the generic siphash function something of a constant length, it +will constant fold at compile-time and automatically choose one of the +optimized functions. + +3. Hashtable key function usage: + +struct some_hashtable { + DECLARE_HASHTABLE(hashtable, 8); + siphash_key_t key; +}; + +void init_hashtable(struct some_hashtable *table) +{ + get_random_bytes(&table->key, sizeof(table->key)); +} + +static inline hlist_head *some_hashtable_bucket(struct some_hashtable *table, struct interesting_input *input) +{ + return &table->hashtable[siphash(input, sizeof(*input), &table->key) & (HASH_SIZE(table->hashtable) - 1)]; +} + +You may then iterate like usual over the returned hash bucket. + +4. Security + +SipHash has a very high security margin, with its 128-bit key. So long as the +key is kept secret, it is impossible for an attacker to guess the outputs of +the function, even if being able to observe many outputs, since 2^128 outputs +is significant. + +Linux implements the "2-4" variant of SipHash. + +5. Struct-passing Pitfalls + +Often times the XuY functions will not be large enough, and instead you'll +want to pass a pre-filled struct to siphash. When doing this, it's important +to always ensure the struct has no padding holes. The easiest way to do this +is to simply arrange the members of the struct in descending order of size, +and to use offsetendof() instead of sizeof() for getting the size. For +performance reasons, if possible, it's probably a good thing to align the +struct to the right boundary. Here's an example: + +const struct { + struct in6_addr saddr; + u32 counter; + u16 dport; +} __aligned(SIPHASH_ALIGNMENT) combined = { + .saddr = *(struct in6_addr *)saddr, + .counter = counter, + .dport = dport +}; +u64 h = siphash(&combined, offsetofend(typeof(combined), dport), &secret); + +6. Resources + +Read the SipHash paper if you're interested in learning more: +https://131002.net/siphash/siphash.pdf diff --git a/MAINTAINERS b/MAINTAINERS index f4d4a5544dc10..20a31b3579299 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -9749,6 +9749,13 @@ F: arch/arm/mach-s3c24xx/mach-bast.c F: arch/arm/mach-s3c24xx/bast-ide.c F: arch/arm/mach-s3c24xx/bast-irq.c
+SIPHASH PRF ROUTINES +M: Jason A. Donenfeld Jason@zx2c4.com +S: Maintained +F: lib/siphash.c +F: lib/test_siphash.c +F: include/linux/siphash.h + TI DAVINCI MACHINE SUPPORT M: Sekhar Nori nsekhar@ti.com M: Kevin Hilman khilman@deeprootsystems.com diff --git a/include/linux/siphash.h b/include/linux/siphash.h new file mode 100644 index 0000000000000..feeb29cd113ed --- /dev/null +++ b/include/linux/siphash.h @@ -0,0 +1,85 @@ +/* Copyright (C) 2016 Jason A. Donenfeld Jason@zx2c4.com. All Rights Reserved. + * + * This file is provided under a dual BSD/GPLv2 license. + * + * SipHash: a fast short-input PRF + * https://131002.net/siphash/ + * + * This implementation is specifically for SipHash2-4. + */ + +#ifndef _LINUX_SIPHASH_H +#define _LINUX_SIPHASH_H + +#include <linux/types.h> +#include <linux/kernel.h> + +#define SIPHASH_ALIGNMENT __alignof__(u64) +typedef struct { + u64 key[2]; +} siphash_key_t; + +u64 __siphash_aligned(const void *data, size_t len, const siphash_key_t *key); +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS +u64 __siphash_unaligned(const void *data, size_t len, const siphash_key_t *key); +#endif + +u64 siphash_1u64(const u64 a, const siphash_key_t *key); +u64 siphash_2u64(const u64 a, const u64 b, const siphash_key_t *key); +u64 siphash_3u64(const u64 a, const u64 b, const u64 c, + const siphash_key_t *key); +u64 siphash_4u64(const u64 a, const u64 b, const u64 c, const u64 d, + const siphash_key_t *key); +u64 siphash_1u32(const u32 a, const siphash_key_t *key); +u64 siphash_3u32(const u32 a, const u32 b, const u32 c, + const siphash_key_t *key); + +static inline u64 siphash_2u32(const u32 a, const u32 b, + const siphash_key_t *key) +{ + return siphash_1u64((u64)b << 32 | a, key); +} +static inline u64 siphash_4u32(const u32 a, const u32 b, const u32 c, + const u32 d, const siphash_key_t *key) +{ + return siphash_2u64((u64)b << 32 | a, (u64)d << 32 | c, key); +} + + +static inline u64 ___siphash_aligned(const __le64 *data, size_t len, + const siphash_key_t *key) +{ + if (__builtin_constant_p(len) && len == 4) + return siphash_1u32(le32_to_cpup((const __le32 *)data), key); + if (__builtin_constant_p(len) && len == 8) + return siphash_1u64(le64_to_cpu(data[0]), key); + if (__builtin_constant_p(len) && len == 16) + return siphash_2u64(le64_to_cpu(data[0]), le64_to_cpu(data[1]), + key); + if (__builtin_constant_p(len) && len == 24) + return siphash_3u64(le64_to_cpu(data[0]), le64_to_cpu(data[1]), + le64_to_cpu(data[2]), key); + if (__builtin_constant_p(len) && len == 32) + return siphash_4u64(le64_to_cpu(data[0]), le64_to_cpu(data[1]), + le64_to_cpu(data[2]), le64_to_cpu(data[3]), + key); + return __siphash_aligned(data, len, key); +} + +/** + * siphash - compute 64-bit siphash PRF value + * @data: buffer to hash + * @size: size of @data + * @key: the siphash key + */ +static inline u64 siphash(const void *data, size_t len, + const siphash_key_t *key) +{ +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS + if (!IS_ALIGNED((unsigned long)data, SIPHASH_ALIGNMENT)) + return __siphash_unaligned(data, len, key); +#endif + return ___siphash_aligned(data, len, key); +} + +#endif /* _LINUX_SIPHASH_H */ diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index f0602beeba26d..fd1205a3dbdbc 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1706,6 +1706,16 @@ config TEST_RHASHTABLE
If unsure, say N.
+config TEST_HASH + tristate "Perform selftest on hash functions" + default n + help + Enable this option to test the kernel's siphash (<linux/siphash.h>) + hash functions on boot (or module load). + + This is intended to help people writing architecture-specific + optimized versions. If unsure, say N. + endmenu # runtime tests
config PROVIDE_OHCI1394_DMA_INIT diff --git a/lib/Makefile b/lib/Makefile index cb4f6aa95013e..6c6c1fb2fa04e 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -13,7 +13,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \ sha1.o md5.o irq_regs.o argv_split.o \ proportions.o flex_proportions.o ratelimit.o show_mem.o \ is_single_threaded.o plist.o decompress.o kobject_uevent.o \ - earlycpio.o seq_buf.o nmi_backtrace.o + earlycpio.o seq_buf.o siphash.o nmi_backtrace.o
obj-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o lib-$(CONFIG_MMU) += ioremap.o @@ -35,6 +35,7 @@ obj-$(CONFIG_TEST_HEXDUMP) += test-hexdump.o obj-y += kstrtox.o obj-$(CONFIG_TEST_BPF) += test_bpf.o obj-$(CONFIG_TEST_FIRMWARE) += test_firmware.o +obj-$(CONFIG_TEST_HASH) += test_siphash.o obj-$(CONFIG_TEST_KASAN) += test_kasan.o obj-$(CONFIG_TEST_KSTRTOX) += test-kstrtox.o obj-$(CONFIG_TEST_LKM) += test_module.o diff --git a/lib/siphash.c b/lib/siphash.c new file mode 100644 index 0000000000000..c43cf406e71b2 --- /dev/null +++ b/lib/siphash.c @@ -0,0 +1,232 @@ +/* Copyright (C) 2016 Jason A. Donenfeld Jason@zx2c4.com. All Rights Reserved. + * + * This file is provided under a dual BSD/GPLv2 license. + * + * SipHash: a fast short-input PRF + * https://131002.net/siphash/ + * + * This implementation is specifically for SipHash2-4. + */ + +#include <linux/siphash.h> +#include <asm/unaligned.h> + +#if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64 +#include <linux/dcache.h> +#include <asm/word-at-a-time.h> +#endif + +#define SIPROUND \ + do { \ + v0 += v1; v1 = rol64(v1, 13); v1 ^= v0; v0 = rol64(v0, 32); \ + v2 += v3; v3 = rol64(v3, 16); v3 ^= v2; \ + v0 += v3; v3 = rol64(v3, 21); v3 ^= v0; \ + v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32); \ + } while (0) + +#define PREAMBLE(len) \ + u64 v0 = 0x736f6d6570736575ULL; \ + u64 v1 = 0x646f72616e646f6dULL; \ + u64 v2 = 0x6c7967656e657261ULL; \ + u64 v3 = 0x7465646279746573ULL; \ + u64 b = ((u64)(len)) << 56; \ + v3 ^= key->key[1]; \ + v2 ^= key->key[0]; \ + v1 ^= key->key[1]; \ + v0 ^= key->key[0]; + +#define POSTAMBLE \ + v3 ^= b; \ + SIPROUND; \ + SIPROUND; \ + v0 ^= b; \ + v2 ^= 0xff; \ + SIPROUND; \ + SIPROUND; \ + SIPROUND; \ + SIPROUND; \ + return (v0 ^ v1) ^ (v2 ^ v3); + +u64 __siphash_aligned(const void *data, size_t len, const siphash_key_t *key) +{ + const u8 *end = data + len - (len % sizeof(u64)); + const u8 left = len & (sizeof(u64) - 1); + u64 m; + PREAMBLE(len) + for (; data != end; data += sizeof(u64)) { + m = le64_to_cpup(data); + v3 ^= m; + SIPROUND; + SIPROUND; + v0 ^= m; + } +#if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64 + if (left) + b |= le64_to_cpu((__force __le64)(load_unaligned_zeropad(data) & + bytemask_from_count(left))); +#else + switch (left) { + case 7: b |= ((u64)end[6]) << 48; + case 6: b |= ((u64)end[5]) << 40; + case 5: b |= ((u64)end[4]) << 32; + case 4: b |= le32_to_cpup(data); break; + case 3: b |= ((u64)end[2]) << 16; + case 2: b |= le16_to_cpup(data); break; + case 1: b |= end[0]; + } +#endif + POSTAMBLE +} +EXPORT_SYMBOL(__siphash_aligned); + +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS +u64 __siphash_unaligned(const void *data, size_t len, const siphash_key_t *key) +{ + const u8 *end = data + len - (len % sizeof(u64)); + const u8 left = len & (sizeof(u64) - 1); + u64 m; + PREAMBLE(len) + for (; data != end; data += sizeof(u64)) { + m = get_unaligned_le64(data); + v3 ^= m; + SIPROUND; + SIPROUND; + v0 ^= m; + } +#if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64 + if (left) + b |= le64_to_cpu((__force __le64)(load_unaligned_zeropad(data) & + bytemask_from_count(left))); +#else + switch (left) { + case 7: b |= ((u64)end[6]) << 48; + case 6: b |= ((u64)end[5]) << 40; + case 5: b |= ((u64)end[4]) << 32; + case 4: b |= get_unaligned_le32(end); break; + case 3: b |= ((u64)end[2]) << 16; + case 2: b |= get_unaligned_le16(end); break; + case 1: b |= end[0]; + } +#endif + POSTAMBLE +} +EXPORT_SYMBOL(__siphash_unaligned); +#endif + +/** + * siphash_1u64 - compute 64-bit siphash PRF value of a u64 + * @first: first u64 + * @key: the siphash key + */ +u64 siphash_1u64(const u64 first, const siphash_key_t *key) +{ + PREAMBLE(8) + v3 ^= first; + SIPROUND; + SIPROUND; + v0 ^= first; + POSTAMBLE +} +EXPORT_SYMBOL(siphash_1u64); + +/** + * siphash_2u64 - compute 64-bit siphash PRF value of 2 u64 + * @first: first u64 + * @second: second u64 + * @key: the siphash key + */ +u64 siphash_2u64(const u64 first, const u64 second, const siphash_key_t *key) +{ + PREAMBLE(16) + v3 ^= first; + SIPROUND; + SIPROUND; + v0 ^= first; + v3 ^= second; + SIPROUND; + SIPROUND; + v0 ^= second; + POSTAMBLE +} +EXPORT_SYMBOL(siphash_2u64); + +/** + * siphash_3u64 - compute 64-bit siphash PRF value of 3 u64 + * @first: first u64 + * @second: second u64 + * @third: third u64 + * @key: the siphash key + */ +u64 siphash_3u64(const u64 first, const u64 second, const u64 third, + const siphash_key_t *key) +{ + PREAMBLE(24) + v3 ^= first; + SIPROUND; + SIPROUND; + v0 ^= first; + v3 ^= second; + SIPROUND; + SIPROUND; + v0 ^= second; + v3 ^= third; + SIPROUND; + SIPROUND; + v0 ^= third; + POSTAMBLE +} +EXPORT_SYMBOL(siphash_3u64); + +/** + * siphash_4u64 - compute 64-bit siphash PRF value of 4 u64 + * @first: first u64 + * @second: second u64 + * @third: third u64 + * @forth: forth u64 + * @key: the siphash key + */ +u64 siphash_4u64(const u64 first, const u64 second, const u64 third, + const u64 forth, const siphash_key_t *key) +{ + PREAMBLE(32) + v3 ^= first; + SIPROUND; + SIPROUND; + v0 ^= first; + v3 ^= second; + SIPROUND; + SIPROUND; + v0 ^= second; + v3 ^= third; + SIPROUND; + SIPROUND; + v0 ^= third; + v3 ^= forth; + SIPROUND; + SIPROUND; + v0 ^= forth; + POSTAMBLE +} +EXPORT_SYMBOL(siphash_4u64); + +u64 siphash_1u32(const u32 first, const siphash_key_t *key) +{ + PREAMBLE(4) + b |= first; + POSTAMBLE +} +EXPORT_SYMBOL(siphash_1u32); + +u64 siphash_3u32(const u32 first, const u32 second, const u32 third, + const siphash_key_t *key) +{ + u64 combined = (u64)second << 32 | first; + PREAMBLE(12) + v3 ^= combined; + SIPROUND; + SIPROUND; + v0 ^= combined; + b |= third; + POSTAMBLE +} +EXPORT_SYMBOL(siphash_3u32); diff --git a/lib/test_siphash.c b/lib/test_siphash.c new file mode 100644 index 0000000000000..d972acfc15e4a --- /dev/null +++ b/lib/test_siphash.c @@ -0,0 +1,131 @@ +/* Test cases for siphash.c + * + * Copyright (C) 2016 Jason A. Donenfeld Jason@zx2c4.com. All Rights Reserved. + * + * This file is provided under a dual BSD/GPLv2 license. + * + * SipHash: a fast short-input PRF + * https://131002.net/siphash/ + * + * This implementation is specifically for SipHash2-4. + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/siphash.h> +#include <linux/kernel.h> +#include <linux/string.h> +#include <linux/errno.h> +#include <linux/module.h> + +/* Test vectors taken from official reference source available at: + * https://131002.net/siphash/siphash24.c + */ + +static const siphash_key_t test_key_siphash = + {{ 0x0706050403020100ULL, 0x0f0e0d0c0b0a0908ULL }}; + +static const u64 test_vectors_siphash[64] = { + 0x726fdb47dd0e0e31ULL, 0x74f839c593dc67fdULL, 0x0d6c8009d9a94f5aULL, + 0x85676696d7fb7e2dULL, 0xcf2794e0277187b7ULL, 0x18765564cd99a68dULL, + 0xcbc9466e58fee3ceULL, 0xab0200f58b01d137ULL, 0x93f5f5799a932462ULL, + 0x9e0082df0ba9e4b0ULL, 0x7a5dbbc594ddb9f3ULL, 0xf4b32f46226bada7ULL, + 0x751e8fbc860ee5fbULL, 0x14ea5627c0843d90ULL, 0xf723ca908e7af2eeULL, + 0xa129ca6149be45e5ULL, 0x3f2acc7f57c29bdbULL, 0x699ae9f52cbe4794ULL, + 0x4bc1b3f0968dd39cULL, 0xbb6dc91da77961bdULL, 0xbed65cf21aa2ee98ULL, + 0xd0f2cbb02e3b67c7ULL, 0x93536795e3a33e88ULL, 0xa80c038ccd5ccec8ULL, + 0xb8ad50c6f649af94ULL, 0xbce192de8a85b8eaULL, 0x17d835b85bbb15f3ULL, + 0x2f2e6163076bcfadULL, 0xde4daaaca71dc9a5ULL, 0xa6a2506687956571ULL, + 0xad87a3535c49ef28ULL, 0x32d892fad841c342ULL, 0x7127512f72f27cceULL, + 0xa7f32346f95978e3ULL, 0x12e0b01abb051238ULL, 0x15e034d40fa197aeULL, + 0x314dffbe0815a3b4ULL, 0x027990f029623981ULL, 0xcadcd4e59ef40c4dULL, + 0x9abfd8766a33735cULL, 0x0e3ea96b5304a7d0ULL, 0xad0c42d6fc585992ULL, + 0x187306c89bc215a9ULL, 0xd4a60abcf3792b95ULL, 0xf935451de4f21df2ULL, + 0xa9538f0419755787ULL, 0xdb9acddff56ca510ULL, 0xd06c98cd5c0975ebULL, + 0xe612a3cb9ecba951ULL, 0xc766e62cfcadaf96ULL, 0xee64435a9752fe72ULL, + 0xa192d576b245165aULL, 0x0a8787bf8ecb74b2ULL, 0x81b3e73d20b49b6fULL, + 0x7fa8220ba3b2eceaULL, 0x245731c13ca42499ULL, 0xb78dbfaf3a8d83bdULL, + 0xea1ad565322a1a0bULL, 0x60e61c23a3795013ULL, 0x6606d7e446282b93ULL, + 0x6ca4ecb15c5f91e1ULL, 0x9f626da15c9625f3ULL, 0xe51b38608ef25f57ULL, + 0x958a324ceb064572ULL +}; + +static int __init siphash_test_init(void) +{ + u8 in[64] __aligned(SIPHASH_ALIGNMENT); + u8 in_unaligned[65] __aligned(SIPHASH_ALIGNMENT); + u8 i; + int ret = 0; + + for (i = 0; i < 64; ++i) { + in[i] = i; + in_unaligned[i + 1] = i; + if (siphash(in, i, &test_key_siphash) != + test_vectors_siphash[i]) { + pr_info("siphash self-test aligned %u: FAIL\n", i + 1); + ret = -EINVAL; + } + if (siphash(in_unaligned + 1, i, &test_key_siphash) != + test_vectors_siphash[i]) { + pr_info("siphash self-test unaligned %u: FAIL\n", i + 1); + ret = -EINVAL; + } + } + if (siphash_1u64(0x0706050403020100ULL, &test_key_siphash) != + test_vectors_siphash[8]) { + pr_info("siphash self-test 1u64: FAIL\n"); + ret = -EINVAL; + } + if (siphash_2u64(0x0706050403020100ULL, 0x0f0e0d0c0b0a0908ULL, + &test_key_siphash) != test_vectors_siphash[16]) { + pr_info("siphash self-test 2u64: FAIL\n"); + ret = -EINVAL; + } + if (siphash_3u64(0x0706050403020100ULL, 0x0f0e0d0c0b0a0908ULL, + 0x1716151413121110ULL, &test_key_siphash) != + test_vectors_siphash[24]) { + pr_info("siphash self-test 3u64: FAIL\n"); + ret = -EINVAL; + } + if (siphash_4u64(0x0706050403020100ULL, 0x0f0e0d0c0b0a0908ULL, + 0x1716151413121110ULL, 0x1f1e1d1c1b1a1918ULL, + &test_key_siphash) != test_vectors_siphash[32]) { + pr_info("siphash self-test 4u64: FAIL\n"); + ret = -EINVAL; + } + if (siphash_1u32(0x03020100U, &test_key_siphash) != + test_vectors_siphash[4]) { + pr_info("siphash self-test 1u32: FAIL\n"); + ret = -EINVAL; + } + if (siphash_2u32(0x03020100U, 0x07060504U, &test_key_siphash) != + test_vectors_siphash[8]) { + pr_info("siphash self-test 2u32: FAIL\n"); + ret = -EINVAL; + } + if (siphash_3u32(0x03020100U, 0x07060504U, + 0x0b0a0908U, &test_key_siphash) != + test_vectors_siphash[12]) { + pr_info("siphash self-test 3u32: FAIL\n"); + ret = -EINVAL; + } + if (siphash_4u32(0x03020100U, 0x07060504U, + 0x0b0a0908U, 0x0f0e0d0cU, &test_key_siphash) != + test_vectors_siphash[16]) { + pr_info("siphash self-test 4u32: FAIL\n"); + ret = -EINVAL; + } + if (!ret) + pr_info("self-tests: pass\n"); + return ret; +} + +static void __exit siphash_test_exit(void) +{ +} + +module_init(siphash_test_init); +module_exit(siphash_test_exit); + +MODULE_AUTHOR("Jason A. Donenfeld Jason@zx2c4.com"); +MODULE_LICENSE("Dual BSD/GPL");
commit 1ae2324f732c9c4e2fa4ebd885fa1001b70d52e1 upstream.
HalfSipHash, or hsiphash, is a shortened version of SipHash, which generates 32-bit outputs using a weaker 64-bit key. It has *much* lower security margins, and shouldn't be used for anything too sensitive, but it could be used as a hashtable key function replacement, if the output is never exposed, and if the security requirement is not too high.
The goal is to make this something that performance-critical jhash users would be willing to use.
On 64-bit machines, HalfSipHash1-3 is slower than SipHash1-3, so we alias SipHash1-3 to HalfSipHash1-3 on those systems.
64-bit x86_64: [ 0.509409] test_siphash: SipHash2-4 cycles: 4049181 [ 0.510650] test_siphash: SipHash1-3 cycles: 2512884 [ 0.512205] test_siphash: HalfSipHash1-3 cycles: 3429920 [ 0.512904] test_siphash: JenkinsHash cycles: 978267 So, we map hsiphash() -> SipHash1-3
32-bit x86: [ 0.509868] test_siphash: SipHash2-4 cycles: 14812892 [ 0.513601] test_siphash: SipHash1-3 cycles: 9510710 [ 0.515263] test_siphash: HalfSipHash1-3 cycles: 3856157 [ 0.515952] test_siphash: JenkinsHash cycles: 1148567 So, we map hsiphash() -> HalfSipHash1-3
hsiphash() is roughly 3 times slower than jhash(), but comes with a considerable security improvement.
Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Reviewed-by: Jean-Philippe Aumasson jeanphilippe.aumasson@gmail.com Signed-off-by: David S. Miller davem@davemloft.net [bwh: Backported to 4.4 to avoid regression for WireGuard with only half the siphash API present] Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/siphash.txt | 75 +++++++++ include/linux/siphash.h | 57 ++++++- lib/siphash.c | 321 +++++++++++++++++++++++++++++++++++++- lib/test_siphash.c | 98 +++++++++++- 4 files changed, 546 insertions(+), 5 deletions(-)
diff --git a/Documentation/siphash.txt b/Documentation/siphash.txt index e8e6ddbbaab47..908d348ff7776 100644 --- a/Documentation/siphash.txt +++ b/Documentation/siphash.txt @@ -98,3 +98,78 @@ u64 h = siphash(&combined, offsetofend(typeof(combined), dport), &secret);
Read the SipHash paper if you're interested in learning more: https://131002.net/siphash/siphash.pdf + + +~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~ + +HalfSipHash - SipHash's insecure younger cousin +----------------------------------------------- +Written by Jason A. Donenfeld jason@zx2c4.com + +On the off-chance that SipHash is not fast enough for your needs, you might be +able to justify using HalfSipHash, a terrifying but potentially useful +possibility. HalfSipHash cuts SipHash's rounds down from "2-4" to "1-3" and, +even scarier, uses an easily brute-forcable 64-bit key (with a 32-bit output) +instead of SipHash's 128-bit key. However, this may appeal to some +high-performance `jhash` users. + +Danger! + +Do not ever use HalfSipHash except for as a hashtable key function, and only +then when you can be absolutely certain that the outputs will never be +transmitted out of the kernel. This is only remotely useful over `jhash` as a +means of mitigating hashtable flooding denial of service attacks. + +1. Generating a key + +Keys should always be generated from a cryptographically secure source of +random numbers, either using get_random_bytes or get_random_once: + +hsiphash_key_t key; +get_random_bytes(&key, sizeof(key)); + +If you're not deriving your key from here, you're doing it wrong. + +2. Using the functions + +There are two variants of the function, one that takes a list of integers, and +one that takes a buffer: + +u32 hsiphash(const void *data, size_t len, const hsiphash_key_t *key); + +And: + +u32 hsiphash_1u32(u32, const hsiphash_key_t *key); +u32 hsiphash_2u32(u32, u32, const hsiphash_key_t *key); +u32 hsiphash_3u32(u32, u32, u32, const hsiphash_key_t *key); +u32 hsiphash_4u32(u32, u32, u32, u32, const hsiphash_key_t *key); + +If you pass the generic hsiphash function something of a constant length, it +will constant fold at compile-time and automatically choose one of the +optimized functions. + +3. Hashtable key function usage: + +struct some_hashtable { + DECLARE_HASHTABLE(hashtable, 8); + hsiphash_key_t key; +}; + +void init_hashtable(struct some_hashtable *table) +{ + get_random_bytes(&table->key, sizeof(table->key)); +} + +static inline hlist_head *some_hashtable_bucket(struct some_hashtable *table, struct interesting_input *input) +{ + return &table->hashtable[hsiphash(input, sizeof(*input), &table->key) & (HASH_SIZE(table->hashtable) - 1)]; +} + +You may then iterate like usual over the returned hash bucket. + +4. Performance + +HalfSipHash is roughly 3 times slower than JenkinsHash. For many replacements, +this will not be a problem, as the hashtable lookup isn't the bottleneck. And +in general, this is probably a good sacrifice to make for the security and DoS +resistance of HalfSipHash. diff --git a/include/linux/siphash.h b/include/linux/siphash.h index feeb29cd113ed..fa7a6b9cedbff 100644 --- a/include/linux/siphash.h +++ b/include/linux/siphash.h @@ -5,7 +5,9 @@ * SipHash: a fast short-input PRF * https://131002.net/siphash/ * - * This implementation is specifically for SipHash2-4. + * This implementation is specifically for SipHash2-4 for a secure PRF + * and HalfSipHash1-3/SipHash1-3 for an insecure PRF only suitable for + * hashtables. */
#ifndef _LINUX_SIPHASH_H @@ -82,4 +84,57 @@ static inline u64 siphash(const void *data, size_t len, return ___siphash_aligned(data, len, key); }
+#define HSIPHASH_ALIGNMENT __alignof__(unsigned long) +typedef struct { + unsigned long key[2]; +} hsiphash_key_t; + +u32 __hsiphash_aligned(const void *data, size_t len, + const hsiphash_key_t *key); +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS +u32 __hsiphash_unaligned(const void *data, size_t len, + const hsiphash_key_t *key); +#endif + +u32 hsiphash_1u32(const u32 a, const hsiphash_key_t *key); +u32 hsiphash_2u32(const u32 a, const u32 b, const hsiphash_key_t *key); +u32 hsiphash_3u32(const u32 a, const u32 b, const u32 c, + const hsiphash_key_t *key); +u32 hsiphash_4u32(const u32 a, const u32 b, const u32 c, const u32 d, + const hsiphash_key_t *key); + +static inline u32 ___hsiphash_aligned(const __le32 *data, size_t len, + const hsiphash_key_t *key) +{ + if (__builtin_constant_p(len) && len == 4) + return hsiphash_1u32(le32_to_cpu(data[0]), key); + if (__builtin_constant_p(len) && len == 8) + return hsiphash_2u32(le32_to_cpu(data[0]), le32_to_cpu(data[1]), + key); + if (__builtin_constant_p(len) && len == 12) + return hsiphash_3u32(le32_to_cpu(data[0]), le32_to_cpu(data[1]), + le32_to_cpu(data[2]), key); + if (__builtin_constant_p(len) && len == 16) + return hsiphash_4u32(le32_to_cpu(data[0]), le32_to_cpu(data[1]), + le32_to_cpu(data[2]), le32_to_cpu(data[3]), + key); + return __hsiphash_aligned(data, len, key); +} + +/** + * hsiphash - compute 32-bit hsiphash PRF value + * @data: buffer to hash + * @size: size of @data + * @key: the hsiphash key + */ +static inline u32 hsiphash(const void *data, size_t len, + const hsiphash_key_t *key) +{ +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS + if (!IS_ALIGNED((unsigned long)data, HSIPHASH_ALIGNMENT)) + return __hsiphash_unaligned(data, len, key); +#endif + return ___hsiphash_aligned(data, len, key); +} + #endif /* _LINUX_SIPHASH_H */ diff --git a/lib/siphash.c b/lib/siphash.c index c43cf406e71b2..3ae58b4edad61 100644 --- a/lib/siphash.c +++ b/lib/siphash.c @@ -5,7 +5,9 @@ * SipHash: a fast short-input PRF * https://131002.net/siphash/ * - * This implementation is specifically for SipHash2-4. + * This implementation is specifically for SipHash2-4 for a secure PRF + * and HalfSipHash1-3/SipHash1-3 for an insecure PRF only suitable for + * hashtables. */
#include <linux/siphash.h> @@ -230,3 +232,320 @@ u64 siphash_3u32(const u32 first, const u32 second, const u32 third, POSTAMBLE } EXPORT_SYMBOL(siphash_3u32); + +#if BITS_PER_LONG == 64 +/* Note that on 64-bit, we make HalfSipHash1-3 actually be SipHash1-3, for + * performance reasons. On 32-bit, below, we actually implement HalfSipHash1-3. + */ + +#define HSIPROUND SIPROUND +#define HPREAMBLE(len) PREAMBLE(len) +#define HPOSTAMBLE \ + v3 ^= b; \ + HSIPROUND; \ + v0 ^= b; \ + v2 ^= 0xff; \ + HSIPROUND; \ + HSIPROUND; \ + HSIPROUND; \ + return (v0 ^ v1) ^ (v2 ^ v3); + +u32 __hsiphash_aligned(const void *data, size_t len, const hsiphash_key_t *key) +{ + const u8 *end = data + len - (len % sizeof(u64)); + const u8 left = len & (sizeof(u64) - 1); + u64 m; + HPREAMBLE(len) + for (; data != end; data += sizeof(u64)) { + m = le64_to_cpup(data); + v3 ^= m; + HSIPROUND; + v0 ^= m; + } +#if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64 + if (left) + b |= le64_to_cpu((__force __le64)(load_unaligned_zeropad(data) & + bytemask_from_count(left))); +#else + switch (left) { + case 7: b |= ((u64)end[6]) << 48; + case 6: b |= ((u64)end[5]) << 40; + case 5: b |= ((u64)end[4]) << 32; + case 4: b |= le32_to_cpup(data); break; + case 3: b |= ((u64)end[2]) << 16; + case 2: b |= le16_to_cpup(data); break; + case 1: b |= end[0]; + } +#endif + HPOSTAMBLE +} +EXPORT_SYMBOL(__hsiphash_aligned); + +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS +u32 __hsiphash_unaligned(const void *data, size_t len, + const hsiphash_key_t *key) +{ + const u8 *end = data + len - (len % sizeof(u64)); + const u8 left = len & (sizeof(u64) - 1); + u64 m; + HPREAMBLE(len) + for (; data != end; data += sizeof(u64)) { + m = get_unaligned_le64(data); + v3 ^= m; + HSIPROUND; + v0 ^= m; + } +#if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64 + if (left) + b |= le64_to_cpu((__force __le64)(load_unaligned_zeropad(data) & + bytemask_from_count(left))); +#else + switch (left) { + case 7: b |= ((u64)end[6]) << 48; + case 6: b |= ((u64)end[5]) << 40; + case 5: b |= ((u64)end[4]) << 32; + case 4: b |= get_unaligned_le32(end); break; + case 3: b |= ((u64)end[2]) << 16; + case 2: b |= get_unaligned_le16(end); break; + case 1: b |= end[0]; + } +#endif + HPOSTAMBLE +} +EXPORT_SYMBOL(__hsiphash_unaligned); +#endif + +/** + * hsiphash_1u32 - compute 64-bit hsiphash PRF value of a u32 + * @first: first u32 + * @key: the hsiphash key + */ +u32 hsiphash_1u32(const u32 first, const hsiphash_key_t *key) +{ + HPREAMBLE(4) + b |= first; + HPOSTAMBLE +} +EXPORT_SYMBOL(hsiphash_1u32); + +/** + * hsiphash_2u32 - compute 32-bit hsiphash PRF value of 2 u32 + * @first: first u32 + * @second: second u32 + * @key: the hsiphash key + */ +u32 hsiphash_2u32(const u32 first, const u32 second, const hsiphash_key_t *key) +{ + u64 combined = (u64)second << 32 | first; + HPREAMBLE(8) + v3 ^= combined; + HSIPROUND; + v0 ^= combined; + HPOSTAMBLE +} +EXPORT_SYMBOL(hsiphash_2u32); + +/** + * hsiphash_3u32 - compute 32-bit hsiphash PRF value of 3 u32 + * @first: first u32 + * @second: second u32 + * @third: third u32 + * @key: the hsiphash key + */ +u32 hsiphash_3u32(const u32 first, const u32 second, const u32 third, + const hsiphash_key_t *key) +{ + u64 combined = (u64)second << 32 | first; + HPREAMBLE(12) + v3 ^= combined; + HSIPROUND; + v0 ^= combined; + b |= third; + HPOSTAMBLE +} +EXPORT_SYMBOL(hsiphash_3u32); + +/** + * hsiphash_4u32 - compute 32-bit hsiphash PRF value of 4 u32 + * @first: first u32 + * @second: second u32 + * @third: third u32 + * @forth: forth u32 + * @key: the hsiphash key + */ +u32 hsiphash_4u32(const u32 first, const u32 second, const u32 third, + const u32 forth, const hsiphash_key_t *key) +{ + u64 combined = (u64)second << 32 | first; + HPREAMBLE(16) + v3 ^= combined; + HSIPROUND; + v0 ^= combined; + combined = (u64)forth << 32 | third; + v3 ^= combined; + HSIPROUND; + v0 ^= combined; + HPOSTAMBLE +} +EXPORT_SYMBOL(hsiphash_4u32); +#else +#define HSIPROUND \ + do { \ + v0 += v1; v1 = rol32(v1, 5); v1 ^= v0; v0 = rol32(v0, 16); \ + v2 += v3; v3 = rol32(v3, 8); v3 ^= v2; \ + v0 += v3; v3 = rol32(v3, 7); v3 ^= v0; \ + v2 += v1; v1 = rol32(v1, 13); v1 ^= v2; v2 = rol32(v2, 16); \ + } while (0) + +#define HPREAMBLE(len) \ + u32 v0 = 0; \ + u32 v1 = 0; \ + u32 v2 = 0x6c796765U; \ + u32 v3 = 0x74656462U; \ + u32 b = ((u32)(len)) << 24; \ + v3 ^= key->key[1]; \ + v2 ^= key->key[0]; \ + v1 ^= key->key[1]; \ + v0 ^= key->key[0]; + +#define HPOSTAMBLE \ + v3 ^= b; \ + HSIPROUND; \ + v0 ^= b; \ + v2 ^= 0xff; \ + HSIPROUND; \ + HSIPROUND; \ + HSIPROUND; \ + return v1 ^ v3; + +u32 __hsiphash_aligned(const void *data, size_t len, const hsiphash_key_t *key) +{ + const u8 *end = data + len - (len % sizeof(u32)); + const u8 left = len & (sizeof(u32) - 1); + u32 m; + HPREAMBLE(len) + for (; data != end; data += sizeof(u32)) { + m = le32_to_cpup(data); + v3 ^= m; + HSIPROUND; + v0 ^= m; + } + switch (left) { + case 3: b |= ((u32)end[2]) << 16; + case 2: b |= le16_to_cpup(data); break; + case 1: b |= end[0]; + } + HPOSTAMBLE +} +EXPORT_SYMBOL(__hsiphash_aligned); + +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS +u32 __hsiphash_unaligned(const void *data, size_t len, + const hsiphash_key_t *key) +{ + const u8 *end = data + len - (len % sizeof(u32)); + const u8 left = len & (sizeof(u32) - 1); + u32 m; + HPREAMBLE(len) + for (; data != end; data += sizeof(u32)) { + m = get_unaligned_le32(data); + v3 ^= m; + HSIPROUND; + v0 ^= m; + } + switch (left) { + case 3: b |= ((u32)end[2]) << 16; + case 2: b |= get_unaligned_le16(end); break; + case 1: b |= end[0]; + } + HPOSTAMBLE +} +EXPORT_SYMBOL(__hsiphash_unaligned); +#endif + +/** + * hsiphash_1u32 - compute 32-bit hsiphash PRF value of a u32 + * @first: first u32 + * @key: the hsiphash key + */ +u32 hsiphash_1u32(const u32 first, const hsiphash_key_t *key) +{ + HPREAMBLE(4) + v3 ^= first; + HSIPROUND; + v0 ^= first; + HPOSTAMBLE +} +EXPORT_SYMBOL(hsiphash_1u32); + +/** + * hsiphash_2u32 - compute 32-bit hsiphash PRF value of 2 u32 + * @first: first u32 + * @second: second u32 + * @key: the hsiphash key + */ +u32 hsiphash_2u32(const u32 first, const u32 second, const hsiphash_key_t *key) +{ + HPREAMBLE(8) + v3 ^= first; + HSIPROUND; + v0 ^= first; + v3 ^= second; + HSIPROUND; + v0 ^= second; + HPOSTAMBLE +} +EXPORT_SYMBOL(hsiphash_2u32); + +/** + * hsiphash_3u32 - compute 32-bit hsiphash PRF value of 3 u32 + * @first: first u32 + * @second: second u32 + * @third: third u32 + * @key: the hsiphash key + */ +u32 hsiphash_3u32(const u32 first, const u32 second, const u32 third, + const hsiphash_key_t *key) +{ + HPREAMBLE(12) + v3 ^= first; + HSIPROUND; + v0 ^= first; + v3 ^= second; + HSIPROUND; + v0 ^= second; + v3 ^= third; + HSIPROUND; + v0 ^= third; + HPOSTAMBLE +} +EXPORT_SYMBOL(hsiphash_3u32); + +/** + * hsiphash_4u32 - compute 32-bit hsiphash PRF value of 4 u32 + * @first: first u32 + * @second: second u32 + * @third: third u32 + * @forth: forth u32 + * @key: the hsiphash key + */ +u32 hsiphash_4u32(const u32 first, const u32 second, const u32 third, + const u32 forth, const hsiphash_key_t *key) +{ + HPREAMBLE(16) + v3 ^= first; + HSIPROUND; + v0 ^= first; + v3 ^= second; + HSIPROUND; + v0 ^= second; + v3 ^= third; + HSIPROUND; + v0 ^= third; + v3 ^= forth; + HSIPROUND; + v0 ^= forth; + HPOSTAMBLE +} +EXPORT_SYMBOL(hsiphash_4u32); +#endif diff --git a/lib/test_siphash.c b/lib/test_siphash.c index d972acfc15e4a..a6d854d933bff 100644 --- a/lib/test_siphash.c +++ b/lib/test_siphash.c @@ -7,7 +7,9 @@ * SipHash: a fast short-input PRF * https://131002.net/siphash/ * - * This implementation is specifically for SipHash2-4. + * This implementation is specifically for SipHash2-4 for a secure PRF + * and HalfSipHash1-3/SipHash1-3 for an insecure PRF only suitable for + * hashtables. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt @@ -18,8 +20,8 @@ #include <linux/errno.h> #include <linux/module.h>
-/* Test vectors taken from official reference source available at: - * https://131002.net/siphash/siphash24.c +/* Test vectors taken from reference source available at: + * https://github.com/veorq/SipHash */
static const siphash_key_t test_key_siphash = @@ -50,6 +52,64 @@ static const u64 test_vectors_siphash[64] = { 0x958a324ceb064572ULL };
+#if BITS_PER_LONG == 64 +static const hsiphash_key_t test_key_hsiphash = + {{ 0x0706050403020100ULL, 0x0f0e0d0c0b0a0908ULL }}; + +static const u32 test_vectors_hsiphash[64] = { + 0x050fc4dcU, 0x7d57ca93U, 0x4dc7d44dU, + 0xe7ddf7fbU, 0x88d38328U, 0x49533b67U, + 0xc59f22a7U, 0x9bb11140U, 0x8d299a8eU, + 0x6c063de4U, 0x92ff097fU, 0xf94dc352U, + 0x57b4d9a2U, 0x1229ffa7U, 0xc0f95d34U, + 0x2a519956U, 0x7d908b66U, 0x63dbd80cU, + 0xb473e63eU, 0x8d297d1cU, 0xa6cce040U, + 0x2b45f844U, 0xa320872eU, 0xdae6c123U, + 0x67349c8cU, 0x705b0979U, 0xca9913a5U, + 0x4ade3b35U, 0xef6cd00dU, 0x4ab1e1f4U, + 0x43c5e663U, 0x8c21d1bcU, 0x16a7b60dU, + 0x7a8ff9bfU, 0x1f2a753eU, 0xbf186b91U, + 0xada26206U, 0xa3c33057U, 0xae3a36a1U, + 0x7b108392U, 0x99e41531U, 0x3f1ad944U, + 0xc8138825U, 0xc28949a6U, 0xfaf8876bU, + 0x9f042196U, 0x68b1d623U, 0x8b5114fdU, + 0xdf074c46U, 0x12cc86b3U, 0x0a52098fU, + 0x9d292f9aU, 0xa2f41f12U, 0x43a71ed0U, + 0x73f0bce6U, 0x70a7e980U, 0x243c6d75U, + 0xfdb71513U, 0xa67d8a08U, 0xb7e8f148U, + 0xf7a644eeU, 0x0f1837f2U, 0x4b6694e0U, + 0xb7bbb3a8U +}; +#else +static const hsiphash_key_t test_key_hsiphash = + {{ 0x03020100U, 0x07060504U }}; + +static const u32 test_vectors_hsiphash[64] = { + 0x5814c896U, 0xe7e864caU, 0xbc4b0e30U, + 0x01539939U, 0x7e059ea6U, 0x88e3d89bU, + 0xa0080b65U, 0x9d38d9d6U, 0x577999b1U, + 0xc839caedU, 0xe4fa32cfU, 0x959246eeU, + 0x6b28096cU, 0x66dd9cd6U, 0x16658a7cU, + 0xd0257b04U, 0x8b31d501U, 0x2b1cd04bU, + 0x06712339U, 0x522aca67U, 0x911bb605U, + 0x90a65f0eU, 0xf826ef7bU, 0x62512debU, + 0x57150ad7U, 0x5d473507U, 0x1ec47442U, + 0xab64afd3U, 0x0a4100d0U, 0x6d2ce652U, + 0x2331b6a3U, 0x08d8791aU, 0xbc6dda8dU, + 0xe0f6c934U, 0xb0652033U, 0x9b9851ccU, + 0x7c46fb7fU, 0x732ba8cbU, 0xf142997aU, + 0xfcc9aa1bU, 0x05327eb2U, 0xe110131cU, + 0xf9e5e7c0U, 0xa7d708a6U, 0x11795ab1U, + 0x65671619U, 0x9f5fff91U, 0xd89c5267U, + 0x007783ebU, 0x95766243U, 0xab639262U, + 0x9c7e1390U, 0xc368dda6U, 0x38ddc455U, + 0xfa13d379U, 0x979ea4e8U, 0x53ecd77eU, + 0x2ee80657U, 0x33dbb66aU, 0xae3f0577U, + 0x88b4c4ccU, 0x3e7f480bU, 0x74c1ebf8U, + 0x87178304U +}; +#endif + static int __init siphash_test_init(void) { u8 in[64] __aligned(SIPHASH_ALIGNMENT); @@ -70,6 +130,16 @@ static int __init siphash_test_init(void) pr_info("siphash self-test unaligned %u: FAIL\n", i + 1); ret = -EINVAL; } + if (hsiphash(in, i, &test_key_hsiphash) != + test_vectors_hsiphash[i]) { + pr_info("hsiphash self-test aligned %u: FAIL\n", i + 1); + ret = -EINVAL; + } + if (hsiphash(in_unaligned + 1, i, &test_key_hsiphash) != + test_vectors_hsiphash[i]) { + pr_info("hsiphash self-test unaligned %u: FAIL\n", i + 1); + ret = -EINVAL; + } } if (siphash_1u64(0x0706050403020100ULL, &test_key_siphash) != test_vectors_siphash[8]) { @@ -115,6 +185,28 @@ static int __init siphash_test_init(void) pr_info("siphash self-test 4u32: FAIL\n"); ret = -EINVAL; } + if (hsiphash_1u32(0x03020100U, &test_key_hsiphash) != + test_vectors_hsiphash[4]) { + pr_info("hsiphash self-test 1u32: FAIL\n"); + ret = -EINVAL; + } + if (hsiphash_2u32(0x03020100U, 0x07060504U, &test_key_hsiphash) != + test_vectors_hsiphash[8]) { + pr_info("hsiphash self-test 2u32: FAIL\n"); + ret = -EINVAL; + } + if (hsiphash_3u32(0x03020100U, 0x07060504U, + 0x0b0a0908U, &test_key_hsiphash) != + test_vectors_hsiphash[12]) { + pr_info("hsiphash self-test 3u32: FAIL\n"); + ret = -EINVAL; + } + if (hsiphash_4u32(0x03020100U, 0x07060504U, + 0x0b0a0908U, 0x0f0e0d0cU, &test_key_hsiphash) != + test_vectors_hsiphash[16]) { + pr_info("hsiphash self-test 4u32: FAIL\n"); + ret = -EINVAL; + } if (!ret) pr_info("self-tests: pass\n"); return ret;
commit df453700e8d81b1bdafdf684365ee2b9431fb702 upstream.
According to Amit Klein and Benny Pinkas, IP ID generation is too weak and might be used by attackers.
Even with recent net_hash_mix() fix (netns: provide pure entropy for net_hash_mix()) having 64bit key and Jenkins hash is risky.
It is time to switch to siphash and its 128bit keys.
Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: Amit Klein aksecurity@gmail.com Reported-by: Benny Pinkas benny@pinkas.net Signed-off-by: David S. Miller davem@davemloft.net [bwh: Backported to 4.4: adjust context] Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/siphash.h | 5 +++++ include/net/netns/ipv4.h | 2 ++ net/ipv4/route.c | 12 +++++++----- net/ipv6/output_core.c | 30 ++++++++++++++++-------------- 4 files changed, 30 insertions(+), 19 deletions(-)
diff --git a/include/linux/siphash.h b/include/linux/siphash.h index fa7a6b9cedbff..bf21591a9e5e6 100644 --- a/include/linux/siphash.h +++ b/include/linux/siphash.h @@ -21,6 +21,11 @@ typedef struct { u64 key[2]; } siphash_key_t;
+static inline bool siphash_key_is_zero(const siphash_key_t *key) +{ + return !(key->key[0] | key->key[1]); +} + u64 __siphash_aligned(const void *data, size_t len, const siphash_key_t *key); #ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS u64 __siphash_unaligned(const void *data, size_t len, const siphash_key_t *key); diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index 61c38f87ea079..e6f49f22e0066 100644 --- a/include/net/netns/ipv4.h +++ b/include/net/netns/ipv4.h @@ -8,6 +8,7 @@ #include <linux/uidgid.h> #include <net/inet_frag.h> #include <linux/rcupdate.h> +#include <linux/siphash.h>
struct tcpm_hash_bucket; struct ctl_table_header; @@ -109,5 +110,6 @@ struct netns_ipv4 { #endif #endif atomic_t rt_genid; + siphash_key_t ip_id_key; }; #endif diff --git a/net/ipv4/route.c b/net/ipv4/route.c index a58effba760ae..3c605a788ba1e 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -490,15 +490,17 @@ EXPORT_SYMBOL(ip_idents_reserve);
void __ip_select_ident(struct net *net, struct iphdr *iph, int segs) { - static u32 ip_idents_hashrnd __read_mostly; u32 hash, id;
- net_get_random_once(&ip_idents_hashrnd, sizeof(ip_idents_hashrnd)); + /* Note the following code is not safe, but this is okay. */ + if (unlikely(siphash_key_is_zero(&net->ipv4.ip_id_key))) + get_random_bytes(&net->ipv4.ip_id_key, + sizeof(net->ipv4.ip_id_key));
- hash = jhash_3words((__force u32)iph->daddr, + hash = siphash_3u32((__force u32)iph->daddr, (__force u32)iph->saddr, - iph->protocol ^ net_hash_mix(net), - ip_idents_hashrnd); + iph->protocol, + &net->ipv4.ip_id_key); id = ip_idents_reserve(hash, segs); iph->id = htons(id); } diff --git a/net/ipv6/output_core.c b/net/ipv6/output_core.c index f99a04674419b..6b896cc9604e5 100644 --- a/net/ipv6/output_core.c +++ b/net/ipv6/output_core.c @@ -10,15 +10,25 @@ #include <net/secure_seq.h> #include <linux/netfilter.h>
-static u32 __ipv6_select_ident(struct net *net, u32 hashrnd, +static u32 __ipv6_select_ident(struct net *net, const struct in6_addr *dst, const struct in6_addr *src) { + const struct { + struct in6_addr dst; + struct in6_addr src; + } __aligned(SIPHASH_ALIGNMENT) combined = { + .dst = *dst, + .src = *src, + }; u32 hash, id;
- hash = __ipv6_addr_jhash(dst, hashrnd); - hash = __ipv6_addr_jhash(src, hash); - hash ^= net_hash_mix(net); + /* Note the following code is not safe, but this is okay. */ + if (unlikely(siphash_key_is_zero(&net->ipv4.ip_id_key))) + get_random_bytes(&net->ipv4.ip_id_key, + sizeof(net->ipv4.ip_id_key)); + + hash = siphash(&combined, sizeof(combined), &net->ipv4.ip_id_key);
/* Treat id of 0 as unset and if we get 0 back from ip_idents_reserve, * set the hight order instead thus minimizing possible future @@ -41,7 +51,6 @@ static u32 __ipv6_select_ident(struct net *net, u32 hashrnd, */ void ipv6_proxy_select_ident(struct net *net, struct sk_buff *skb) { - static u32 ip6_proxy_idents_hashrnd __read_mostly; struct in6_addr buf[2]; struct in6_addr *addrs; u32 id; @@ -53,11 +62,7 @@ void ipv6_proxy_select_ident(struct net *net, struct sk_buff *skb) if (!addrs) return;
- net_get_random_once(&ip6_proxy_idents_hashrnd, - sizeof(ip6_proxy_idents_hashrnd)); - - id = __ipv6_select_ident(net, ip6_proxy_idents_hashrnd, - &addrs[1], &addrs[0]); + id = __ipv6_select_ident(net, &addrs[1], &addrs[0]); skb_shinfo(skb)->ip6_frag_id = htonl(id); } EXPORT_SYMBOL_GPL(ipv6_proxy_select_ident); @@ -66,12 +71,9 @@ __be32 ipv6_select_ident(struct net *net, const struct in6_addr *daddr, const struct in6_addr *saddr) { - static u32 ip6_idents_hashrnd __read_mostly; u32 id;
- net_get_random_once(&ip6_idents_hashrnd, sizeof(ip6_idents_hashrnd)); - - id = __ipv6_select_ident(net, ip6_idents_hashrnd, daddr, saddr); + id = __ipv6_select_ident(net, daddr, saddr); return htonl(id); } EXPORT_SYMBOL(ipv6_select_ident);
commit 3c79107631db1f7fd32cf3f7368e4672004a3010 upstream.
else, we leak the addresses to userspace via ctnetlink events and dumps.
Compute an ID on demand based on the immutable parts of nf_conn struct.
Another advantage compared to using an address is that there is no immediate re-use of the same ID in case the conntrack entry is freed and reallocated again immediately.
Fixes: 3583240249ef ("[NETFILTER]: nf_conntrack_expect: kill unique ID") Fixes: 7f85f914721f ("[NETFILTER]: nf_conntrack: kill unique ID") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org [bwh: Backported to 4.4: adjust context] Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/netfilter/nf_conntrack.h | 2 ++ net/netfilter/nf_conntrack_core.c | 35 ++++++++++++++++++++++++++++ net/netfilter/nf_conntrack_netlink.c | 34 +++++++++++++++++++++++---- 3 files changed, 66 insertions(+), 5 deletions(-)
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h index fde4068eec0b2..636e9e11bd5f6 100644 --- a/include/net/netfilter/nf_conntrack.h +++ b/include/net/netfilter/nf_conntrack.h @@ -297,6 +297,8 @@ struct nf_conn *nf_ct_tmpl_alloc(struct net *net, gfp_t flags); void nf_ct_tmpl_free(struct nf_conn *tmpl);
+u32 nf_ct_get_id(const struct nf_conn *ct); + #define NF_CT_STAT_INC(net, count) __this_cpu_inc((net)->ct.stat->count) #define NF_CT_STAT_INC_ATOMIC(net, count) this_cpu_inc((net)->ct.stat->count)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 5f747089024fa..fd301fb137194 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -23,6 +23,7 @@ #include <linux/slab.h> #include <linux/random.h> #include <linux/jhash.h> +#include <linux/siphash.h> #include <linux/err.h> #include <linux/percpu.h> #include <linux/moduleparam.h> @@ -234,6 +235,40 @@ nf_ct_invert_tuple(struct nf_conntrack_tuple *inverse, } EXPORT_SYMBOL_GPL(nf_ct_invert_tuple);
+/* Generate a almost-unique pseudo-id for a given conntrack. + * + * intentionally doesn't re-use any of the seeds used for hash + * table location, we assume id gets exposed to userspace. + * + * Following nf_conn items do not change throughout lifetime + * of the nf_conn after it has been committed to main hash table: + * + * 1. nf_conn address + * 2. nf_conn->ext address + * 3. nf_conn->master address (normally NULL) + * 4. tuple + * 5. the associated net namespace + */ +u32 nf_ct_get_id(const struct nf_conn *ct) +{ + static __read_mostly siphash_key_t ct_id_seed; + unsigned long a, b, c, d; + + net_get_random_once(&ct_id_seed, sizeof(ct_id_seed)); + + a = (unsigned long)ct; + b = (unsigned long)ct->master ^ net_hash_mix(nf_ct_net(ct)); + c = (unsigned long)ct->ext; + d = (unsigned long)siphash(&ct->tuplehash, sizeof(ct->tuplehash), + &ct_id_seed); +#ifdef CONFIG_64BIT + return siphash_4u64((u64)a, (u64)b, (u64)c, (u64)d, &ct_id_seed); +#else + return siphash_4u32((u32)a, (u32)b, (u32)c, (u32)d, &ct_id_seed); +#endif +} +EXPORT_SYMBOL_GPL(nf_ct_get_id); + static void clean_from_lists(struct nf_conn *ct) { diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index c68e020427ab9..3a24c01cb9090 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -29,6 +29,7 @@ #include <linux/spinlock.h> #include <linux/interrupt.h> #include <linux/slab.h> +#include <linux/siphash.h>
#include <linux/netfilter.h> #include <net/netlink.h> @@ -451,7 +452,9 @@ ctnetlink_dump_ct_seq_adj(struct sk_buff *skb, const struct nf_conn *ct) static inline int ctnetlink_dump_id(struct sk_buff *skb, const struct nf_conn *ct) { - if (nla_put_be32(skb, CTA_ID, htonl((unsigned long)ct))) + __be32 id = (__force __be32)nf_ct_get_id(ct); + + if (nla_put_be32(skb, CTA_ID, id)) goto nla_put_failure; return 0;
@@ -1159,8 +1162,9 @@ ctnetlink_del_conntrack(struct sock *ctnl, struct sk_buff *skb, ct = nf_ct_tuplehash_to_ctrack(h);
if (cda[CTA_ID]) { - u_int32_t id = ntohl(nla_get_be32(cda[CTA_ID])); - if (id != (u32)(unsigned long)ct) { + __be32 id = nla_get_be32(cda[CTA_ID]); + + if (id != (__force __be32)nf_ct_get_id(ct)) { nf_ct_put(ct); return -ENOENT; } @@ -2480,6 +2484,25 @@ nla_put_failure:
static const union nf_inet_addr any_addr;
+static __be32 nf_expect_get_id(const struct nf_conntrack_expect *exp) +{ + static __read_mostly siphash_key_t exp_id_seed; + unsigned long a, b, c, d; + + net_get_random_once(&exp_id_seed, sizeof(exp_id_seed)); + + a = (unsigned long)exp; + b = (unsigned long)exp->helper; + c = (unsigned long)exp->master; + d = (unsigned long)siphash(&exp->tuple, sizeof(exp->tuple), &exp_id_seed); + +#ifdef CONFIG_64BIT + return (__force __be32)siphash_4u64((u64)a, (u64)b, (u64)c, (u64)d, &exp_id_seed); +#else + return (__force __be32)siphash_4u32((u32)a, (u32)b, (u32)c, (u32)d, &exp_id_seed); +#endif +} + static int ctnetlink_exp_dump_expect(struct sk_buff *skb, const struct nf_conntrack_expect *exp) @@ -2527,7 +2550,7 @@ ctnetlink_exp_dump_expect(struct sk_buff *skb, } #endif if (nla_put_be32(skb, CTA_EXPECT_TIMEOUT, htonl(timeout)) || - nla_put_be32(skb, CTA_EXPECT_ID, htonl((unsigned long)exp)) || + nla_put_be32(skb, CTA_EXPECT_ID, nf_expect_get_id(exp)) || nla_put_be32(skb, CTA_EXPECT_FLAGS, htonl(exp->flags)) || nla_put_be32(skb, CTA_EXPECT_CLASS, htonl(exp->class))) goto nla_put_failure; @@ -2824,7 +2847,8 @@ ctnetlink_get_expect(struct sock *ctnl, struct sk_buff *skb,
if (cda[CTA_EXPECT_ID]) { __be32 id = nla_get_be32(cda[CTA_EXPECT_ID]); - if (ntohl(id) != (u32)(unsigned long)exp) { + + if (id != nf_expect_get_id(exp)) { nf_ct_expect_put(exp); return -ENOENT; }
commit 656c8e9cc1badbc18eefe6ba01d33ebbcae61b9a upstream.
Change ct id hash calculation to only use invariants.
Currently the ct id hash calculation is based on some fields that can change in the lifetime on a conntrack entry in some corner cases. The current hash uses the whole tuple which contains an hlist pointer which will change when the conntrack is placed on the dying list resulting in a ct id change.
This patch also removes the reply-side tuple and extension pointer from the hash calculation so that the ct id will will not change from initialization until confirmation.
Fixes: 3c79107631db1f7 ("netfilter: ctnetlink: don't use conntrack/expect object addresses as id") Signed-off-by: Dirk Morris dmorris@metaloft.com Acked-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_conntrack_core.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index fd301fb137194..de0aad12b91d2 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -241,13 +241,12 @@ EXPORT_SYMBOL_GPL(nf_ct_invert_tuple); * table location, we assume id gets exposed to userspace. * * Following nf_conn items do not change throughout lifetime - * of the nf_conn after it has been committed to main hash table: + * of the nf_conn: * * 1. nf_conn address - * 2. nf_conn->ext address - * 3. nf_conn->master address (normally NULL) - * 4. tuple - * 5. the associated net namespace + * 2. nf_conn->master address (normally NULL) + * 3. the associated net namespace + * 4. the original direction tuple */ u32 nf_ct_get_id(const struct nf_conn *ct) { @@ -257,9 +256,10 @@ u32 nf_ct_get_id(const struct nf_conn *ct) net_get_random_once(&ct_id_seed, sizeof(ct_id_seed));
a = (unsigned long)ct; - b = (unsigned long)ct->master ^ net_hash_mix(nf_ct_net(ct)); - c = (unsigned long)ct->ext; - d = (unsigned long)siphash(&ct->tuplehash, sizeof(ct->tuplehash), + b = (unsigned long)ct->master; + c = (unsigned long)nf_ct_net(ct); + d = (unsigned long)siphash(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple, + sizeof(ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple), &ct_id_seed); #ifdef CONFIG_64BIT return siphash_4u64((u64)a, (u64)b, (u64)c, (u64)d, &ct_id_seed);
This reverts commit 5f18429ae48faebefc00533cb24afdd01064754c.
Which was upstream commit 53fe307dfd309e425b171f6272d64296a54f4dff.
Ben Hutchings reports that this commit depends on new code added in v4.18, and so is irrelevant on older kernels, and breaks the build.
Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/tests/parse-events.c | 27 --------------------------- 1 file changed, 27 deletions(-)
diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c index 1a35ab044c11d..54af2f2e2ee4f 100644 --- a/tools/perf/tests/parse-events.c +++ b/tools/perf/tests/parse-events.c @@ -12,32 +12,6 @@ #define PERF_TP_SAMPLE_TYPE (PERF_SAMPLE_RAW | PERF_SAMPLE_TIME | \ PERF_SAMPLE_CPU | PERF_SAMPLE_PERIOD)
-#if defined(__s390x__) -/* Return true if kvm module is available and loaded. Test this - * and retun success when trace point kvm_s390_create_vm - * exists. Otherwise this test always fails. - */ -static bool kvm_s390_create_vm_valid(void) -{ - char *eventfile; - bool rc = false; - - eventfile = get_events_file("kvm-s390"); - - if (eventfile) { - DIR *mydir = opendir(eventfile); - - if (mydir) { - rc = true; - closedir(mydir); - } - put_events_file(eventfile); - } - - return rc; -} -#endif - static int test__checkevent_tracepoint(struct perf_evlist *evlist) { struct perf_evsel *evsel = perf_evlist__first(evlist); @@ -1587,7 +1561,6 @@ static struct evlist_test test__events[] = { { .name = "kvm-s390:kvm_s390_create_vm", .check = test__checkevent_tracepoint, - .valid = kvm_s390_create_vm_valid, .id = 100, }, #endif
[ Upstream commit 7a9c2dd08eadd5c6943115dbbec040c38d2e0822 ]
A bug was reported that on certain Broadwell platforms, after resuming from S3, the CPU is running at an anomalously low speed.
It turns out that the BIOS has modified the value of the THERM_CONTROL register during S3, and changed it from 0 to 0x10, thus enabled clock modulation(bit4), but with undefined CPU Duty Cycle(bit1:3) - which causes the problem.
Here is a simple scenario to reproduce the issue:
1. Boot up the system 2. Get MSR 0x19a, it should be 0 3. Put the system into sleep, then wake it up 4. Get MSR 0x19a, it shows 0x10, while it should be 0
Although some BIOSen want to change the CPU Duty Cycle during S3, in our case we don't want the BIOS to do any modification.
Fix this issue by introducing a more generic x86 framework to save/restore specified MSR registers(THERM_CONTROL in this case) for suspend/resume. This allows us to fix similar bugs in a much simpler way in the future.
When the kernel wants to protect certain MSRs during suspending, we simply add a quirk entry in msr_save_dmi_table, and customize the MSR registers inside the quirk callback, for example:
u32 msr_id_need_to_save[] = {MSR_ID0, MSR_ID1, MSR_ID2...};
and the quirk mechanism ensures that, once resumed from suspend, the MSRs indicated by these IDs will be restored to their original, pre-suspend values.
Since both 64-bit and 32-bit kernels are affected, this patch covers the common 64/32-bit suspend/resume code path. And because the MSRs specified by the user might not be available or readable in any situation, we use rdmsrl_safe() to safely save these MSRs.
Reported-and-tested-by: Marcin Kaszewski marcin.kaszewski@intel.com Signed-off-by: Chen Yu yu.c.chen@intel.com Acked-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Acked-by: Pavel Machek pavel@ucw.cz Cc: Andy Lutomirski luto@amacapital.net Cc: Borislav Petkov bp@alien8.de Cc: Brian Gerst brgerst@gmail.com Cc: Denys Vlasenko dvlasenk@redhat.com Cc: H. Peter Anvin hpa@zytor.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: bp@suse.de Cc: len.brown@intel.com Cc: linux@horizon.com Cc: luto@kernel.org Cc: rjw@rjwysocki.net Link: http://lkml.kernel.org/r/c9abdcbc173dd2f57e8990e304376f19287e92ba.1448382971... [ More edits to the naming of data structures. ] Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/include/asm/msr.h | 10 ++++ arch/x86/include/asm/suspend_32.h | 1 + arch/x86/include/asm/suspend_64.h | 1 + arch/x86/power/cpu.c | 92 +++++++++++++++++++++++++++++++ 4 files changed, 104 insertions(+)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h index 5a10ac8c131ea..20f822fec8af0 100644 --- a/arch/x86/include/asm/msr.h +++ b/arch/x86/include/asm/msr.h @@ -32,6 +32,16 @@ struct msr_regs_info { int err; };
+struct saved_msr { + bool valid; + struct msr_info info; +}; + +struct saved_msrs { + unsigned int num; + struct saved_msr *array; +}; + static inline unsigned long long native_read_tscp(unsigned int *aux) { unsigned long low, high; diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h index d1793f06854d2..8e9dbe7b73a1f 100644 --- a/arch/x86/include/asm/suspend_32.h +++ b/arch/x86/include/asm/suspend_32.h @@ -15,6 +15,7 @@ struct saved_context { unsigned long cr0, cr2, cr3, cr4; u64 misc_enable; bool misc_enable_saved; + struct saved_msrs saved_msrs; struct desc_ptr gdt_desc; struct desc_ptr idt; u16 ldt; diff --git a/arch/x86/include/asm/suspend_64.h b/arch/x86/include/asm/suspend_64.h index 7ebf0ebe4e687..6136a18152af2 100644 --- a/arch/x86/include/asm/suspend_64.h +++ b/arch/x86/include/asm/suspend_64.h @@ -24,6 +24,7 @@ struct saved_context { unsigned long cr0, cr2, cr3, cr4, cr8; u64 misc_enable; bool misc_enable_saved; + struct saved_msrs saved_msrs; unsigned long efer; u16 gdt_pad; /* Unused */ struct desc_ptr gdt_desc; diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c index 9ab52791fed59..d5f64996394a9 100644 --- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -23,6 +23,7 @@ #include <asm/debugreg.h> #include <asm/cpu.h> #include <asm/mmu_context.h> +#include <linux/dmi.h>
#ifdef CONFIG_X86_32 __visible unsigned long saved_context_ebx; @@ -32,6 +33,29 @@ __visible unsigned long saved_context_eflags; #endif struct saved_context saved_context;
+static void msr_save_context(struct saved_context *ctxt) +{ + struct saved_msr *msr = ctxt->saved_msrs.array; + struct saved_msr *end = msr + ctxt->saved_msrs.num; + + while (msr < end) { + msr->valid = !rdmsrl_safe(msr->info.msr_no, &msr->info.reg.q); + msr++; + } +} + +static void msr_restore_context(struct saved_context *ctxt) +{ + struct saved_msr *msr = ctxt->saved_msrs.array; + struct saved_msr *end = msr + ctxt->saved_msrs.num; + + while (msr < end) { + if (msr->valid) + wrmsrl(msr->info.msr_no, msr->info.reg.q); + msr++; + } +} + /** * __save_processor_state - save CPU registers before creating a * hibernation image and before restoring the memory state from it @@ -111,6 +135,7 @@ static void __save_processor_state(struct saved_context *ctxt) #endif ctxt->misc_enable_saved = !rdmsrl_safe(MSR_IA32_MISC_ENABLE, &ctxt->misc_enable); + msr_save_context(ctxt); }
/* Needed by apm.c */ @@ -229,6 +254,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt) x86_platform.restore_sched_clock_state(); mtrr_bp_restore(); perf_restore_debug_store(); + msr_restore_context(ctxt); }
/* Needed by apm.c */ @@ -320,3 +346,69 @@ static int __init bsp_pm_check_init(void) }
core_initcall(bsp_pm_check_init); + +static int msr_init_context(const u32 *msr_id, const int total_num) +{ + int i = 0; + struct saved_msr *msr_array; + + if (saved_context.saved_msrs.array || saved_context.saved_msrs.num > 0) { + pr_err("x86/pm: MSR quirk already applied, please check your DMI match table.\n"); + return -EINVAL; + } + + msr_array = kmalloc_array(total_num, sizeof(struct saved_msr), GFP_KERNEL); + if (!msr_array) { + pr_err("x86/pm: Can not allocate memory to save/restore MSRs during suspend.\n"); + return -ENOMEM; + } + + for (i = 0; i < total_num; i++) { + msr_array[i].info.msr_no = msr_id[i]; + msr_array[i].valid = false; + msr_array[i].info.reg.q = 0; + } + saved_context.saved_msrs.num = total_num; + saved_context.saved_msrs.array = msr_array; + + return 0; +} + +/* + * The following section is a quirk framework for problematic BIOSen: + * Sometimes MSRs are modified by the BIOSen after suspended to + * RAM, this might cause unexpected behavior after wakeup. + * Thus we save/restore these specified MSRs across suspend/resume + * in order to work around it. + * + * For any further problematic BIOSen/platforms, + * please add your own function similar to msr_initialize_bdw. + */ +static int msr_initialize_bdw(const struct dmi_system_id *d) +{ + /* Add any extra MSR ids into this array. */ + u32 bdw_msr_id[] = { MSR_IA32_THERM_CONTROL }; + + pr_info("x86/pm: %s detected, MSR saving is needed during suspending.\n", d->ident); + return msr_init_context(bdw_msr_id, ARRAY_SIZE(bdw_msr_id)); +} + +static struct dmi_system_id msr_save_dmi_table[] = { + { + .callback = msr_initialize_bdw, + .ident = "BROADWELL BDX_EP", + .matches = { + DMI_MATCH(DMI_PRODUCT_NAME, "GRANTLEY"), + DMI_MATCH(DMI_PRODUCT_VERSION, "E63448-400"), + }, + }, + {} +}; + +static int pm_check_save_msr(void) +{ + dmi_check_system(msr_save_dmi_table); + return 0; +} + +device_initcall(pm_check_save_msr);
[ Upstream commit c49a0a80137c7ca7d6ced4c812c9e07a949f6f24 ]
There have been reports of RDRAND issues after resuming from suspend on some AMD family 15h and family 16h systems. This issue stems from a BIOS not performing the proper steps during resume to ensure RDRAND continues to function properly.
RDRAND support is indicated by CPUID Fn00000001_ECX[30]. This bit can be reset by clearing MSR C001_1004[62]. Any software that checks for RDRAND support using CPUID, including the kernel, will believe that RDRAND is not supported.
Update the CPU initialization to clear the RDRAND CPUID bit for any family 15h and 16h processor that supports RDRAND. If it is known that the family 15h or family 16h system does not have an RDRAND resume issue or that the system will not be placed in suspend, the "rdrand=force" kernel parameter can be used to stop the clearing of the RDRAND CPUID bit.
Additionally, update the suspend and resume path to save and restore the MSR C001_1004 value to ensure that the RDRAND CPUID setting remains in place after resuming from suspend.
Note, that clearing the RDRAND CPUID bit does not prevent a processor that normally supports the RDRAND instruction from executing it. So any code that determined the support based on family and model won't #UD.
Signed-off-by: Tom Lendacky thomas.lendacky@amd.com Signed-off-by: Borislav Petkov bp@suse.de Cc: Andrew Cooper andrew.cooper3@citrix.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Chen Yu yu.c.chen@intel.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Ingo Molnar mingo@redhat.com Cc: Jonathan Corbet corbet@lwn.net Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Juergen Gross jgross@suse.com Cc: Kees Cook keescook@chromium.org Cc: "linux-doc@vger.kernel.org" linux-doc@vger.kernel.org Cc: "linux-pm@vger.kernel.org" linux-pm@vger.kernel.org Cc: Nathan Chancellor natechancellor@gmail.com Cc: Paolo Bonzini pbonzini@redhat.com Cc: Pavel Machek pavel@ucw.cz Cc: "Rafael J. Wysocki" rjw@rjwysocki.net Cc: stable@vger.kernel.org Cc: Thomas Gleixner tglx@linutronix.de Cc: "x86@kernel.org" x86@kernel.org Link: https://lkml.kernel.org/r/7543af91666f491547bd86cebb1e17c66824ab9f.156622994... [sl: adjust context in docs] Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/kernel-parameters.txt | 7 +++ arch/x86/include/asm/msr-index.h | 1 + arch/x86/kernel/cpu/amd.c | 66 ++++++++++++++++++++++ arch/x86/power/cpu.c | 86 ++++++++++++++++++++++++----- 4 files changed, 147 insertions(+), 13 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 7a9fd54a0186a..5b94c0bfba859 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -3415,6 +3415,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted. Run specified binary instead of /init from the ramdisk, used for early userspace startup. See initrd.
+ rdrand= [X86] + force - Override the decision by the kernel to hide the + advertisement of RDRAND support (this affects + certain AMD processors because of buggy BIOS + support, specifically around the suspend/resume + path). + reboot= [KNL] Format (x86 or x86_64): [w[arm] | c[old] | h[ard] | s[oft] | g[pio]] \ diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index d4f5b8209393f..30183770132a0 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -311,6 +311,7 @@ #define MSR_AMD64_PATCH_LEVEL 0x0000008b #define MSR_AMD64_TSC_RATIO 0xc0000104 #define MSR_AMD64_NB_CFG 0xc001001f +#define MSR_AMD64_CPUID_FN_1 0xc0011004 #define MSR_AMD64_PATCH_LOADER 0xc0010020 #define MSR_AMD64_OSVW_ID_LENGTH 0xc0010140 #define MSR_AMD64_OSVW_STATUS 0xc0010141 diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index 6f2483292de0b..424d8a636615a 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -684,6 +684,64 @@ static void init_amd_ln(struct cpuinfo_x86 *c) msr_set_bit(MSR_AMD64_DE_CFG, 31); }
+static bool rdrand_force; + +static int __init rdrand_cmdline(char *str) +{ + if (!str) + return -EINVAL; + + if (!strcmp(str, "force")) + rdrand_force = true; + else + return -EINVAL; + + return 0; +} +early_param("rdrand", rdrand_cmdline); + +static void clear_rdrand_cpuid_bit(struct cpuinfo_x86 *c) +{ + /* + * Saving of the MSR used to hide the RDRAND support during + * suspend/resume is done by arch/x86/power/cpu.c, which is + * dependent on CONFIG_PM_SLEEP. + */ + if (!IS_ENABLED(CONFIG_PM_SLEEP)) + return; + + /* + * The nordrand option can clear X86_FEATURE_RDRAND, so check for + * RDRAND support using the CPUID function directly. + */ + if (!(cpuid_ecx(1) & BIT(30)) || rdrand_force) + return; + + msr_clear_bit(MSR_AMD64_CPUID_FN_1, 62); + + /* + * Verify that the CPUID change has occurred in case the kernel is + * running virtualized and the hypervisor doesn't support the MSR. + */ + if (cpuid_ecx(1) & BIT(30)) { + pr_info_once("BIOS may not properly restore RDRAND after suspend, but hypervisor does not support hiding RDRAND via CPUID.\n"); + return; + } + + clear_cpu_cap(c, X86_FEATURE_RDRAND); + pr_info_once("BIOS may not properly restore RDRAND after suspend, hiding RDRAND via CPUID. Use rdrand=force to reenable.\n"); +} + +static void init_amd_jg(struct cpuinfo_x86 *c) +{ + /* + * Some BIOS implementations do not restore proper RDRAND support + * across suspend and resume. Check on whether to hide the RDRAND + * instruction support via CPUID. + */ + clear_rdrand_cpuid_bit(c); +} + static void init_amd_bd(struct cpuinfo_x86 *c) { u64 value; @@ -711,6 +769,13 @@ static void init_amd_bd(struct cpuinfo_x86 *c) wrmsrl_safe(0xc0011021, value); } } + + /* + * Some BIOS implementations do not restore proper RDRAND support + * across suspend and resume. Check on whether to hide the RDRAND + * instruction support via CPUID. + */ + clear_rdrand_cpuid_bit(c); }
static void init_amd_zn(struct cpuinfo_x86 *c) @@ -755,6 +820,7 @@ static void init_amd(struct cpuinfo_x86 *c) case 0x10: init_amd_gh(c); break; case 0x12: init_amd_ln(c); break; case 0x15: init_amd_bd(c); break; + case 0x16: init_amd_jg(c); break; case 0x17: init_amd_zn(c); break; }
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c index d5f64996394a9..2e5052b2d2382 100644 --- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -12,6 +12,7 @@ #include <linux/export.h> #include <linux/smp.h> #include <linux/perf_event.h> +#include <linux/dmi.h>
#include <asm/pgtable.h> #include <asm/proto.h> @@ -23,7 +24,7 @@ #include <asm/debugreg.h> #include <asm/cpu.h> #include <asm/mmu_context.h> -#include <linux/dmi.h> +#include <asm/cpu_device_id.h>
#ifdef CONFIG_X86_32 __visible unsigned long saved_context_ebx; @@ -347,15 +348,14 @@ static int __init bsp_pm_check_init(void)
core_initcall(bsp_pm_check_init);
-static int msr_init_context(const u32 *msr_id, const int total_num) +static int msr_build_context(const u32 *msr_id, const int num) { - int i = 0; + struct saved_msrs *saved_msrs = &saved_context.saved_msrs; struct saved_msr *msr_array; + int total_num; + int i, j;
- if (saved_context.saved_msrs.array || saved_context.saved_msrs.num > 0) { - pr_err("x86/pm: MSR quirk already applied, please check your DMI match table.\n"); - return -EINVAL; - } + total_num = saved_msrs->num + num;
msr_array = kmalloc_array(total_num, sizeof(struct saved_msr), GFP_KERNEL); if (!msr_array) { @@ -363,19 +363,30 @@ static int msr_init_context(const u32 *msr_id, const int total_num) return -ENOMEM; }
- for (i = 0; i < total_num; i++) { - msr_array[i].info.msr_no = msr_id[i]; + if (saved_msrs->array) { + /* + * Multiple callbacks can invoke this function, so copy any + * MSR save requests from previous invocations. + */ + memcpy(msr_array, saved_msrs->array, + sizeof(struct saved_msr) * saved_msrs->num); + + kfree(saved_msrs->array); + } + + for (i = saved_msrs->num, j = 0; i < total_num; i++, j++) { + msr_array[i].info.msr_no = msr_id[j]; msr_array[i].valid = false; msr_array[i].info.reg.q = 0; } - saved_context.saved_msrs.num = total_num; - saved_context.saved_msrs.array = msr_array; + saved_msrs->num = total_num; + saved_msrs->array = msr_array;
return 0; }
/* - * The following section is a quirk framework for problematic BIOSen: + * The following sections are a quirk framework for problematic BIOSen: * Sometimes MSRs are modified by the BIOSen after suspended to * RAM, this might cause unexpected behavior after wakeup. * Thus we save/restore these specified MSRs across suspend/resume @@ -390,7 +401,7 @@ static int msr_initialize_bdw(const struct dmi_system_id *d) u32 bdw_msr_id[] = { MSR_IA32_THERM_CONTROL };
pr_info("x86/pm: %s detected, MSR saving is needed during suspending.\n", d->ident); - return msr_init_context(bdw_msr_id, ARRAY_SIZE(bdw_msr_id)); + return msr_build_context(bdw_msr_id, ARRAY_SIZE(bdw_msr_id)); }
static struct dmi_system_id msr_save_dmi_table[] = { @@ -405,9 +416,58 @@ static struct dmi_system_id msr_save_dmi_table[] = { {} };
+static int msr_save_cpuid_features(const struct x86_cpu_id *c) +{ + u32 cpuid_msr_id[] = { + MSR_AMD64_CPUID_FN_1, + }; + + pr_info("x86/pm: family %#hx cpu detected, MSR saving is needed during suspending.\n", + c->family); + + return msr_build_context(cpuid_msr_id, ARRAY_SIZE(cpuid_msr_id)); +} + +static const struct x86_cpu_id msr_save_cpu_table[] = { + { + .vendor = X86_VENDOR_AMD, + .family = 0x15, + .model = X86_MODEL_ANY, + .feature = X86_FEATURE_ANY, + .driver_data = (kernel_ulong_t)msr_save_cpuid_features, + }, + { + .vendor = X86_VENDOR_AMD, + .family = 0x16, + .model = X86_MODEL_ANY, + .feature = X86_FEATURE_ANY, + .driver_data = (kernel_ulong_t)msr_save_cpuid_features, + }, + {} +}; + +typedef int (*pm_cpu_match_t)(const struct x86_cpu_id *); +static int pm_cpu_check(const struct x86_cpu_id *c) +{ + const struct x86_cpu_id *m; + int ret = 0; + + m = x86_match_cpu(msr_save_cpu_table); + if (m) { + pm_cpu_match_t fn; + + fn = (pm_cpu_match_t)m->driver_data; + ret = fn(m); + } + + return ret; +} + static int pm_check_save_msr(void) { dmi_check_system(msr_save_dmi_table); + pm_cpu_check(msr_save_cpu_table); + return 0; }
[ Upstream commit 7c7cfdcf7f1777c7376fc9a239980de04b6b5ea1 ]
Fix the following BUG:
[ 187.065689] BUG: kernel NULL pointer dereference, address: 000000000000001c [ 187.065790] RIP: 0010:ufshcd_vreg_set_hpm+0x3c/0x110 [ufshcd_core] [ 187.065938] Call Trace: [ 187.065959] ufshcd_resume+0x72/0x290 [ufshcd_core] [ 187.065980] ufshcd_system_resume+0x54/0x140 [ufshcd_core] [ 187.065993] ? pci_pm_restore+0xb0/0xb0 [ 187.066005] ufshcd_pci_resume+0x15/0x20 [ufshcd_pci] [ 187.066017] pci_pm_thaw+0x4c/0x90 [ 187.066030] dpm_run_callback+0x5b/0x150 [ 187.066043] device_resume+0x11b/0x220
Voltage regulators are optional, so functions must check they exist before dereferencing.
Note this issue is hidden if CONFIG_REGULATORS is not set, because the offending code is optimised away.
Notes for stable:
The issue first appears in commit 57d104c153d3 ("ufs: add UFS power management support") but is inadvertently fixed in commit 60f0187031c0 ("scsi: ufs: disable vccq if it's not needed by UFS device") which in turn was reverted by commit 730679817d83 ("Revert "scsi: ufs: disable vccq if it's not needed by UFS device""). So fix applies v3.18 to v4.5 and v5.1+
Fixes: 57d104c153d3 ("ufs: add UFS power management support") Fixes: 730679817d83 ("Revert "scsi: ufs: disable vccq if it's not needed by UFS device"") Cc: stable@vger.kernel.org Signed-off-by: Adrian Hunter adrian.hunter@intel.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/ufs/ufshcd.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index b140e81c4f7da..fd8bbd2b5d0eb 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -4418,6 +4418,9 @@ static inline int ufshcd_config_vreg_lpm(struct ufs_hba *hba, static inline int ufshcd_config_vreg_hpm(struct ufs_hba *hba, struct ufs_vreg *vreg) { + if (!vreg) + return 0; + return ufshcd_config_vreg_load(hba->dev, vreg, vreg->max_uA); }
[ Upstream commit 5d6fb560729a5d5554e23db8d00eb57cd0021083 ]
clang-9 points out that there are two variables that depending on the configuration may only be used in an ARRAY_SIZE() expression but not referenced:
drivers/dma/ste_dma40.c:145:12: error: variable 'd40_backup_regs' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration] static u32 d40_backup_regs[] = { ^ drivers/dma/ste_dma40.c:214:12: error: variable 'd40_backup_regs_chan' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration] static u32 d40_backup_regs_chan[] = {
Mark these __maybe_unused to shut up the warning.
Signed-off-by: Arnd Bergmann arnd@arndb.de Reviewed-by: Nathan Chancellor natechancellor@gmail.com Reviewed-by: Linus Walleij linus.walleij@linaro.org Link: https://lore.kernel.org/r/20190712091357.744515-1-arnd@arndb.de Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/ste_dma40.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/dma/ste_dma40.c b/drivers/dma/ste_dma40.c index dd3e7ba273ad0..0fede051f4e1c 100644 --- a/drivers/dma/ste_dma40.c +++ b/drivers/dma/ste_dma40.c @@ -142,7 +142,7 @@ enum d40_events { * when the DMA hw is powered off. * TODO: Add save/restore of D40_DREG_GCC on dma40 v3 or later, if that works. */ -static u32 d40_backup_regs[] = { +static __maybe_unused u32 d40_backup_regs[] = { D40_DREG_LCPA, D40_DREG_LCLA, D40_DREG_PRMSE, @@ -211,7 +211,7 @@ static u32 d40_backup_regs_v4b[] = {
#define BACKUP_REGS_SZ_V4B ARRAY_SIZE(d40_backup_regs_v4b)
-static u32 d40_backup_regs_chan[] = { +static __maybe_unused u32 d40_backup_regs_chan[] = { D40_CHAN_REG_SSCFG, D40_CHAN_REG_SSELT, D40_CHAN_REG_SSPTR,
[ Upstream commit 602fda17c7356bb7ae98467d93549057481d11dd ]
In some cases, one can get out of suspend with a reset or a disconnect followed by a reconnect. Previously we would leave a stale suspended flag set.
Signed-off-by: Benjamin Herrenschmidt benh@kernel.crashing.org Signed-off-by: Felipe Balbi felipe.balbi@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/gadget/composite.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/usb/gadget/composite.c b/drivers/usb/gadget/composite.c index 8bf54477f4724..351a406b97af7 100644 --- a/drivers/usb/gadget/composite.c +++ b/drivers/usb/gadget/composite.c @@ -1889,6 +1889,7 @@ void composite_disconnect(struct usb_gadget *gadget) * disconnect callbacks? */ spin_lock_irqsave(&cdev->lock, flags); + cdev->suspended = 0; if (cdev->config) reset_config(cdev); if (cdev->driver->disconnect)
[ Upstream commit 777758888ffe59ef754cc39ab2f275dc277732f4 ]
On the Gemini SoC the FOTG2 stalls after port reset so restart the HCD after each port reset.
Signed-off-by: Hans Ulli Kroll ulli.kroll@googlemail.com Signed-off-by: Linus Walleij linus.walleij@linaro.org Link: https://lore.kernel.org/r/20190810150458.817-1-linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/host/fotg210-hcd.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/usb/host/fotg210-hcd.c b/drivers/usb/host/fotg210-hcd.c index 2341af4f34909..11b3a8c57eabc 100644 --- a/drivers/usb/host/fotg210-hcd.c +++ b/drivers/usb/host/fotg210-hcd.c @@ -1653,6 +1653,10 @@ static int fotg210_hub_control(struct usb_hcd *hcd, u16 typeReq, u16 wValue, /* see what we found out */ temp = check_reset_complete(fotg210, wIndex, status_reg, fotg210_readl(fotg210, status_reg)); + + /* restart schedule */ + fotg210->command |= CMD_RUN; + fotg210_writel(fotg210, fotg210->command, &fotg210->regs->command); }
if (!(temp & (PORT_RESUME|PORT_RESET))) {
[ Upstream commit b0995156071b0ff29a5902964a9dc8cfad6f81c0 ]
HyperV KVP and VSS daemons should exit with 0 when the '--help' or '-h' flags are used.
Signed-off-by: Adrian Vladu avladu@cloudbasesolutions.com
Cc: "K. Y. Srinivasan" kys@microsoft.com Cc: Haiyang Zhang haiyangz@microsoft.com Cc: Stephen Hemminger sthemmin@microsoft.com Cc: Sasha Levin sashal@kernel.org Cc: Alessandro Pilotti apilotti@cloudbasesolutions.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/hv/hv_kvp_daemon.c | 2 ++ tools/hv/hv_vss_daemon.c | 2 ++ 2 files changed, 4 insertions(+)
diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c index 1774800668168..fffc7c4184599 100644 --- a/tools/hv/hv_kvp_daemon.c +++ b/tools/hv/hv_kvp_daemon.c @@ -1379,6 +1379,8 @@ int main(int argc, char *argv[]) daemonize = 0; break; case 'h': + print_usage(argv); + exit(0); default: print_usage(argv); exit(EXIT_FAILURE); diff --git a/tools/hv/hv_vss_daemon.c b/tools/hv/hv_vss_daemon.c index 5d51d6ff08e6a..b5465f92ed50e 100644 --- a/tools/hv/hv_vss_daemon.c +++ b/tools/hv/hv_vss_daemon.c @@ -164,6 +164,8 @@ int main(int argc, char *argv[]) daemonize = 0; break; case 'h': + print_usage(argv); + exit(0); default: print_usage(argv); exit(EXIT_FAILURE);
[ Upstream commit 215e06f0d18d5d653d6ea269e4dfc684854d48bf ]
The commit 5e6acc3e678e ("bcm2835-pm: Move bcm2835-watchdog's DT probe to an MFD.") broke module autoloading on Raspberry Pi. So add a module alias this fix this.
Signed-off-by: Stefan Wahren wahrenst@gmx.net Reviewed-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Wim Van Sebroeck wim@linux-watchdog.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/watchdog/bcm2835_wdt.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/watchdog/bcm2835_wdt.c b/drivers/watchdog/bcm2835_wdt.c index 8a5ce5b5a0b6f..199b1fb3669c0 100644 --- a/drivers/watchdog/bcm2835_wdt.c +++ b/drivers/watchdog/bcm2835_wdt.c @@ -248,6 +248,7 @@ module_param(nowayout, bool, 0); MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started (default=" __MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
+MODULE_ALIAS("platform:bcm2835-wdt"); MODULE_AUTHOR("Lubomir Rintel lkundrak@v3.sk"); MODULE_DESCRIPTION("Driver for Broadcom BCM2835 watchdog timer"); MODULE_LICENSE("GPL");
Commit 8c3088f895a0 ("tcp: be more careful in tcp_fragment()") triggers following stack trace:
[25244.848046] kernel BUG at ./include/linux/skbuff.h:1406! [25244.859335] RIP: 0010:skb_queue_prev+0x9/0xc [25244.888167] Call Trace: [25244.889182] <IRQ> [25244.890001] tcp_fragment+0x9c/0x2cf [25244.891295] tcp_write_xmit+0x68f/0x988 [25244.892732] __tcp_push_pending_frames+0x3b/0xa0 [25244.894347] tcp_data_snd_check+0x2a/0xc8 [25244.895775] tcp_rcv_established+0x2a8/0x30d [25244.897282] tcp_v4_do_rcv+0xb2/0x158 [25244.898666] tcp_v4_rcv+0x692/0x956 [25244.899959] ip_local_deliver_finish+0xeb/0x169 [25244.901547] __netif_receive_skb_core+0x51c/0x582 [25244.903193] ? inet_gro_receive+0x239/0x247 [25244.904756] netif_receive_skb_internal+0xab/0xc6 [25244.906395] napi_gro_receive+0x8a/0xc0 [25244.907760] receive_buf+0x9a1/0x9cd [25244.909160] ? load_balance+0x17a/0x7b7 [25244.910536] ? vring_unmap_one+0x18/0x61 [25244.911932] ? detach_buf+0x60/0xfa [25244.913234] virtnet_poll+0x128/0x1e1 [25244.914607] net_rx_action+0x12a/0x2b1 [25244.915953] __do_softirq+0x11c/0x26b [25244.917269] ? handle_irq_event+0x44/0x56 [25244.918695] irq_exit+0x61/0xa0 [25244.919947] do_IRQ+0x9d/0xbb [25244.921065] common_interrupt+0x85/0x85 [25244.922479] </IRQ>
tcp_rtx_queue_tail() (called by tcp_fragment()) can call tcp_write_queue_prev() on the first packet in the queue, which will trigger the BUG in tcp_write_queue_prev(), because there is no previous packet.
This happens when the retransmit queue is empty, for example in case of a zero window.
Commit 8c3088f895a0 ("tcp: be more careful in tcp_fragment()") was not a simple cherry-pick of the original one from master (b617158dc096) because there is a specific TCP rtx queue only since v4.15. For more details, please see the commit message of b617158dc096 ("tcp: be more careful in tcp_fragment()").
The BUG() is hit due to the specific code added to versions older than v4.15. The comment in skb_queue_prev() (include/linux/skbuff.h:1406), just before the BUG_ON() somehow suggests to add a check before using it, what Tim did.
In master, this code path causing the issue will not be taken because the implementation of tcp_rtx_queue_tail() is different:
tcp_fragment() → tcp_rtx_queue_tail() → tcp_write_queue_prev() → skb_queue_prev() → BUG_ON()
Fixes: 8c3088f895a0 ("tcp: be more careful in tcp_fragment()") Signed-off-by: Tim Froidcoeur tim.froidcoeur@tessares.net Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Reviewed-by: Christoph Paasch cpaasch@apple.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/tcp.h | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/include/net/tcp.h b/include/net/tcp.h index 0410fd29d5695..4447195a0cd4f 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1540,6 +1540,10 @@ static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk) { struct sk_buff *skb = tcp_send_head(sk);
+ /* empty retransmit queue, for example due to zero window */ + if (skb == tcp_write_queue_head(sk)) + return NULL; + return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk); }
From: Hui Peng benquike@gmail.com
commit 19bce474c45be69a284ecee660aa12d8f1e88f18 upstream.
`check_input_term` recursively calls itself with input from device side (e.g., uac_input_terminal_descriptor.bCSourceID) as argument (id). In `check_input_term`, if `check_input_term` is called with the same `id` argument as the caller, it triggers endless recursive call, resulting kernel space stack overflow.
This patch fixes the bug by adding a bitmap to `struct mixer_build` to keep track of the checked ids and stop the execution if some id has been checked (similar to how parse_audio_unit handles unitid argument).
Reported-by: Hui Peng benquike@gmail.com Reported-by: Mathias Payer mathias.payer@nebelwelt.net Signed-off-by: Hui Peng benquike@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/usb/mixer.c | 29 ++++++++++++++++++++++++----- 1 file changed, 24 insertions(+), 5 deletions(-)
--- a/sound/usb/mixer.c +++ b/sound/usb/mixer.c @@ -81,6 +81,7 @@ struct mixer_build { unsigned char *buffer; unsigned int buflen; DECLARE_BITMAP(unitbitmap, MAX_ID_ELEMS); + DECLARE_BITMAP(termbitmap, MAX_ID_ELEMS); struct usb_audio_term oterm; const struct usbmix_name_map *map; const struct usbmix_selector_map *selector_map; @@ -709,15 +710,24 @@ static int get_term_name(struct mixer_bu * parse the source unit recursively until it reaches to a terminal * or a branched unit. */ -static int check_input_term(struct mixer_build *state, int id, +static int __check_input_term(struct mixer_build *state, int id, struct usb_audio_term *term) { int err; void *p1; + unsigned char *hdr;
memset(term, 0, sizeof(*term)); - while ((p1 = find_audio_control_unit(state, id)) != NULL) { - unsigned char *hdr = p1; + for (;;) { + /* a loop in the terminal chain? */ + if (test_and_set_bit(id, state->termbitmap)) + return -EINVAL; + + p1 = find_audio_control_unit(state, id); + if (!p1) + break; + + hdr = p1; term->id = id; switch (hdr[2]) { case UAC_INPUT_TERMINAL: @@ -732,7 +742,7 @@ static int check_input_term(struct mixer
/* call recursively to verify that the * referenced clock entity is valid */ - err = check_input_term(state, d->bCSourceID, term); + err = __check_input_term(state, d->bCSourceID, term); if (err < 0) return err;
@@ -764,7 +774,7 @@ static int check_input_term(struct mixer case UAC2_CLOCK_SELECTOR: { struct uac_selector_unit_descriptor *d = p1; /* call recursively to retrieve the channel info */ - err = check_input_term(state, d->baSourceID[0], term); + err = __check_input_term(state, d->baSourceID[0], term); if (err < 0) return err; term->type = d->bDescriptorSubtype << 16; /* virtual type */ @@ -811,6 +821,15 @@ static int check_input_term(struct mixer return -ENODEV; }
+ +static int check_input_term(struct mixer_build *state, int id, + struct usb_audio_term *term) +{ + memset(term, 0, sizeof(*term)); + memset(state->termbitmap, 0, sizeof(state->termbitmap)); + return __check_input_term(state, id, term); +} + /* * Feature Unit */
From: Hui Peng benquike@gmail.com
commit daac07156b330b18eb5071aec4b3ddca1c377f2c upstream.
The `uac_mixer_unit_descriptor` shown as below is read from the device side. In `parse_audio_mixer_unit`, `baSourceID` field is accessed from index 0 to `bNrInPins` - 1, the current implementation assumes that descriptor is always valid (the length of descriptor is no shorter than 5 + `bNrInPins`). If a descriptor read from the device side is invalid, it may trigger out-of-bound memory access.
``` struct uac_mixer_unit_descriptor { __u8 bLength; __u8 bDescriptorType; __u8 bDescriptorSubtype; __u8 bUnitID; __u8 bNrInPins; __u8 baSourceID[]; } ```
This patch fixes the bug by add a sanity check on the length of the descriptor.
Reported-by: Hui Peng benquike@gmail.com Reported-by: Mathias Payer mathias.payer@nebelwelt.net Cc: stable@vger.kernel.org Signed-off-by: Hui Peng benquike@gmail.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/usb/mixer.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/usb/mixer.c +++ b/sound/usb/mixer.c @@ -1647,6 +1647,7 @@ static int parse_audio_mixer_unit(struct int pin, ich, err;
if (desc->bLength < 11 || !(input_pins = desc->bNrInPins) || + desc->bLength < sizeof(*desc) + desc->bNrInPins || !(num_outs = uac_mixer_unit_bNrChannels(desc))) { usb_audio_err(state->chip, "invalid MIXER UNIT descriptor %d\n",
From: Eric Dumazet edumazet@google.com
[ Upstream commit ef8d8ccdc216f797e66cb4a1372f5c4c285ce1e4 ]
As Jason Baron explained in commit 790ba4566c1a ("tcp: set SOCK_NOSPACE under memory pressure"), it is crucial we properly set SOCK_NOSPACE when needed.
However, Jason patch had a bug, because the 'nonblocking' status as far as sk_stream_wait_memory() is concerned is governed by MSG_DONTWAIT flag passed at sendmsg() time :
long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
So it is very possible that tcp sendmsg() calls sk_stream_wait_memory(), and that sk_stream_wait_memory() returns -EAGAIN with SOCK_NOSPACE cleared, if sk->sk_sndtimeo has been set to a small (but not zero) value.
This patch removes the 'noblock' variable since we must always set SOCK_NOSPACE if -EAGAIN is returned.
It also renames the do_nonblock label since we might reach this code path even if we were in blocking mode.
Fixes: 790ba4566c1a ("tcp: set SOCK_NOSPACE under memory pressure") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Jason Baron jbaron@akamai.com Reported-by: Vladimir Rutsky rutsky@google.com Acked-by: Soheil Hassas Yeganeh soheil@google.com Acked-by: Neal Cardwell ncardwell@google.com Acked-by: Jason Baron jbaron@akamai.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/stream.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-)
--- a/net/core/stream.c +++ b/net/core/stream.c @@ -119,7 +119,6 @@ int sk_stream_wait_memory(struct sock *s int err = 0; long vm_wait = 0; long current_timeo = *timeo_p; - bool noblock = (*timeo_p ? false : true); DEFINE_WAIT(wait);
if (sk_stream_memory_free(sk)) @@ -132,11 +131,8 @@ int sk_stream_wait_memory(struct sock *s
if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN)) goto do_error; - if (!*timeo_p) { - if (noblock) - set_bit(SOCK_NOSPACE, &sk->sk_socket->flags); - goto do_nonblock; - } + if (!*timeo_p) + goto do_eagain; if (signal_pending(current)) goto do_interrupted; sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk); @@ -168,7 +164,13 @@ out: do_error: err = -EPIPE; goto out; -do_nonblock: +do_eagain: + /* Make sure that whenever EAGAIN is returned, EPOLLOUT event can + * be generated later. + * When TCP receives ACK packets that make room, tcp_check_space() + * only calls tcp_new_space() if SOCK_NOSPACE is set. + */ + set_bit(SOCK_NOSPACE, &sk->sk_socket->flags); err = -EAGAIN; goto out; do_interrupted:
From: Takashi Iwai tiwai@suse.de
commit 75545304eba6a3d282f923b96a466dc25a81e359 upstream.
The input pool of a client might be deleted via the resize ioctl, the the access to it should be covered by the proper locks. Currently the only missing place is the call in snd_seq_ioctl_get_client_pool(), and this patch papers over it.
Reported-by: syzbot+4a75454b9ca2777f35c7@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/core/seq/seq_clientmgr.c | 3 +-- sound/core/seq/seq_fifo.c | 17 +++++++++++++++++ sound/core/seq/seq_fifo.h | 2 ++ 3 files changed, 20 insertions(+), 2 deletions(-)
--- a/sound/core/seq/seq_clientmgr.c +++ b/sound/core/seq/seq_clientmgr.c @@ -1906,8 +1906,7 @@ static int snd_seq_ioctl_get_client_pool if (cptr->type == USER_CLIENT) { info.input_pool = cptr->data.user.fifo_pool_size; info.input_free = info.input_pool; - if (cptr->data.user.fifo) - info.input_free = snd_seq_unused_cells(cptr->data.user.fifo->pool); + info.input_free = snd_seq_fifo_unused_cells(cptr->data.user.fifo); } else { info.input_pool = 0; info.input_free = 0; --- a/sound/core/seq/seq_fifo.c +++ b/sound/core/seq/seq_fifo.c @@ -278,3 +278,20 @@ int snd_seq_fifo_resize(struct snd_seq_f
return 0; } + +/* get the number of unused cells safely */ +int snd_seq_fifo_unused_cells(struct snd_seq_fifo *f) +{ + unsigned long flags; + int cells; + + if (!f) + return 0; + + snd_use_lock_use(&f->use_lock); + spin_lock_irqsave(&f->lock, flags); + cells = snd_seq_unused_cells(f->pool); + spin_unlock_irqrestore(&f->lock, flags); + snd_use_lock_free(&f->use_lock); + return cells; +} --- a/sound/core/seq/seq_fifo.h +++ b/sound/core/seq/seq_fifo.h @@ -68,5 +68,7 @@ int snd_seq_fifo_poll_wait(struct snd_se /* resize pool in fifo */ int snd_seq_fifo_resize(struct snd_seq_fifo *f, int poolsize);
+/* get the number of unused cells safely */ +int snd_seq_fifo_unused_cells(struct snd_seq_fifo *f);
#endif
From: Sean Christopherson sean.j.christopherson@intel.com
commit 75ee23b30dc712d80d2421a9a547e7ab6e379b44 upstream.
Don't advance RIP or inject a single-step #DB if emulation signals a fault. This logic applies to all state updates that are conditional on clean retirement of the emulation instruction, e.g. updating RFLAGS was previously handled by commit 38827dbd3fb85 ("KVM: x86: Do not update EFLAGS on faulting emulation").
Not advancing RIP is likely a nop, i.e. ctxt->eip isn't updated with ctxt->_eip until emulation "retires" anyways. Skipping #DB injection fixes a bug reported by Andy Lutomirski where a #UD on SYSCALL due to invalid state with EFLAGS.TF=1 would loop indefinitely due to emulation overwriting the #UD with #DB and thus restarting the bad SYSCALL over and over.
Cc: Nadav Amit nadav.amit@gmail.com Cc: stable@vger.kernel.org Reported-by: Andy Lutomirski luto@kernel.org Fixes: 663f4c61b803 ("KVM: x86: handle singlestep during emulation") Signed-off-by: Sean Christopherson sean.j.christopherson@intel.com Signed-off-by: Radim Krčmář rkrcmar@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/x86.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
--- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5545,12 +5545,13 @@ restart: unsigned long rflags = kvm_x86_ops->get_rflags(vcpu); toggle_interruptibility(vcpu, ctxt->interruptibility); vcpu->arch.emulate_regs_need_sync_to_vcpu = false; - kvm_rip_write(vcpu, ctxt->eip); - if (r == EMULATE_DONE && ctxt->tf) - kvm_vcpu_do_singlestep(vcpu, &r); if (!ctxt->have_exception || - exception_type(ctxt->exception.vector) == EXCPT_TRAP) + exception_type(ctxt->exception.vector) == EXCPT_TRAP) { + kvm_rip_write(vcpu, ctxt->eip); + if (r == EMULATE_DONE && ctxt->tf) + kvm_vcpu_do_singlestep(vcpu, &r); __kvm_set_rflags(vcpu, ctxt->eflags); + }
/* * For STI, interrupts are shadowed; so KVM_REQ_EVENT will
From: Bandan Das bsd@redhat.com
commit bae3a8d3308ee69a7dbdf145911b18dfda8ade0d upstream.
Legacy apic init uses bigsmp for smp systems with 8 and more CPUs. The bigsmp APIC implementation uses physical destination mode, but it nevertheless initializes LDR and DFR. The LDR even ends up incorrectly with multiple bit being set.
This does not cause a functional problem because LDR and DFR are ignored when physical destination mode is active, but it triggered a problem on a 32-bit KVM guest which jumps into a kdump kernel.
The multiple bits set unearthed a bug in the KVM APIC implementation. The code which creates the logical destination map for VCPUs ignores the disabled state of the APIC and ends up overwriting an existing valid entry and as a result, APIC calibration hangs in the guest during kdump initialization.
Remove the bogus LDR/DFR initialization.
This is not intended to work around the KVM APIC bug. The LDR/DFR ininitalization is wrong on its own.
The issue goes back into the pre git history. The fixes tag is the commit in the bitkeeper import which introduced bigsmp support in 2003.
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Fixes: db7b9e9f26b8 ("[PATCH] Clustered APIC setup for >8 CPU systems") Suggested-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Bandan Das bsd@redhat.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190826101513.5080-2-bsd@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/apic/bigsmp_32.c | 24 ++---------------------- 1 file changed, 2 insertions(+), 22 deletions(-)
--- a/arch/x86/kernel/apic/bigsmp_32.c +++ b/arch/x86/kernel/apic/bigsmp_32.c @@ -37,32 +37,12 @@ static int bigsmp_early_logical_apicid(i return early_per_cpu(x86_cpu_to_apicid, cpu); }
-static inline unsigned long calculate_ldr(int cpu) -{ - unsigned long val, id; - - val = apic_read(APIC_LDR) & ~APIC_LDR_MASK; - id = per_cpu(x86_bios_cpu_apicid, cpu); - val |= SET_APIC_LOGICAL_ID(id); - - return val; -} - /* - * Set up the logical destination ID. - * - * Intel recommends to set DFR, LDR and TPR before enabling - * an APIC. See e.g. "AP-388 82489DX User's Manual" (Intel - * document number 292116). So here it goes... + * bigsmp enables physical destination mode + * and doesn't use LDR and DFR */ static void bigsmp_init_apic_ldr(void) { - unsigned long val; - int cpu = smp_processor_id(); - - apic_write(APIC_DFR, APIC_DFR_FLAT); - val = calculate_ldr(cpu); - apic_write(APIC_LDR, val); }
static void bigsmp_setup_apic_routing(void)
From: Bandan Das bsd@redhat.com
commit 558682b5291937a70748d36fd9ba757fb25b99ae upstream.
Although APIC initialization will typically clear out the LDR before setting it, the APIC cleanup code should reset the LDR.
This was discovered with a 32-bit KVM guest jumping into a kdump kernel. The stale bits in the LDR triggered a bug in the KVM APIC implementation which caused the destination mapping for VCPUs to be corrupted.
Note that this isn't intended to paper over the KVM APIC bug. The kernel has to clear the LDR when resetting the APIC registers except when X2APIC is enabled.
This lacks a Fixes tag because missing to clear LDR goes way back into pre git history.
[ tglx: Made x2apic_enabled a function call as required ]
Signed-off-by: Bandan Das bsd@redhat.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190826101513.5080-3-bsd@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/apic/apic.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -1031,6 +1031,10 @@ void clear_local_APIC(void) apic_write(APIC_LVT0, v | APIC_LVT_MASKED); v = apic_read(APIC_LVT1); apic_write(APIC_LVT1, v | APIC_LVT_MASKED); + if (!x2apic_enabled()) { + v = apic_read(APIC_LDR) & ~APIC_LDR_MASK; + apic_write(APIC_LDR, v); + } if (maxlvt >= 4) { v = apic_read(APIC_LVTPC); apic_write(APIC_LVTPC, v | APIC_LVT_MASKED);
From: Henk van der Laan opensource@henkvdlaan.com
commit 08d676d1685c2a29e4d0e1b0242324e564d4589e upstream.
Revision 0x0117 suffers from an identical issue to earlier revisions, therefore it should be added to the quirks list.
Signed-off-by: Henk van der Laan opensource@henkvdlaan.com Cc: stable stable@vger.kernel.org Link: https://lore.kernel.org/r/20190816200847.21366-1-opensource@henkvdlaan.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/storage/unusual_devs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/usb/storage/unusual_devs.h +++ b/drivers/usb/storage/unusual_devs.h @@ -2006,7 +2006,7 @@ UNUSUAL_DEV( 0x14cd, 0x6600, 0x0201, 0x US_FL_IGNORE_RESIDUE ),
/* Reported by Michael Büsch m@bues.ch */ -UNUSUAL_DEV( 0x152d, 0x0567, 0x0114, 0x0116, +UNUSUAL_DEV( 0x152d, 0x0567, 0x0114, 0x0117, "JMicron", "USB to ATA/ATAPI Bridge", USB_SC_DEVICE, USB_PR_DEVICE, NULL,
From: Oliver Neukum oneukum@suse.com
commit 1426bd2c9f7e3126e2678e7469dca9fd9fc6dd3e upstream.
In case of a disconnect an ongoing flush() has to be made fail. Nevertheless we cannot be sure that any pending URB has already finished, so although they will never succeed, they still must not be touched. The clean solution for this is to check for WDM_IN_USE and WDM_DISCONNECTED in flush(). There is no point in ever clearing WDM_IN_USE, as no further writes make sense.
The issue is as old as the driver.
Fixes: afba937e540c9 ("USB: CDC WDM driver") Reported-by: syzbot+d232cca6ec42c2edb3fc@syzkaller.appspotmail.com Signed-off-by: Oliver Neukum oneukum@suse.com Cc: stable stable@vger.kernel.org Link: https://lore.kernel.org/r/20190827103436.21143-1-oneukum@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/class/cdc-wdm.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)
--- a/drivers/usb/class/cdc-wdm.c +++ b/drivers/usb/class/cdc-wdm.c @@ -577,10 +577,20 @@ static int wdm_flush(struct file *file, { struct wdm_device *desc = file->private_data;
- wait_event(desc->wait, !test_bit(WDM_IN_USE, &desc->flags)); + wait_event(desc->wait, + /* + * needs both flags. We cannot do with one + * because resetting it would cause a race + * with write() yet we need to signal + * a disconnect + */ + !test_bit(WDM_IN_USE, &desc->flags) || + test_bit(WDM_DISCONNECTING, &desc->flags));
/* cannot dereference desc->intf if WDM_DISCONNECTING */ - if (desc->werr < 0 && !test_bit(WDM_DISCONNECTING, &desc->flags)) + if (test_bit(WDM_DISCONNECTING, &desc->flags)) + return -ENODEV; + if (desc->werr < 0) dev_err(&desc->intf->dev, "Error in flush path: %d\n", desc->werr);
@@ -968,8 +978,6 @@ static void wdm_disconnect(struct usb_in spin_lock_irqsave(&desc->iuspin, flags); set_bit(WDM_DISCONNECTING, &desc->flags); set_bit(WDM_READ, &desc->flags); - /* to terminate pending flushes */ - clear_bit(WDM_IN_USE, &desc->flags); spin_unlock_irqrestore(&desc->iuspin, flags); wake_up_all(&desc->wait); mutex_lock(&desc->rlock);
From: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com
commit a349b95d7ca0cea71be4a7dac29830703de7eb62 upstream.
This patch fixes an issue that the following error is possible to happen when ohci hardware causes an interruption and the system is shutting down at the same time.
[ 34.851754] usb 2-1: USB disconnect, device number 2 [ 35.166658] irq 156: nobody cared (try booting with the "irqpoll" option) [ 35.173445] CPU: 0 PID: 22 Comm: kworker/0:1 Not tainted 5.3.0-rc5 #85 [ 35.179964] Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT) [ 35.187886] Workqueue: usb_hub_wq hub_event [ 35.192063] Call trace: [ 35.194509] dump_backtrace+0x0/0x150 [ 35.198165] show_stack+0x14/0x20 [ 35.201475] dump_stack+0xa0/0xc4 [ 35.204785] __report_bad_irq+0x34/0xe8 [ 35.208614] note_interrupt+0x2cc/0x318 [ 35.212446] handle_irq_event_percpu+0x5c/0x88 [ 35.216883] handle_irq_event+0x48/0x78 [ 35.220712] handle_fasteoi_irq+0xb4/0x188 [ 35.224802] generic_handle_irq+0x24/0x38 [ 35.228804] __handle_domain_irq+0x5c/0xb0 [ 35.232893] gic_handle_irq+0x58/0xa8 [ 35.236548] el1_irq+0xb8/0x180 [ 35.239681] __do_softirq+0x94/0x23c [ 35.243253] irq_exit+0xd0/0xd8 [ 35.246387] __handle_domain_irq+0x60/0xb0 [ 35.250475] gic_handle_irq+0x58/0xa8 [ 35.254130] el1_irq+0xb8/0x180 [ 35.257268] kernfs_find_ns+0x5c/0x120 [ 35.261010] kernfs_find_and_get_ns+0x3c/0x60 [ 35.265361] sysfs_unmerge_group+0x20/0x68 [ 35.269454] dpm_sysfs_remove+0x2c/0x68 [ 35.273284] device_del+0x80/0x370 [ 35.276683] hid_destroy_device+0x28/0x60 [ 35.280686] usbhid_disconnect+0x4c/0x80 [ 35.284602] usb_unbind_interface+0x6c/0x268 [ 35.288867] device_release_driver_internal+0xe4/0x1b0 [ 35.293998] device_release_driver+0x14/0x20 [ 35.298261] bus_remove_device+0x110/0x128 [ 35.302350] device_del+0x148/0x370 [ 35.305832] usb_disable_device+0x8c/0x1d0 [ 35.309921] usb_disconnect+0xc8/0x2d0 [ 35.313663] hub_event+0x6e0/0x1128 [ 35.317146] process_one_work+0x1e0/0x320 [ 35.321148] worker_thread+0x40/0x450 [ 35.324805] kthread+0x124/0x128 [ 35.328027] ret_from_fork+0x10/0x18 [ 35.331594] handlers: [ 35.333862] [<0000000079300c1d>] usb_hcd_irq [ 35.338126] [<0000000079300c1d>] usb_hcd_irq [ 35.342389] Disabling IRQ #156
ohci_shutdown() disables all the interrupt and rh_state is set to OHCI_RH_HALTED. In other hand, ohci_irq() is possible to enable OHCI_INTR_SF and OHCI_INTR_MIE on ohci_irq(). Note that OHCI_INTR_SF is possible to be set by start_ed_unlink() which is called: ohci_irq() -> process_done_list() -> takeback_td() -> start_ed_unlink()
So, ohci_irq() has the following condition, the issue happens by &ohci->regs->intrenable = OHCI_INTR_MIE | OHCI_INTR_SF and ohci->rh_state = OHCI_RH_HALTED:
/* interrupt for some other device? */ if (ints == 0 || unlikely(ohci->rh_state == OHCI_RH_HALTED)) return IRQ_NOTMINE;
To fix the issue, ohci_shutdown() holds the spin lock while disabling the interruption and changing the rh_state flag to prevent reenable the OHCI_INTR_MIE unexpectedly. Note that io_watchdog_func() also calls the ohci_shutdown() and it already held the spin lock, so that the patch makes a new function as _ohci_shutdown().
This patch is inspired by a Renesas R-Car Gen3 BSP patch from Tho Vu.
Signed-off-by: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com Cc: stable stable@vger.kernel.org Acked-by: Alan Stern stern@rowland.harvard.edu Link: https://lore.kernel.org/r/1566877910-6020-1-git-send-email-yoshihiro.shimoda... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/host/ohci-hcd.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)
--- a/drivers/usb/host/ohci-hcd.c +++ b/drivers/usb/host/ohci-hcd.c @@ -415,8 +415,7 @@ static void ohci_usb_reset (struct ohci_ * other cases where the next software may expect clean state from the * "firmware". this is bus-neutral, unlike shutdown() methods. */ -static void -ohci_shutdown (struct usb_hcd *hcd) +static void _ohci_shutdown(struct usb_hcd *hcd) { struct ohci_hcd *ohci;
@@ -432,6 +431,16 @@ ohci_shutdown (struct usb_hcd *hcd) ohci->rh_state = OHCI_RH_HALTED; }
+static void ohci_shutdown(struct usb_hcd *hcd) +{ + struct ohci_hcd *ohci = hcd_to_ohci(hcd); + unsigned long flags; + + spin_lock_irqsave(&ohci->lock, flags); + _ohci_shutdown(hcd); + spin_unlock_irqrestore(&ohci->lock, flags); +} + /*-------------------------------------------------------------------------* * HC functions *-------------------------------------------------------------------------*/ @@ -750,7 +759,7 @@ static void io_watchdog_func(unsigned lo died: usb_hc_died(ohci_to_hcd(ohci)); ohci_dump(ohci); - ohci_shutdown(ohci_to_hcd(ohci)); + _ohci_shutdown(ohci_to_hcd(ohci)); goto done; } else { /* No write back because the done queue was empty */
From: Kai-Heng Feng kai.heng.feng@canonical.com
commit f6445b6b2f2bb1745080af4a0926049e8bca2617 upstream.
The option named "auto_delink_en" is a bit misleading, as setting it to false doesn't really disable auto-delink but let auto-delink be firmware controlled.
Update the description to reflect the real usage of this parameter.
Signed-off-by: Kai-Heng Feng kai.heng.feng@canonical.com Cc: stable stable@vger.kernel.org Link: https://lore.kernel.org/r/20190827173450.13572-1-kai.heng.feng@canonical.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/storage/realtek_cr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/usb/storage/realtek_cr.c +++ b/drivers/usb/storage/realtek_cr.c @@ -50,7 +50,7 @@ MODULE_VERSION("1.03");
static int auto_delink_en = 1; module_param(auto_delink_en, int, S_IRUGO | S_IWUSR); -MODULE_PARM_DESC(auto_delink_en, "enable auto delink"); +MODULE_PARM_DESC(auto_delink_en, "auto delink mode (0=firmware, 1=software [default])");
#ifdef CONFIG_REALTEK_AUTOPM static int ss_en = 1;
From: Kai-Heng Feng kai.heng.feng@canonical.com
commit 1902a01e2bcc3abd7c9a18dc05e78c7ab4a53c54 upstream.
Auto-delink requires writing special registers to ums-realtek devices. Unconditionally enable auto-delink may break newer devices.
So only enable auto-delink by default for the original three IDs, 0x0138, 0x0158 and 0x0159.
Realtek is working on a patch to properly support auto-delink for other IDs.
BugLink: https://bugs.launchpad.net/bugs/1838886 Signed-off-by: Kai-Heng Feng kai.heng.feng@canonical.com Acked-by: Alan Stern stern@rowland.harvard.edu Cc: stable stable@vger.kernel.org Link: https://lore.kernel.org/r/20190827173450.13572-2-kai.heng.feng@canonical.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/storage/realtek_cr.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
--- a/drivers/usb/storage/realtek_cr.c +++ b/drivers/usb/storage/realtek_cr.c @@ -1006,12 +1006,15 @@ static int init_realtek_cr(struct us_dat goto INIT_FAIL; }
- if (CHECK_FW_VER(chip, 0x5888) || CHECK_FW_VER(chip, 0x5889) || - CHECK_FW_VER(chip, 0x5901)) - SET_AUTO_DELINK(chip); - if (STATUS_LEN(chip) == 16) { - if (SUPPORT_AUTO_DELINK(chip)) + if (CHECK_PID(chip, 0x0138) || CHECK_PID(chip, 0x0158) || + CHECK_PID(chip, 0x0159)) { + if (CHECK_FW_VER(chip, 0x5888) || CHECK_FW_VER(chip, 0x5889) || + CHECK_FW_VER(chip, 0x5901)) SET_AUTO_DELINK(chip); + if (STATUS_LEN(chip) == 16) { + if (SUPPORT_AUTO_DELINK(chip)) + SET_AUTO_DELINK(chip); + } } #ifdef CONFIG_REALTEK_AUTOPM if (ss_en)
[ Upstream commit e27c310af5c05cf876d9cad006928076c27f54d4 ]
In its current form, user_64bit_mode() can only be used when CONFIG_X86_64 is selected. This implies that code built with CONFIG_X86_64=n cannot use it. If a piece of code needs to be built for both CONFIG_X86_64=y and CONFIG_X86_64=n and wants to use this function, it needs to wrap it in an #ifdef/#endif; potentially, in multiple places.
This can be easily avoided with a single #ifdef/#endif pair within user_64bit_mode() itself.
Suggested-by: Borislav Petkov bp@suse.de Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Borislav Petkov bp@suse.de Cc: "Michael S. Tsirkin" mst@redhat.com Cc: Peter Zijlstra peterz@infradead.org Cc: Dave Hansen dave.hansen@linux.intel.com Cc: ricardo.neri@intel.com Cc: Adrian Hunter adrian.hunter@intel.com Cc: Paul Gortmaker paul.gortmaker@windriver.com Cc: Huang Rui ray.huang@amd.com Cc: Qiaowei Ren qiaowei.ren@intel.com Cc: Shuah Khan shuah@kernel.org Cc: Kees Cook keescook@chromium.org Cc: Jonathan Corbet corbet@lwn.net Cc: Jiri Slaby jslaby@suse.cz Cc: Dmitry Vyukov dvyukov@google.com Cc: "Ravi V. Shankar" ravi.v.shankar@intel.com Cc: Chris Metcalf cmetcalf@mellanox.com Cc: Brian Gerst brgerst@gmail.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Andy Lutomirski luto@kernel.org Cc: Colin Ian King colin.king@canonical.com Cc: Chen Yucong slaoub@gmail.com Cc: Adam Buchbinder adam.buchbinder@gmail.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Paolo Bonzini pbonzini@redhat.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Thomas Garnier thgarnie@google.com Link: https://lkml.kernel.org/r/1509135945-13762-4-git-send-email-ricardo.neri-cal... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/include/asm/ptrace.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h index 6271281f947d8..0d8e0831b1a07 100644 --- a/arch/x86/include/asm/ptrace.h +++ b/arch/x86/include/asm/ptrace.h @@ -121,9 +121,9 @@ static inline int v8086_mode(struct pt_regs *regs) #endif }
-#ifdef CONFIG_X86_64 static inline bool user_64bit_mode(struct pt_regs *regs) { +#ifdef CONFIG_X86_64 #ifndef CONFIG_PARAVIRT /* * On non-paravirt systems, this is the only long mode CPL 3 @@ -134,8 +134,12 @@ static inline bool user_64bit_mode(struct pt_regs *regs) /* Headers are too twisted for this to go in paravirt.h. */ return regs->cs == __USER_CS || regs->cs == pv_info.extra_user_64bit_cs; #endif +#else /* !CONFIG_X86_64 */ + return false; +#endif }
+#ifdef CONFIG_X86_64 #define current_user_stack_pointer() current_pt_regs()->sp #define compat_user_stack_pointer() current_pt_regs()->sp #endif
[ Upstream commit 9212ec7d8357ea630031e89d0d399c761421c83b ]
32-bit processes running on a 64-bit kernel are not always detected correctly, causing the process to crash when uretprobes are installed.
The reason for the crash is that in_ia32_syscall() is used to determine the process's mode, which only works correctly when called from a syscall.
In the case of uretprobes, however, the function is called from a exception and always returns 'false' on a 64-bit kernel. In consequence this leads to corruption of the process's return address.
Fix this by using user_64bit_mode() instead of in_ia32_syscall(), which is correct in any situation.
[ tglx: Add a comment and the following historical info ]
This should have been detected by the rename which happened in commit
abfb9498ee13 ("x86/entry: Rename is_{ia32,x32}_task() to in_{ia32,x32}_syscall()")
which states in the changelog:
The is_ia32_task()/is_x32_task() function names are a big misnomer: they suggests that the compat-ness of a system call is a task property, which is not true, the compatness of a system call purely depends on how it was invoked through the system call layer. .....
and then it went and blindly renamed every call site.
Sadly enough this was already mentioned here:
8faaed1b9f50 ("uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and arch_uretprobe_hijack_return_addr()")
where the changelog says:
TODO: is_ia32_task() is not what we actually want, TS_COMPAT does not necessarily mean 32bit. Fortunately syscall-like insns can't be probed so it actually works, but it would be better to rename and use is_ia32_frame().
and goes all the way back to:
0326f5a94dde ("uprobes/core: Handle breakpoint and singlestep exceptions")
Oh well. 7+ years until someone actually tried a uretprobe on a 32bit process on a 64bit kernel....
Fixes: 0326f5a94dde ("uprobes/core: Handle breakpoint and singlestep exceptions") Signed-off-by: Sebastian Mayr me@sam.st Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Dmitry Safonov dsafonov@virtuozzo.com Cc: Oleg Nesterov oleg@redhat.com Cc: Srikar Dronamraju srikar@linux.vnet.ibm.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190728152617.7308-1-me@sam.st Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kernel/uprobes.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c index b8105289c60b6..178d63cac321b 100644 --- a/arch/x86/kernel/uprobes.c +++ b/arch/x86/kernel/uprobes.c @@ -514,9 +514,12 @@ struct uprobe_xol_ops { void (*abort)(struct arch_uprobe *, struct pt_regs *); };
-static inline int sizeof_long(void) +static inline int sizeof_long(struct pt_regs *regs) { - return is_ia32_task() ? 4 : 8; + /* + * Check registers for mode as in_xxx_syscall() does not apply here. + */ + return user_64bit_mode(regs) ? 8 : 4; }
static int default_pre_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs) @@ -527,9 +530,9 @@ static int default_pre_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
static int push_ret_address(struct pt_regs *regs, unsigned long ip) { - unsigned long new_sp = regs->sp - sizeof_long(); + unsigned long new_sp = regs->sp - sizeof_long(regs);
- if (copy_to_user((void __user *)new_sp, &ip, sizeof_long())) + if (copy_to_user((void __user *)new_sp, &ip, sizeof_long(regs))) return -EFAULT;
regs->sp = new_sp; @@ -562,7 +565,7 @@ static int default_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs long correction = utask->vaddr - utask->xol_vaddr; regs->ip += correction; } else if (auprobe->defparam.fixups & UPROBE_FIX_CALL) { - regs->sp += sizeof_long(); /* Pop incorrect return address */ + regs->sp += sizeof_long(regs); /* Pop incorrect return address */ if (push_ret_address(regs, utask->vaddr + auprobe->defparam.ilen)) return -ERESTART; } @@ -671,7 +674,7 @@ static int branch_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs) * "call" insn was executed out-of-line. Just restore ->sp and restart. * We could also restore ->ip and try to call branch_emulate_op() again. */ - regs->sp += sizeof_long(); + regs->sp += sizeof_long(regs); return -ERESTART; }
@@ -962,7 +965,7 @@ bool arch_uprobe_skip_sstep(struct arch_uprobe *auprobe, struct pt_regs *regs) unsigned long arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs *regs) { - int rasize = sizeof_long(), nleft; + int rasize = sizeof_long(regs), nleft; unsigned long orig_ret_vaddr = 0; /* clear high bits for 32-bit apps */
if (copy_from_user(&orig_ret_vaddr, (void __user *)regs->sp, rasize))
From: Eugen Hristev eugen.hristev@microchip.com
commit 7871aa60ae0086fe4626abdf5ed13eeddf306c61 upstream.
HS200 is not implemented in the driver, but the controller claims it through caps. Remove it via a quirk, to make sure the mmc core do not try to enable HS200, as it causes the eMMC initialization to fail.
Signed-off-by: Eugen Hristev eugen.hristev@microchip.com Acked-by: Ludovic Desroches ludovic.desroches@microchip.com Acked-by: Adrian Hunter adrian.hunter@intel.com Fixes: bb5f8ea4d514 ("mmc: sdhci-of-at91: introduce driver for the Atmel SDMMC") Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mmc/host/sdhci-of-at91.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/mmc/host/sdhci-of-at91.c +++ b/drivers/mmc/host/sdhci-of-at91.c @@ -144,6 +144,9 @@ static int sdhci_at91_probe(struct platf
sdhci_get_of_property(pdev);
+ /* HS200 is broken at this moment */ + host->quirks2 = SDHCI_QUIRK2_BROKEN_HS200; + ret = sdhci_add_host(host); if (ret) goto clocks_disable_unprepare;
From: Ulf Hansson ulf.hansson@linaro.org
commit 72741084d903e65e121c27bd29494d941729d4a1 upstream.
The OCR register defines the supported range of VDD voltages for SD cards. However, it has turned out that some SD cards reports an invalid voltage range, for example having bit7 set.
When a host supports MMC_CAP2_FULL_PWR_CYCLE and some of the voltages from the invalid VDD range, this triggers the core to run a power cycle of the card to try to initialize it at the lowest common supported voltage. Obviously this fails, since the card can't support it.
Let's fix this problem, by clearing invalid bits from the read OCR register for SD cards, before proceeding with the VDD voltage negotiation.
Cc: stable@vger.kernel.org Reported-by: Philip Langdale philipl@overt.org Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Reviewed-by: Philip Langdale philipl@overt.org Tested-by: Philip Langdale philipl@overt.org Tested-by: Manuel Presnitz mail@mpy.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mmc/core/sd.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/mmc/core/sd.c +++ b/drivers/mmc/core/sd.c @@ -1232,6 +1232,12 @@ int mmc_attach_sd(struct mmc_host *host) goto err; }
+ /* + * Some SD cards claims an out of spec VDD voltage range. Let's treat + * these bits as being in-valid and especially also bit7. + */ + ocr &= ~0x7FFF; + rocr = mmc_select_voltage(host, ocr);
/*
From: Ding Xiang dingxiang@cmss.chinamobile.com
commit 961b6ffe0e2c403b09a8efe4a2e986b3c415391a upstream.
In the error path of stm_source_register_device(), the kfree is unnecessary, as the put_device() before it ends up calling stm_source_device_release() to free stm_source_device, leading to a double free at the outer kfree() call. Remove it.
Signed-off-by: Ding Xiang dingxiang@cmss.chinamobile.com Signed-off-by: Alexander Shishkin alexander.shishkin@linux.intel.com Fixes: 7bd1d4093c2fa ("stm class: Introduce an abstraction for System Trace Module devices") Link: https://lore.kernel.org/linux-arm-kernel/1563354988-23826-1-git-send-email-d... Cc: stable@vger.kernel.org # v4.4+ Link: https://lore.kernel.org/r/20190821074955.3925-2-alexander.shishkin@linux.int... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hwtracing/stm/core.c | 1 - 1 file changed, 1 deletion(-)
--- a/drivers/hwtracing/stm/core.c +++ b/drivers/hwtracing/stm/core.c @@ -1020,7 +1020,6 @@ int stm_source_register_device(struct de
err: put_device(&src->dev); - kfree(src);
return err; }
From: Nadav Amit namit@vmware.com
commit ba03a9bbd17b149c373c0ea44017f35fc2cd0f28 upstream.
Francois reported that VMware balloon gets stuck after a balloon reset, when the VMCI doorbell is removed. A similar error can occur when the balloon driver is removed with the following splat:
[ 1088.622000] INFO: task modprobe:3565 blocked for more than 120 seconds. [ 1088.622035] Tainted: G W 5.2.0 #4 [ 1088.622087] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1088.622205] modprobe D 0 3565 1450 0x00000000 [ 1088.622210] Call Trace: [ 1088.622246] __schedule+0x2a8/0x690 [ 1088.622248] schedule+0x2d/0x90 [ 1088.622250] schedule_timeout+0x1d3/0x2f0 [ 1088.622252] wait_for_completion+0xba/0x140 [ 1088.622320] ? wake_up_q+0x80/0x80 [ 1088.622370] vmci_resource_remove+0xb9/0xc0 [vmw_vmci] [ 1088.622373] vmci_doorbell_destroy+0x9e/0xd0 [vmw_vmci] [ 1088.622379] vmballoon_vmci_cleanup+0x6e/0xf0 [vmw_balloon] [ 1088.622381] vmballoon_exit+0x18/0xcc8 [vmw_balloon] [ 1088.622394] __x64_sys_delete_module+0x146/0x280 [ 1088.622408] do_syscall_64+0x5a/0x130 [ 1088.622410] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1088.622415] RIP: 0033:0x7f54f62791b7 [ 1088.622421] Code: Bad RIP value. [ 1088.622421] RSP: 002b:00007fff2a949008 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [ 1088.622426] RAX: ffffffffffffffda RBX: 000055dff8b55d00 RCX: 00007f54f62791b7 [ 1088.622426] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055dff8b55d68 [ 1088.622427] RBP: 000055dff8b55d00 R08: 00007fff2a947fb1 R09: 0000000000000000 [ 1088.622427] R10: 00007f54f62f5cc0 R11: 0000000000000206 R12: 000055dff8b55d68 [ 1088.622428] R13: 0000000000000001 R14: 000055dff8b55d68 R15: 00007fff2a94a3f0
The cause for the bug is that when the "delayed" doorbell is invoked, it takes a reference on the doorbell entry and schedules work that is supposed to run the appropriate code and drop the doorbell entry reference. The code ignores the fact that if the work is already queued, it will not be scheduled to run one more time. As a result one of the references would not be dropped. When the code waits for the reference to get to zero, during balloon reset or module removal, it gets stuck.
Fix it. Drop the reference if schedule_work() indicates that the work is already queued.
Note that this bug got more apparent (or apparent at all) due to commit ce664331b248 ("vmw_balloon: VMCI_DOORBELL_SET does not check status").
Fixes: 83e2ec765be03 ("VMCI: doorbell implementation.") Reported-by: Francois Rigault rigault.francois@gmail.com Cc: Jorgen Hansen jhansen@vmware.com Cc: Adit Ranadive aditr@vmware.com Cc: Alexios Zavras alexios.zavras@intel.com Cc: Vishnu DASA vdasa@vmware.com Cc: stable@vger.kernel.org Signed-off-by: Nadav Amit namit@vmware.com Reviewed-by: Vishnu Dasa vdasa@vmware.com Link: https://lore.kernel.org/r/20190820202638.49003-1-namit@vmware.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/misc/vmw_vmci/vmci_doorbell.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/drivers/misc/vmw_vmci/vmci_doorbell.c +++ b/drivers/misc/vmw_vmci/vmci_doorbell.c @@ -318,7 +318,8 @@ int vmci_dbell_host_context_notify(u32 s
entry = container_of(resource, struct dbell_entry, resource); if (entry->run_delayed) { - schedule_work(&entry->work); + if (!schedule_work(&entry->work)) + vmci_resource_put(resource); } else { entry->notify_cb(entry->client_data); vmci_resource_put(resource); @@ -366,7 +367,8 @@ static void dbell_fire_entries(u32 notif atomic_read(&dbell->active) == 1) { if (dbell->run_delayed) { vmci_resource_get(&dbell->resource); - schedule_work(&dbell->work); + if (!schedule_work(&dbell->work)) + vmci_resource_put(&dbell->resource); } else { dbell->notify_cb(dbell->client_data); }
From: Hodaszi, Robert Robert.Hodaszi@digi.com
commit 0d31d4dbf38412f5b8b11b4511d07b840eebe8cb upstream.
This reverts commit 96cce12ff6e0 ("cfg80211: fix processing world regdomain when non modular").
Re-triggering a reg_process_hint with the last request on all events, can make the regulatory domain fail in case of multiple WiFi modules. On slower boards (espacially with mdev), enumeration of the WiFi modules can end up in an intersected regulatory domain, and user cannot set it with 'iw reg set' anymore.
This is happening, because: - 1st module enumerates, queues up a regulatory request - request gets processed by __reg_process_hint_driver(): - checks if previous was set by CORE -> yes - checks if regulator domain changed -> yes, from '00' to e.g. 'US' -> sends request to the 'crda' - 2nd module enumerates, queues up a regulator request (which triggers the reg_todo() work) - reg_todo() -> reg_process_pending_hints() sees, that the last request is not processed yet, so it tries to process it again. __reg_process_hint driver() will run again, and: - checks if the last request's initiator was the core -> no, it was the driver (1st WiFi module) - checks, if the previous initiator was the driver -> yes - checks if the regulator domain changed -> yes, it was '00' (set by core, and crda call did not return yet), and should be changed to 'US'
------> __reg_process_hint_driver calls an intersect
Besides, the reg_process_hint call with the last request is meaningless since the crda call has a timeout work. If that timeout expires, the first module's request will lost.
Cc: stable@vger.kernel.org Fixes: 96cce12ff6e0 ("cfg80211: fix processing world regdomain when non modular") Signed-off-by: Robert Hodaszi robert.hodaszi@digi.com Link: https://lore.kernel.org/r/20190614131600.GA13897@a1-hr Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/wireless/reg.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/wireless/reg.c +++ b/net/wireless/reg.c @@ -2234,7 +2234,7 @@ static void reg_process_pending_hints(vo
/* When last_request->processed becomes true this will be rescheduled */ if (lr && !lr->processed) { - reg_process_hint(lr); + pr_debug("Pending regulatory request, waiting for it to be processed...\n"); return; }
From: Johannes Berg johannes.berg@intel.com
commit 5fd2f91ad483baffdbe798f8a08f1b41442d1e24 upstream.
If TDLS station addition is rejected, the sta memory is leaked. Avoid this by moving the check before the allocation.
Cc: stable@vger.kernel.org Fixes: 7ed5285396c2 ("mac80211: don't initiate TDLS connection if station is not associated to AP") Link: https://lore.kernel.org/r/20190801073033.7892-1-johannes@sipsolutions.net Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/mac80211/cfg.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
--- a/net/mac80211/cfg.c +++ b/net/mac80211/cfg.c @@ -1211,6 +1211,11 @@ static int ieee80211_add_station(struct if (is_multicast_ether_addr(mac)) return -EINVAL;
+ if (params->sta_flags_set & BIT(NL80211_STA_FLAG_TDLS_PEER) && + sdata->vif.type == NL80211_IFTYPE_STATION && + !sdata->u.mgd.associated) + return -EINVAL; + sta = sta_info_alloc(sdata, mac, GFP_KERNEL); if (!sta) return -ENOMEM; @@ -1228,10 +1233,6 @@ static int ieee80211_add_station(struct if (params->sta_flags_set & BIT(NL80211_STA_FLAG_TDLS_PEER)) sta->sta.tdls = true;
- if (sta->sta.tdls && sdata->vif.type == NL80211_IFTYPE_STATION && - !sdata->u.mgd.associated) - return -EINVAL; - err = sta_apply_parameters(local, sta, params); if (err) { sta_info_free(local, sta);
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
I incorrectly merged commit 31a2fbb390fe ("x86/ptrace: Fix possible spectre-v1 in ptrace_get_debugreg()") when backporting it, as was graciously pointed out at https://grsecurity.net/teardown_of_a_failed_linux_lts_spectre_fix.php
Resolve the upstream difference with the stable kernel merge to properly protect things.
Reported-by: Brad Spengler spender@grsecurity.net Cc: Dianzhang Chen dianzhangchen0@gmail.com Cc: Thomas Gleixner tglx@linutronix.de Cc: bp@alien8.de Cc: hpa@zytor.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/ptrace.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/arch/x86/kernel/ptrace.c +++ b/arch/x86/kernel/ptrace.c @@ -698,11 +698,10 @@ static unsigned long ptrace_get_debugreg { struct thread_struct *thread = &tsk->thread; unsigned long val = 0; - int index = n;
if (n < HBP_NUM) { + int index = array_index_nospec(n, HBP_NUM); struct perf_event *bp = thread->ptrace_bps[index]; - index = array_index_nospec(index, HBP_NUM);
if (bp) val = bp->hw.info.address;
stable-rc/linux-4.4.y boot: 118 boots: 6 failed, 102 passed with 9 offline, 1 conflict (v4.4.190-78-gfab7823b08aa)
Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.4.y/kernel/v4.4.1... Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.4.y/kernel/v4.4.190-78-g...
Tree: stable-rc Branch: linux-4.4.y Git Describe: v4.4.190-78-gfab7823b08aa Git Commit: fab7823b08aae873a7ab1918c9a0a5125dc89754 Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git Tested: 49 unique boards, 18 SoC families, 14 builds out of 190
Boot Failures Detected:
arm64: defconfig: gcc-8: qcom-qdf2400: 1 failed lab
arm: vexpress_defconfig: gcc-8: qemu_arm-virt-gicv3: 5 failed labs
Offline Platforms:
arm64:
defconfig: gcc-8 apq8016-sbc: 1 offline lab
arm:
multi_v7_defconfig: gcc-8 qcom-apq8064-cm-qs600: 1 offline lab qcom-apq8064-ifc6410: 1 offline lab sun5i-r8-chip: 1 offline lab
davinci_all_defconfig: gcc-8 dm365evm,legacy: 1 offline lab
qcom_defconfig: gcc-8 qcom-apq8064-cm-qs600: 1 offline lab qcom-apq8064-ifc6410: 1 offline lab
sunxi_defconfig: gcc-8 sun5i-r8-chip: 1 offline lab sun7i-a20-bananapi: 1 offline lab
Conflicting Boot Failure Detected: (These likely are not failures as other labs are reporting PASS. Needs review.)
x86_64: x86_64_defconfig: qemu_x86_64: lab-baylibre: FAIL (gcc-8) lab-mhart: PASS (gcc-8) lab-linaro-lkft: PASS (gcc-8) lab-drue: PASS (gcc-8) lab-collabora: PASS (gcc-8)
--- For more info write to info@kernelci.org
On 9/4/19 11:52 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.191 release. There are 77 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri 06 Sep 2019 05:50:23 PM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.191-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks, -- Shuah
On Wed, Sep 04, 2019 at 07:52:47PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.191 release. There are 77 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri 06 Sep 2019 05:50:23 PM UTC. Anything received after that time might be too late.
Build results: total: 170 pass: 170 fail: 0 Qemu test results: total: 324 pass: 324 fail: 0
Guenter
Hello!
On 9/4/19 12:52 PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.191 release. There are 77 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri 06 Sep 2019 05:50:23 PM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.191-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Summary ------------------------------------------------------------------------
kernel: 4.4.191-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.4.y git commit: fab7823b08aae873a7ab1918c9a0a5125dc89754 git describe: v4.4.190-78-gfab7823b08aa Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.190-78-...
No regressions (compared to build v4.4.190)
No fixes (compared to build v4.4.190)
Ran 20009 total tests in the following environments and test suites.
Environments -------------- - i386 - juno-r2 - arm64 - qemu_arm - qemu_arm64 - qemu_i386 - qemu_x86_64 - x15 - arm - x86_64
Test Suites ----------- * build * kselftest * libhugetlbfs * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-cpuhotplug-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-open-posix-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * network-basic-tests * perf * spectre-meltdown-checker-test * v4l2-compliance * kvm-unit-tests * install-android-platform-tools-r2600 * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none * prep-tmp-disk * ssuite
Summary ------------------------------------------------------------------------
kernel: 4.4.191-rc1 git repo: https://git.linaro.org/lkft/arm64-stable-rc.git git branch: 4.4.191-rc1-hikey-20190904-546 git commit: 0f182005f060ccc4fa2d14084c4a207aaa2207e4 git describe: 4.4.191-rc1-hikey-20190904-546 Test details: https://qa-reports.linaro.org/lkft/linaro-hikey-stable-rc-4.4-oe/build/4.4.1...
No regressions (compared to build 4.4.191-rc1-hikey-20190827-544)
No fixes (compared to build 4.4.191-rc1-hikey-20190827-544)
Ran 1533 total tests in the following environments and test suites.
Environments -------------- - hi6220-hikey - arm64
Test Suites ----------- * build * install-android-platform-tools-r2600 * kselftest * libhugetlbfs * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-cpuhotplug-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * perf * spectre-meltdown-checker-test * v4l2-compliance
Greetings!
Daniel Díaz daniel.diaz@linaro.org
On Wed, Sep 04, 2019 at 07:52:47PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.191 release. There are 77 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri 06 Sep 2019 05:50:23 PM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.191-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y and the diffstat can be found below.
thanks,
greg k-h
Compiled, booted, and no regressions on my system.
-Kelsey
On 04/09/2019 18:52, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.191 release. There are 77 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri 06 Sep 2019 05:50:23 PM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.191-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y and the diffstat can be found below.
thanks,
greg k-h
All tests are passing for Tegra ...
Test results for stable-v4.4: 6 builds: 6 pass, 0 fail 12 boots: 12 pass, 0 fail 19 tests: 19 pass, 0 fail
Linux version: 4.4.191-rc1-gfab7823b08aa Boards tested: tegra124-jetson-tk1, tegra20-ventana, tegra30-cardhu-a04
Cheers Jon
linux-stable-mirror@lists.linaro.org