This is the start of the stable review cycle for the 4.14.141 release. There are 62 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu 29 Aug 2019 07:25:02 AM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.141-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.14.141-rc1
Alastair D'Silva alastair@d-silva.org powerpc: Allow flush_(inval_)dcache_range to work across ranges >4GB
Dan Carpenter dan.carpenter@oracle.com dm zoned: fix potential NULL dereference in dmz_do_reclaim()
Darrick J. Wong darrick.wong@oracle.com xfs: fix missing ILOCK unlock when xfs_setattr_nonsize fails due to EDQUOT
Henry Burns henryburns@google.com mm/zsmalloc.c: fix race condition in zs_destroy_pool
Henry Burns henryburns@google.com mm/zsmalloc.c: migration can leave pages in ZS_EMPTY indefinitely
Vlastimil Babka vbabka@suse.cz mm, page_owner: handle THP splits correctly
Michael Kelley mikelley@microsoft.com genirq: Properly pair kobject_del() with kobject_add()
Dmitry Fomichev dmitry.fomichev@wdc.com dm zoned: properly handle backing device failure
Dmitry Fomichev dmitry.fomichev@wdc.com dm zoned: improve error handling in i/o map code
Dmitry Fomichev dmitry.fomichev@wdc.com dm zoned: improve error handling in reclaim
Mikulas Patocka mpatocka@redhat.com dm table: fix invalid memory accesses with too high sector number
ZhangXiaoxu zhangxiaoxu5@huawei.com dm space map metadata: fix missing store of apply_bops() return value
ZhangXiaoxu zhangxiaoxu5@huawei.com dm btree: fix order of block initialization in btree_split_beneath
Dmitry Fomichev dmitry.fomichev@wdc.com dm kcopyd: always complete failed jobs
John Hubbard jhubbard@nvidia.com x86/boot: Fix boot regression caused by bootparam sanitizing
John Hubbard jhubbard@nvidia.com x86/boot: Save fields explicitly, zero out everything else
Tom Lendacky thomas.lendacky@amd.com x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h
Thomas Gleixner tglx@linutronix.de x86/apic: Handle missing global clockevent gracefully
Sean Christopherson sean.j.christopherson@intel.com x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386
Oleg Nesterov oleg@redhat.com userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx
Bartosz Golaszewski bgolaszewski@baylibre.com gpiolib: never report open-drain/source lines as 'input' to user-space
Lyude Paul lyude@redhat.com drm/nouveau: Don't retry infinitely when receiving no data on i2c over AUX
Ilya Dryomov idryomov@gmail.com libceph: fix PG split vs OSD (re)connect race
Jeff Layton jlayton@kernel.org ceph: don't try fill file_lock on unsuccessful GETFILELOCK reply
Mikulas Patocka mpatocka@redhat.com Revert "dm bufio: fix deadlock with loop device"
Jason Gerecke jason.gerecke@wacom.com HID: wacom: Correct distance scale for 2nd-gen Intuos devices
Aaron Armstrong Skomra skomra@gmail.com HID: wacom: correct misreported EKR ring values
Naresh Kamboju <naresh.kamboju () linaro ! org> selftests: kvm: Adding config fragments
Jin Yao yao.jin@linux.intel.com perf pmu-events: Fix missing "cpu_clk_unhalted.core" event
He Zhe zhe.he@windriver.com perf cpumap: Fix writing to illegal memory in handling cpumap mask
He Zhe zhe.he@windriver.com perf ftrace: Fix failure to set cpumask when only one cpu is present
Colin Ian King colin.king@canonical.com drm/vmwgfx: fix memory leak when too many retries have occurred
Valdis Kletnieks valdis.kletnieks@vt.edu x86/lib/cpu: Address missing prototypes warning
Jens Axboe axboe@kernel.dk libata: add SG safety checks in SFF pio transfers
Jens Axboe axboe@kernel.dk libata: have ata_scsi_rw_xlat() fail invalid passthrough requests
Jiangfeng Xiao xiaojiangfeng@huawei.com net: hisilicon: Fix dma_map_single failed on arm64
Jiangfeng Xiao xiaojiangfeng@huawei.com net: hisilicon: fix hip04-xmit never return TX_BUSY
Jiangfeng Xiao xiaojiangfeng@huawei.com net: hisilicon: make hip04_tx_reclaim non-reentrant
Christophe JAILLET christophe.jaillet@wanadoo.fr net: cxgb3_main: Fix a resource leak in a error path in 'init_one()'
Sebastien Tisserant stisserant@wallix.com SMB3: Kernel oops mounting a encryptData share with CONFIG_DEBUG_VIRTUAL
Nicolas Saenz Julienne nsaenzjulienne@suse.de HID: input: fix a4tech horizontal wheel custom usage
Trond Myklebust trond.myklebust@hammerspace.com NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()
Wang Xiayang xywang.sjtu@sjtu.edu.cn net/ethernet/qlogic/qed: force the string buffer NULL-terminated
Wang Xiayang xywang.sjtu@sjtu.edu.cn can: peak_usb: force the string buffer NULL-terminated
Wang Xiayang xywang.sjtu@sjtu.edu.cn can: sja1000: force the string buffer NULL-terminated
Jiri Olsa jolsa@kernel.org perf bench numa: Fix cpu0 binding
Juliana Rodrigueiro juliana.rodrigueiro@intra2net.com isdn: hfcsusb: Fix mISDN driver crash caused by transfer buffer on the stack
Jozsef Kadlecsik kadlec@netfilter.org netfilter: ipset: Fix rename concurrency with listing
Jia-Ju Bai baijiaju1990@gmail.com isdn: mISDN: hfcsusb: Fix possible null-pointer dereferences in start_isoc_chain()
Michal Kalderon michal.kalderon@marvell.com qed: RDMA - Fix the hw_ver returned in device attributes
Bob Ham bob.ham@puri.sm net: usb: qmi_wwan: Add the BroadMobi BM818 card
Peter Ujfalusi peter.ujfalusi@ti.com ASoC: ti: davinci-mcasp: Correct slot_width posed constraint
Navid Emamdoost navid.emamdoost@gmail.com st_nci_hci_connectivity_event_received: null check the allocation
Navid Emamdoost navid.emamdoost@gmail.com st21nfca_connectivity_event_received: null check the allocation
Ricard Wanderlof ricard.wanderlof@axis.com ASoC: Fail card instantiation if DAI format setup fails
Rasmus Villemoes rasmus.villemoes@prevas.dk can: dev: call netif_carrier_off() in register_candev()
Thomas Falcon tlfalcon@linux.ibm.com bonding: Force slave speed check after link state recovery for 802.3ad
Charles Keepax ckeepax@opensource.cirrus.com ASoC: dapm: Fix handling of custom_stop_condition on DAPM graph walks
Wenwen Wang wenwen@cs.uga.edu netfilter: ebtables: fix a memory leak bug in compat
Vladimir Kondratiev vladimir.kondratiev@linux.intel.com mips: fix cacheinfo
Thomas Bogendoerfer tbogendoerfer@suse.de MIPS: kernel: only use i8253 clocksource with periodic clockevent
Ilya Trukhanov lahvuun@gmail.com HID: Add 044f:b320 ThrustMaster, Inc. 2 in 1 DT
-------------
Diffstat:
Documentation/admin-guide/kernel-parameters.txt | 7 ++ Makefile | 4 +- arch/mips/kernel/cacheinfo.c | 2 + arch/mips/kernel/i8253.c | 3 +- arch/powerpc/kernel/misc_64.S | 4 +- arch/x86/include/asm/bootparam_utils.h | 61 +++++++++++---- arch/x86/include/asm/msr-index.h | 1 + arch/x86/include/asm/nospec-branch.h | 2 +- arch/x86/kernel/apic/apic.c | 68 +++++++++++++---- arch/x86/kernel/cpu/amd.c | 66 +++++++++++++++++ arch/x86/lib/cpu.c | 1 + arch/x86/power/cpu.c | 86 ++++++++++++++++++---- drivers/ata/libata-scsi.c | 21 ++++++ drivers/ata/libata-sff.c | 6 ++ drivers/gpio/gpiolib.c | 6 +- drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c | 24 ++++-- drivers/gpu/drm/vmwgfx/vmwgfx_msg.c | 4 +- drivers/hid/hid-a4tech.c | 30 +++++++- drivers/hid/hid-tmff.c | 12 +++ drivers/hid/wacom_wac.c | 4 +- drivers/isdn/hardware/mISDN/hfcsusb.c | 13 +++- drivers/md/dm-bufio.c | 4 +- drivers/md/dm-kcopyd.c | 5 +- drivers/md/dm-table.c | 5 +- drivers/md/dm-zoned-metadata.c | 59 +++++++++++---- drivers/md/dm-zoned-reclaim.c | 44 ++++++++--- drivers/md/dm-zoned-target.c | 67 +++++++++++++++-- drivers/md/dm-zoned.h | 10 +++ drivers/md/persistent-data/dm-btree.c | 31 ++++---- drivers/md/persistent-data/dm-space-map-metadata.c | 2 +- drivers/net/bonding/bond_main.c | 9 +++ drivers/net/can/dev.c | 2 + drivers/net/can/sja1000/peak_pcmcia.c | 2 +- drivers/net/can/usb/peak_usb/pcan_usb_core.c | 2 +- drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c | 5 +- drivers/net/ethernet/hisilicon/hip04_eth.c | 28 ++++--- drivers/net/ethernet/qlogic/qed/qed_int.c | 2 +- drivers/net/ethernet/qlogic/qed/qed_rdma.c | 2 +- drivers/net/usb/qmi_wwan.c | 1 + drivers/nfc/st-nci/se.c | 2 + drivers/nfc/st21nfca/se.c | 2 + fs/ceph/locks.c | 3 +- fs/cifs/smb2ops.c | 10 ++- fs/nfs/nfs4_fs.h | 3 +- fs/nfs/nfs4client.c | 5 +- fs/nfs/nfs4state.c | 27 +++++-- fs/userfaultfd.c | 25 ++++--- fs/xfs/xfs_iops.c | 1 + kernel/irq/irqdesc.c | 15 +++- mm/huge_memory.c | 4 + mm/zsmalloc.c | 78 ++++++++++++++++++-- net/bridge/netfilter/ebtables.c | 4 +- net/ceph/osd_client.c | 9 +-- net/netfilter/ipset/ip_set_core.c | 2 +- sound/soc/davinci/davinci-mcasp.c | 43 ++++++++--- sound/soc/soc-core.c | 7 +- sound/soc/soc-dapm.c | 8 +- tools/perf/bench/numa.c | 6 +- tools/perf/builtin-ftrace.c | 2 +- tools/perf/pmu-events/jevents.c | 1 + tools/perf/util/cpumap.c | 5 +- tools/testing/selftests/kvm/config | 3 + 62 files changed, 786 insertions(+), 184 deletions(-)
[ Upstream commit 65f11c72780fa9d598df88def045ccb6a885cf80 ]
Enable force feedback for the Thrustmaster Dual Trigger 2 in 1 Rumble Force gamepad. Compared to other Thrustmaster devices, left and right rumble motors here are swapped.
Signed-off-by: Ilya Trukhanov lahvuun@gmail.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-tmff.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
diff --git a/drivers/hid/hid-tmff.c b/drivers/hid/hid-tmff.c index b83376077d722..cfa0cb22c9b3c 100644 --- a/drivers/hid/hid-tmff.c +++ b/drivers/hid/hid-tmff.c @@ -34,6 +34,8 @@
#include "hid-ids.h"
+#define THRUSTMASTER_DEVICE_ID_2_IN_1_DT 0xb320 + static const signed short ff_rumble[] = { FF_RUMBLE, -1 @@ -88,6 +90,7 @@ static int tmff_play(struct input_dev *dev, void *data, struct hid_field *ff_field = tmff->ff_field; int x, y; int left, right; /* Rumbling */ + int motor_swap;
switch (effect->type) { case FF_CONSTANT: @@ -112,6 +115,13 @@ static int tmff_play(struct input_dev *dev, void *data, ff_field->logical_minimum, ff_field->logical_maximum);
+ /* 2-in-1 strong motor is left */ + if (hid->product == THRUSTMASTER_DEVICE_ID_2_IN_1_DT) { + motor_swap = left; + left = right; + right = motor_swap; + } + dbg_hid("(left,right)=(%08x, %08x)\n", left, right); ff_field->value[0] = left; ff_field->value[1] = right; @@ -238,6 +248,8 @@ static const struct hid_device_id tm_devices[] = { .driver_data = (unsigned long)ff_rumble }, { HID_USB_DEVICE(USB_VENDOR_ID_THRUSTMASTER, 0xb304), /* FireStorm Dual Power 2 (and 3) */ .driver_data = (unsigned long)ff_rumble }, + { HID_USB_DEVICE(USB_VENDOR_ID_THRUSTMASTER, THRUSTMASTER_DEVICE_ID_2_IN_1_DT), /* Dual Trigger 2-in-1 */ + .driver_data = (unsigned long)ff_rumble }, { HID_USB_DEVICE(USB_VENDOR_ID_THRUSTMASTER, 0xb323), /* Dual Trigger 3-in-1 (PC Mode) */ .driver_data = (unsigned long)ff_rumble }, { HID_USB_DEVICE(USB_VENDOR_ID_THRUSTMASTER, 0xb324), /* Dual Trigger 3-in-1 (PS3 Mode) */
[ Upstream commit a07e3324538a989b7cdbf2c679be6a7f9df2544f ]
i8253 clocksource needs a free running timer. This could only be used, if i8253 clockevent is set up as periodic.
Signed-off-by: Thomas Bogendoerfer tbogendoerfer@suse.de Signed-off-by: Paul Burton paul.burton@mips.com Cc: Ralf Baechle ralf@linux-mips.org Cc: James Hogan jhogan@kernel.org Cc: linux-mips@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/mips/kernel/i8253.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/mips/kernel/i8253.c b/arch/mips/kernel/i8253.c index 5f209f111e59e..df7ddd246eaac 100644 --- a/arch/mips/kernel/i8253.c +++ b/arch/mips/kernel/i8253.c @@ -32,7 +32,8 @@ void __init setup_pit_timer(void)
static int __init init_pit_clocksource(void) { - if (num_possible_cpus() > 1) /* PIT does not scale! */ + if (num_possible_cpus() > 1 || /* PIT does not scale! */ + !clockevent_state_periodic(&i8253_clockevent)) return 0;
return clocksource_i8253_init();
[ Upstream commit b8bea8a5e5d942e62203416ab41edecaed4fda02 ]
Because CONFIG_OF defined for MIPS, cacheinfo attempts to fill information from DT, ignoring data filled by architecture routine. This leads to error reported
cacheinfo: Unable to detect cache hierarchy for CPU 0
Way to fix this provided in commit fac51482577d ("drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled")
Utilize same mechanism to report that cacheinfo set by architecture specific function
Signed-off-by: Vladimir Kondratiev vladimir.kondratiev@linux.intel.com Signed-off-by: Paul Burton paul.burton@mips.com Cc: Ralf Baechle ralf@linux-mips.org Cc: James Hogan jhogan@kernel.org Cc: linux-mips@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/mips/kernel/cacheinfo.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/arch/mips/kernel/cacheinfo.c b/arch/mips/kernel/cacheinfo.c index 97d5239ca47ba..428ef21892039 100644 --- a/arch/mips/kernel/cacheinfo.c +++ b/arch/mips/kernel/cacheinfo.c @@ -80,6 +80,8 @@ static int __populate_cache_leaves(unsigned int cpu) if (c->tcache.waysize) populate_cache(tcache, this_leaf, 3, CACHE_TYPE_UNIFIED);
+ this_cpu_ci->cpu_map_populated = true; + return 0; }
[ Upstream commit 15a78ba1844a8e052c1226f930133de4cef4e7ad ]
In compat_do_replace(), a temporary buffer is allocated through vmalloc() to hold entries copied from the user space. The buffer address is firstly saved to 'newinfo->entries', and later on assigned to 'entries_tmp'. Then the entries in this temporary buffer is copied to the internal kernel structure through compat_copy_entries(). If this copy process fails, compat_do_replace() should be terminated. However, the allocated temporary buffer is not freed on this path, leading to a memory leak.
To fix the bug, free the buffer before returning from compat_do_replace().
Signed-off-by: Wenwen Wang wenwen@cs.uga.edu Reviewed-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/bridge/netfilter/ebtables.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c index f9c6e8ca1fcb0..100b4f88179a2 100644 --- a/net/bridge/netfilter/ebtables.c +++ b/net/bridge/netfilter/ebtables.c @@ -2273,8 +2273,10 @@ static int compat_do_replace(struct net *net, void __user *user, state.buf_kern_len = size64;
ret = compat_copy_entries(entries_tmp, tmp.entries_size, &state); - if (WARN_ON(ret < 0)) + if (WARN_ON(ret < 0)) { + vfree(entries_tmp); goto out_unlock; + }
vfree(entries_tmp); tmp.entries_size = size64;
[ Upstream commit 8dd26dff00c0636b1d8621acaeef3f6f3a39dd77 ]
DPCM uses snd_soc_dapm_dai_get_connected_widgets to build a list of the widgets connected to a specific front end DAI so it can search through this list for available back end DAIs. The custom_stop_condition was added to is_connected_ep to facilitate this list not containing more widgets than is necessary. Doing so both speeds up the DPCM handling as less widgets need to be searched and avoids issues with CODEC to CODEC links as these would be confused with back end DAIs if they appeared in the list of available widgets.
custom_stop_condition was implemented by aborting the graph walk when the condition is triggered, however there is an issue with this approach. Whilst walking the graph is_connected_ep should update the endpoints cache on each widget, if the walk is aborted the number of attached end points is unknown for that sub-graph. When the stop condition triggered, the original patch ignored the triggering widget and returned zero connected end points; a later patch updated this to set the triggering widget's cache to 1 and return that. Both of these approaches result in inaccurate values being stored in various end point caches as the values propagate back through the graph, which can result in later issues with widgets powering/not powering unexpectedly.
As the original goal was to reduce the size of the widget list passed to the DPCM code, the simplest solution is to limit the functionality of the custom_stop_condition to the widget list. This means the rest of the graph will still be processed resulting in correct end point caches, but only widgets up to the stop condition will be added to the returned widget list.
Fixes: 6742064aef7f ("ASoC: dapm: support user-defined stop condition in dai_get_connected_widgets") Fixes: 5fdd022c2026 ("ASoC: dpcm: play nice with CODEC<->CODEC links") Fixes: 09464974eaa8 ("ASoC: dapm: Fix to return correct path list in is_connected_ep.") Signed-off-by: Charles Keepax ckeepax@opensource.cirrus.com Link: https://lore.kernel.org/r/20190718084333.15598-1-ckeepax@opensource.cirrus.c... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/soc-dapm.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/sound/soc/soc-dapm.c b/sound/soc/soc-dapm.c index b4c8ba412a5c1..104d5f487c7d1 100644 --- a/sound/soc/soc-dapm.c +++ b/sound/soc/soc-dapm.c @@ -1152,8 +1152,8 @@ static __always_inline int is_connected_ep(struct snd_soc_dapm_widget *widget, list_add_tail(&widget->work_list, list);
if (custom_stop_condition && custom_stop_condition(widget, dir)) { - widget->endpoints[dir] = 1; - return widget->endpoints[dir]; + list = NULL; + custom_stop_condition = NULL; }
if ((widget->is_ep & SND_SOC_DAPM_DIR_TO_EP(dir)) && widget->connected) { @@ -1190,8 +1190,8 @@ static __always_inline int is_connected_ep(struct snd_soc_dapm_widget *widget, * * Optionally, can be supplied with a function acting as a stopping condition. * This function takes the dapm widget currently being examined and the walk - * direction as an arguments, it should return true if the walk should be - * stopped and false otherwise. + * direction as an arguments, it should return true if widgets from that point + * in the graph onwards should not be added to the widget list. */ static int is_connected_output_ep(struct snd_soc_dapm_widget *widget, struct list_head *list,
[ Upstream commit 12185dfe44360f814ac4ead9d22ad2af7511b2e9 ]
The following scenario was encountered during testing of logical partition mobility on pseries partitions with bonded ibmvnic adapters in LACP mode.
1. Driver receives a signal that the device has been swapped, and it needs to reset to initialize the new device.
2. Driver reports loss of carrier and begins initialization.
3. Bonding driver receives NETDEV_CHANGE notifier and checks the slave's current speed and duplex settings. Because these are unknown at the time, the bond sets its link state to BOND_LINK_FAIL and handles the speed update, clearing AD_PORT_LACP_ENABLE.
4. Driver finishes recovery and reports that the carrier is on.
5. Bond receives a new notification and checks the speed again. The speeds are valid but miimon has not altered the link state yet. AD_PORT_LACP_ENABLE remains off.
Because the slave's link state is still BOND_LINK_FAIL, no further port checks are made when it recovers. Though the slave devices are operational and have valid speed and duplex settings, the bond will not send LACPDU's. The simplest fix I can see is to force another speed check in bond_miimon_commit. This way the bond will update AD_PORT_LACP_ENABLE if needed when transitioning from BOND_LINK_FAIL to BOND_LINK_UP.
CC: Jarod Wilson jarod@redhat.com CC: Jay Vosburgh j.vosburgh@gmail.com CC: Veaceslav Falico vfalico@gmail.com CC: Andy Gospodarek andy@greyhouse.net Signed-off-by: Thomas Falcon tlfalcon@linux.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_main.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 60d0c270af85a..c1eeba1906fdb 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -2153,6 +2153,15 @@ static void bond_miimon_commit(struct bonding *bond) bond_for_each_slave(bond, slave, iter) { switch (slave->new_link) { case BOND_LINK_NOCHANGE: + /* For 802.3ad mode, check current slave speed and + * duplex again in case its port was disabled after + * invalid speed/duplex reporting but recovered before + * link monitoring could make a decision on the actual + * link status + */ + if (BOND_MODE(bond) == BOND_MODE_8023AD && + slave->link == BOND_LINK_UP) + bond_3ad_adapter_speed_duplex_changed(slave); continue;
case BOND_LINK_UP:
[ Upstream commit c63845609c4700488e5eacd6ab4d06d5d420e5ef ]
CONFIG_CAN_LEDS is deprecated. When trying to use the generic netdev trigger as suggested, there's a small inconsistency with the link property: The LED is on initially, stays on when the device is brought up, and then turns off (as expected) when the device is brought down.
Make sure the LED always reflects the state of the CAN device.
Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk Acked-by: Willem de Bruijn willemb@google.com Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/can/dev.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c index 7d61d8801220e..d92113db4fb97 100644 --- a/drivers/net/can/dev.c +++ b/drivers/net/can/dev.c @@ -1217,6 +1217,8 @@ int register_candev(struct net_device *dev) return -EINVAL;
dev->rtnl_link_ops = &can_link_ops; + netif_carrier_off(dev); + return register_netdev(dev); } EXPORT_SYMBOL_GPL(register_candev);
[ Upstream commit 40aa5383e393d72f6aa3943a4e7b1aae25a1e43b ]
If the DAI format setup fails, there is no valid communication format between CPU and CODEC, so fail card instantiation, rather than continue with a card that will most likely not function properly.
Signed-off-by: Ricard Wanderlof ricardw@axis.com Link: https://lore.kernel.org/r/alpine.DEB.2.20.1907241132350.6338@lnxricardw1.se.... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/soc-core.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c index 42c2a3065b779..ff5206f5455d9 100644 --- a/sound/soc/soc-core.c +++ b/sound/soc/soc-core.c @@ -1757,8 +1757,11 @@ static int soc_probe_link_dais(struct snd_soc_card *card, } }
- if (dai_link->dai_fmt) - snd_soc_runtime_set_dai_fmt(rtd, dai_link->dai_fmt); + if (dai_link->dai_fmt) { + ret = snd_soc_runtime_set_dai_fmt(rtd, dai_link->dai_fmt); + if (ret) + return ret; + }
ret = soc_post_component_init(rtd, dai_link->name); if (ret)
[ Upstream commit 9891d06836e67324c9e9c4675ed90fc8b8110034 ]
devm_kzalloc may fail and return null. So the null check is needed.
Signed-off-by: Navid Emamdoost navid.emamdoost@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nfc/st21nfca/se.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/nfc/st21nfca/se.c b/drivers/nfc/st21nfca/se.c index 3a98563d4a121..eac608a457f03 100644 --- a/drivers/nfc/st21nfca/se.c +++ b/drivers/nfc/st21nfca/se.c @@ -326,6 +326,8 @@ int st21nfca_connectivity_event_received(struct nfc_hci_dev *hdev, u8 host,
transaction = (struct nfc_evt_transaction *)devm_kzalloc(dev, skb->len - 2, GFP_KERNEL); + if (!transaction) + return -ENOMEM;
transaction->aid_len = skb->data[1]; memcpy(transaction->aid, &skb->data[2],
[ Upstream commit 3008e06fdf0973770370f97d5f1fba3701d8281d ]
devm_kzalloc may fail and return NULL. So the null check is needed.
Signed-off-by: Navid Emamdoost navid.emamdoost@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nfc/st-nci/se.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/nfc/st-nci/se.c b/drivers/nfc/st-nci/se.c index 56f2112e0cd84..85df2e0093109 100644 --- a/drivers/nfc/st-nci/se.c +++ b/drivers/nfc/st-nci/se.c @@ -344,6 +344,8 @@ static int st_nci_hci_connectivity_event_received(struct nci_dev *ndev,
transaction = (struct nfc_evt_transaction *)devm_kzalloc(dev, skb->len - 2, GFP_KERNEL); + if (!transaction) + return -ENOMEM;
transaction->aid_len = skb->data[1]; memcpy(transaction->aid, &skb->data[2], transaction->aid_len);
[ Upstream commit 1e112c35e3c96db7c8ca6ddaa96574f00c06e7db ]
The slot_width is a property for the bus while the constraint for SNDRV_PCM_HW_PARAM_SAMPLE_BITS is for the in memory format.
Applying slot_width constraint to sample_bits works most of the time, but it will blacklist valid formats in some cases.
With slot_width 24 we can support S24_3LE and S24_LE formats as they both look the same on the bus, but a a 24 constraint on sample_bits would not allow S24_LE as it is stored in 32bits in memory.
Implement a simple hw_rule function to allow all formats which require less or equal number of bits on the bus as slot_width (if configured).
Signed-off-by: Peter Ujfalusi peter.ujfalusi@ti.com Link: https://lore.kernel.org/r/20190726064244.3762-2-peter.ujfalusi@ti.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/davinci/davinci-mcasp.c | 43 ++++++++++++++++++++++++------- 1 file changed, 34 insertions(+), 9 deletions(-)
diff --git a/sound/soc/davinci/davinci-mcasp.c b/sound/soc/davinci/davinci-mcasp.c index 9aa741d272798..0480ec4c80354 100644 --- a/sound/soc/davinci/davinci-mcasp.c +++ b/sound/soc/davinci/davinci-mcasp.c @@ -1158,6 +1158,28 @@ static int davinci_mcasp_trigger(struct snd_pcm_substream *substream, return ret; }
+static int davinci_mcasp_hw_rule_slot_width(struct snd_pcm_hw_params *params, + struct snd_pcm_hw_rule *rule) +{ + struct davinci_mcasp_ruledata *rd = rule->private; + struct snd_mask *fmt = hw_param_mask(params, SNDRV_PCM_HW_PARAM_FORMAT); + struct snd_mask nfmt; + int i, slot_width; + + snd_mask_none(&nfmt); + slot_width = rd->mcasp->slot_width; + + for (i = 0; i <= SNDRV_PCM_FORMAT_LAST; i++) { + if (snd_mask_test(fmt, i)) { + if (snd_pcm_format_width(i) <= slot_width) { + snd_mask_set(&nfmt, i); + } + } + } + + return snd_mask_refine(fmt, &nfmt); +} + static const unsigned int davinci_mcasp_dai_rates[] = { 8000, 11025, 16000, 22050, 32000, 44100, 48000, 64000, 88200, 96000, 176400, 192000, @@ -1251,7 +1273,7 @@ static int davinci_mcasp_startup(struct snd_pcm_substream *substream, struct davinci_mcasp_ruledata *ruledata = &mcasp->ruledata[substream->stream]; u32 max_channels = 0; - int i, dir; + int i, dir, ret; int tdm_slots = mcasp->tdm_slots;
/* Do not allow more then one stream per direction */ @@ -1280,6 +1302,7 @@ static int davinci_mcasp_startup(struct snd_pcm_substream *substream, max_channels++; } ruledata->serializers = max_channels; + ruledata->mcasp = mcasp; max_channels *= tdm_slots; /* * If the already active stream has less channels than the calculated @@ -1305,20 +1328,22 @@ static int davinci_mcasp_startup(struct snd_pcm_substream *substream, 0, SNDRV_PCM_HW_PARAM_CHANNELS, &mcasp->chconstr[substream->stream]);
- if (mcasp->slot_width) - snd_pcm_hw_constraint_minmax(substream->runtime, - SNDRV_PCM_HW_PARAM_SAMPLE_BITS, - 8, mcasp->slot_width); + if (mcasp->slot_width) { + /* Only allow formats require <= slot_width bits on the bus */ + ret = snd_pcm_hw_rule_add(substream->runtime, 0, + SNDRV_PCM_HW_PARAM_FORMAT, + davinci_mcasp_hw_rule_slot_width, + ruledata, + SNDRV_PCM_HW_PARAM_FORMAT, -1); + if (ret) + return ret; + }
/* * If we rely on implicit BCLK divider setting we should * set constraints based on what we can provide. */ if (mcasp->bclk_master && mcasp->bclk_div == 0 && mcasp->sysclk_freq) { - int ret; - - ruledata->mcasp = mcasp; - ret = snd_pcm_hw_rule_add(substream->runtime, 0, SNDRV_PCM_HW_PARAM_RATE, davinci_mcasp_hw_rule_rate,
[ Upstream commit 9a07406b00cdc6ec689dc142540739575c717f3c ]
The BroadMobi BM818 M.2 card uses the QMI protocol
Signed-off-by: Bob Ham bob.ham@puri.sm Signed-off-by: Angus Ainslie (Purism) angus@akkea.ca Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 4b0144b2a2523..e2050afaab7a8 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -1220,6 +1220,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x2001, 0x7e35, 4)}, /* D-Link DWM-222 */ {QMI_FIXED_INTF(0x2020, 0x2031, 4)}, /* Olicard 600 */ {QMI_FIXED_INTF(0x2020, 0x2033, 4)}, /* BroadMobi BM806U */ + {QMI_FIXED_INTF(0x2020, 0x2060, 4)}, /* BroadMobi BM818 */ {QMI_FIXED_INTF(0x0f3d, 0x68a2, 8)}, /* Sierra Wireless MC7700 */ {QMI_FIXED_INTF(0x114f, 0x68a2, 8)}, /* Sierra Wireless MC7750 */ {QMI_FIXED_INTF(0x1199, 0x68a2, 8)}, /* Sierra Wireless MC7710 in QMI mode */
[ Upstream commit 81af04b432fdfabcdbd2c06be2ee647e3ca41a22 ]
The hw_ver field was initialized to zero. Return the chip revision. This is relevant for rdma driver.
Signed-off-by: Michal Kalderon michal.kalderon@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/qlogic/qed/qed_rdma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.c b/drivers/net/ethernet/qlogic/qed/qed_rdma.c index 1e13dea66989e..c9258aabca2d4 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_rdma.c +++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.c @@ -398,7 +398,7 @@ static void qed_rdma_init_devinfo(struct qed_hwfn *p_hwfn, /* Vendor specific information */ dev->vendor_id = cdev->vendor_id; dev->vendor_part_id = cdev->device_id; - dev->hw_ver = 0; + dev->hw_ver = cdev->chip_rev; dev->fw_ver = (FW_MAJOR_VERSION << 24) | (FW_MINOR_VERSION << 16) | (FW_REVISION_VERSION << 8) | (FW_ENGINEERING_VERSION);
[ Upstream commit a0d57a552b836206ad7705a1060e6e1ce5a38203 ]
In start_isoc_chain(), usb_alloc_urb() on line 1392 may fail and return NULL. At this time, fifo->iso[i].urb is assigned to NULL.
Then, fifo->iso[i].urb is used at some places, such as: LINE 1405: fill_isoc_urb(fifo->iso[i].urb, ...) urb->number_of_packets = num_packets; urb->transfer_flags = URB_ISO_ASAP; urb->actual_length = 0; urb->interval = interval; LINE 1416: fifo->iso[i].urb->... LINE 1419: fifo->iso[i].urb->...
Thus, possible null-pointer dereferences may occur.
To fix these bugs, "continue" is added to avoid using fifo->iso[i].urb when it is NULL.
These bugs are found by a static analysis tool STCheck written by us.
Signed-off-by: Jia-Ju Bai baijiaju1990@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/isdn/hardware/mISDN/hfcsusb.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/isdn/hardware/mISDN/hfcsusb.c b/drivers/isdn/hardware/mISDN/hfcsusb.c index 35983c7c31375..163bc482b2a78 100644 --- a/drivers/isdn/hardware/mISDN/hfcsusb.c +++ b/drivers/isdn/hardware/mISDN/hfcsusb.c @@ -1402,6 +1402,7 @@ start_isoc_chain(struct usb_fifo *fifo, int num_packets_per_urb, printk(KERN_DEBUG "%s: %s: alloc urb for fifo %i failed", hw->name, __func__, fifo->fifonum); + continue; } fifo->iso[i].owner_fifo = (struct usb_fifo *) fifo; fifo->iso[i].indx = i;
[ Upstream commit 6c1f7e2c1b96ab9b09ac97c4df2bd9dc327206f6 ]
Shijie Luo reported that when stress-testing ipset with multiple concurrent create, rename, flush, list, destroy commands, it can result
ipset <version>: Broken LIST kernel message: missing DATA part!
error messages and broken list results. The problem was the rename operation was not properly handled with respect of listing. The patch fixes the issue.
Reported-by: Shijie Luo luoshijie1@huawei.com Signed-off-by: Jozsef Kadlecsik kadlec@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/ipset/ip_set_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c index a3f1dc7cf5382..dbf17d3596a69 100644 --- a/net/netfilter/ipset/ip_set_core.c +++ b/net/netfilter/ipset/ip_set_core.c @@ -1128,7 +1128,7 @@ static int ip_set_rename(struct net *net, struct sock *ctnl, return -ENOENT;
write_lock_bh(&ip_set_ref_lock); - if (set->ref != 0) { + if (set->ref != 0 || set->ref_netlink != 0) { ret = -IPSET_ERR_REFERENCED; goto out; }
[ Upstream commit d8a1de3d5bb881507602bc02e004904828f88711 ]
Since linux 4.9 it is not possible to use buffers on the stack for DMA transfers.
During usb probe the driver crashes with "transfer buffer is on stack" message.
This fix k-allocates a buffer to be used on "read_reg_atomic", which is a macro that calls "usb_control_msg" under the hood.
Kernel 4.19 backtrace:
usb_hcd_submit_urb+0x3e5/0x900 ? sched_clock+0x9/0x10 ? log_store+0x203/0x270 ? get_random_u32+0x6f/0x90 ? cache_alloc_refill+0x784/0x8a0 usb_submit_urb+0x3b4/0x550 usb_start_wait_urb+0x4e/0xd0 usb_control_msg+0xb8/0x120 hfcsusb_probe+0x6bc/0xb40 [hfcsusb] usb_probe_interface+0xc2/0x260 really_probe+0x176/0x280 driver_probe_device+0x49/0x130 __driver_attach+0xa9/0xb0 ? driver_probe_device+0x130/0x130 bus_for_each_dev+0x5a/0x90 driver_attach+0x14/0x20 ? driver_probe_device+0x130/0x130 bus_add_driver+0x157/0x1e0 driver_register+0x51/0xe0 usb_register_driver+0x5d/0x120 ? 0xf81ed000 hfcsusb_drv_init+0x17/0x1000 [hfcsusb] do_one_initcall+0x44/0x190 ? free_unref_page_commit+0x6a/0xd0 do_init_module+0x46/0x1c0 load_module+0x1dc1/0x2400 sys_init_module+0xed/0x120 do_fast_syscall_32+0x7a/0x200 entry_SYSENTER_32+0x6b/0xbe
Signed-off-by: Juliana Rodrigueiro juliana.rodrigueiro@intra2net.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/isdn/hardware/mISDN/hfcsusb.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/isdn/hardware/mISDN/hfcsusb.c b/drivers/isdn/hardware/mISDN/hfcsusb.c index 163bc482b2a78..87588198d68fc 100644 --- a/drivers/isdn/hardware/mISDN/hfcsusb.c +++ b/drivers/isdn/hardware/mISDN/hfcsusb.c @@ -1701,13 +1701,23 @@ hfcsusb_stop_endpoint(struct hfcsusb *hw, int channel) static int setup_hfcsusb(struct hfcsusb *hw) { + void *dmabuf = kmalloc(sizeof(u_char), GFP_KERNEL); u_char b; + int ret;
if (debug & DBG_HFC_CALL_TRACE) printk(KERN_DEBUG "%s: %s\n", hw->name, __func__);
+ if (!dmabuf) + return -ENOMEM; + + ret = read_reg_atomic(hw, HFCUSB_CHIP_ID, dmabuf); + + memcpy(&b, dmabuf, sizeof(u_char)); + kfree(dmabuf); + /* check the chip id */ - if (read_reg_atomic(hw, HFCUSB_CHIP_ID, &b) != 1) { + if (ret != 1) { printk(KERN_DEBUG "%s: %s: cannot read chip id\n", hw->name, __func__); return 1;
[ Upstream commit 6bbfe4e602691b90ac866712bd4c43c51e546a60 ]
Michael reported an issue with perf bench numa failing with binding to cpu0 with '-0' option.
# perf bench numa mem -p 3 -t 1 -P 512 -s 100 -zZcm0 --thp 1 -M 1 -ddd # Running 'numa/mem' benchmark:
# Running main, "perf bench numa numa-mem -p 3 -t 1 -P 512 -s 100 -zZcm0 --thp 1 -M 1 -ddd" binding to node 0, mask: 0000000000000001 => -1 perf: bench/numa.c:356: bind_to_memnode: Assertion `!(ret)' failed. Aborted (core dumped)
This happens when the cpu0 is not part of node0, which is the benchmark assumption and we can see that's not the case for some powerpc servers.
Using correct node for cpu0 binding.
Reported-by: Michael Petlan mpetlan@redhat.com Signed-off-by: Jiri Olsa jolsa@kernel.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Satheesh Rajendran sathnaga@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/20190801142642.28004-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/bench/numa.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c index 997875c770b10..275f1c3c73b62 100644 --- a/tools/perf/bench/numa.c +++ b/tools/perf/bench/numa.c @@ -378,8 +378,10 @@ static u8 *alloc_data(ssize_t bytes0, int map_flags,
/* Allocate and initialize all memory on CPU#0: */ if (init_cpu0) { - orig_mask = bind_to_node(0); - bind_to_memnode(0); + int node = numa_node_of_cpu(0); + + orig_mask = bind_to_node(node); + bind_to_memnode(node); }
bytes = bytes0 + HPSIZE;
[ Upstream commit cd28aa2e056cd1ea79fc5f24eed0ce868c6cab5c ]
strncpy() does not ensure NULL-termination when the input string size equals to the destination buffer size IFNAMSIZ. The output string 'name' is passed to dev_info which relies on NULL-termination.
Use strlcpy() instead.
This issue is identified by a Coccinelle script.
Signed-off-by: Wang Xiayang xywang.sjtu@sjtu.edu.cn Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/can/sja1000/peak_pcmcia.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/can/sja1000/peak_pcmcia.c b/drivers/net/can/sja1000/peak_pcmcia.c index dd56133cc4616..fc9f8b01ecae2 100644 --- a/drivers/net/can/sja1000/peak_pcmcia.c +++ b/drivers/net/can/sja1000/peak_pcmcia.c @@ -487,7 +487,7 @@ static void pcan_free_channels(struct pcan_pccard *card) if (!netdev) continue;
- strncpy(name, netdev->name, IFNAMSIZ); + strlcpy(name, netdev->name, IFNAMSIZ);
unregister_sja1000dev(netdev);
[ Upstream commit e787f19373b8a5fa24087800ed78314fd17b984a ]
strncpy() does not ensure NULL-termination when the input string size equals to the destination buffer size IFNAMSIZ. The output string is passed to dev_info() which relies on the NULL-termination.
Use strlcpy() instead.
This issue is identified by a Coccinelle script.
Signed-off-by: Wang Xiayang xywang.sjtu@sjtu.edu.cn Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/can/usb/peak_usb/pcan_usb_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_core.c b/drivers/net/can/usb/peak_usb/pcan_usb_core.c index d68c79f9a4b95..059282a6065c6 100644 --- a/drivers/net/can/usb/peak_usb/pcan_usb_core.c +++ b/drivers/net/can/usb/peak_usb/pcan_usb_core.c @@ -881,7 +881,7 @@ static void peak_usb_disconnect(struct usb_interface *intf)
dev_prev_siblings = dev->prev_siblings; dev->state &= ~PCAN_USB_STATE_CONNECTED; - strncpy(name, netdev->name, IFNAMSIZ); + strlcpy(name, netdev->name, IFNAMSIZ);
unregister_netdev(netdev);
[ Upstream commit 3690c8c9a8edff0db077a38783112d8fe12a7dd2 ]
strncpy() does not ensure NULL-termination when the input string size equals to the destination buffer size 30. The output string is passed to qed_int_deassertion_aeu_bit() which calls DP_INFO() and relies NULL-termination.
Use strlcpy instead. The other conditional branch above strncpy() needs no fix as snprintf() ensures NULL-termination.
This issue is identified by a Coccinelle script.
Signed-off-by: Wang Xiayang xywang.sjtu@sjtu.edu.cn Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/qlogic/qed/qed_int.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_int.c b/drivers/net/ethernet/qlogic/qed/qed_int.c index 7746417130bd7..c5d9f290ec4c7 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_int.c +++ b/drivers/net/ethernet/qlogic/qed/qed_int.c @@ -939,7 +939,7 @@ static int qed_int_deassertion(struct qed_hwfn *p_hwfn, snprintf(bit_name, 30, p_aeu->bit_name, num); else - strncpy(bit_name, + strlcpy(bit_name, p_aeu->bit_name, 30);
/* We now need to pass bitmask in its
[ Upstream commit c77e22834ae9a11891cb613bd9a551be1b94f2bc ]
John Hubbard reports seeing the following stack trace:
nfs4_do_reclaim rcu_read_lock /* we are now in_atomic() and must not sleep */ nfs4_purge_state_owners nfs4_free_state_owner nfs4_destroy_seqid_counter rpc_destroy_wait_queue cancel_delayed_work_sync __cancel_work_timer __flush_work start_flush_work might_sleep: (kernel/workqueue.c:2975: BUG)
The solution is to separate out the freeing of the state owners from nfs4_purge_state_owners(), and perform that outside the atomic context.
Reported-by: John Hubbard jhubbard@nvidia.com Fixes: 0aaaf5c424c7f ("NFS: Cache state owners after files are closed") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs4_fs.h | 3 ++- fs/nfs/nfs4client.c | 5 ++++- fs/nfs/nfs4state.c | 27 ++++++++++++++++++++++----- 3 files changed, 28 insertions(+), 7 deletions(-)
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h index a73144b3cb8c8..22cff39cca29a 100644 --- a/fs/nfs/nfs4_fs.h +++ b/fs/nfs/nfs4_fs.h @@ -433,7 +433,8 @@ static inline void nfs4_schedule_session_recovery(struct nfs4_session *session,
extern struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *, struct rpc_cred *, gfp_t); extern void nfs4_put_state_owner(struct nfs4_state_owner *); -extern void nfs4_purge_state_owners(struct nfs_server *); +extern void nfs4_purge_state_owners(struct nfs_server *, struct list_head *); +extern void nfs4_free_state_owners(struct list_head *head); extern struct nfs4_state * nfs4_get_open_state(struct inode *, struct nfs4_state_owner *); extern void nfs4_put_open_state(struct nfs4_state *); extern void nfs4_close_state(struct nfs4_state *, fmode_t); diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c index 8f96f6548dc82..0924b68b56574 100644 --- a/fs/nfs/nfs4client.c +++ b/fs/nfs/nfs4client.c @@ -739,9 +739,12 @@ out:
static void nfs4_destroy_server(struct nfs_server *server) { + LIST_HEAD(freeme); + nfs_server_return_all_delegations(server); unset_pnfs_layoutdriver(server); - nfs4_purge_state_owners(server); + nfs4_purge_state_owners(server, &freeme); + nfs4_free_state_owners(&freeme); }
/* diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index 85ec07e4aa91b..f92bfc787c5fe 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -614,24 +614,39 @@ void nfs4_put_state_owner(struct nfs4_state_owner *sp) /** * nfs4_purge_state_owners - Release all cached state owners * @server: nfs_server with cached state owners to release + * @head: resulting list of state owners * * Called at umount time. Remaining state owners will be on * the LRU with ref count of zero. + * Note that the state owners are not freed, but are added + * to the list @head, which can later be used as an argument + * to nfs4_free_state_owners. */ -void nfs4_purge_state_owners(struct nfs_server *server) +void nfs4_purge_state_owners(struct nfs_server *server, struct list_head *head) { struct nfs_client *clp = server->nfs_client; struct nfs4_state_owner *sp, *tmp; - LIST_HEAD(doomed);
spin_lock(&clp->cl_lock); list_for_each_entry_safe(sp, tmp, &server->state_owners_lru, so_lru) { - list_move(&sp->so_lru, &doomed); + list_move(&sp->so_lru, head); nfs4_remove_state_owner_locked(sp); } spin_unlock(&clp->cl_lock); +}
- list_for_each_entry_safe(sp, tmp, &doomed, so_lru) { +/** + * nfs4_purge_state_owners - Release all cached state owners + * @head: resulting list of state owners + * + * Frees a list of state owners that was generated by + * nfs4_purge_state_owners + */ +void nfs4_free_state_owners(struct list_head *head) +{ + struct nfs4_state_owner *sp, *tmp; + + list_for_each_entry_safe(sp, tmp, head, so_lru) { list_del(&sp->so_lru); nfs4_free_state_owner(sp); } @@ -1782,12 +1797,13 @@ static int nfs4_do_reclaim(struct nfs_client *clp, const struct nfs4_state_recov struct nfs4_state_owner *sp; struct nfs_server *server; struct rb_node *pos; + LIST_HEAD(freeme); int status = 0;
restart: rcu_read_lock(); list_for_each_entry_rcu(server, &clp->cl_superblocks, client_link) { - nfs4_purge_state_owners(server); + nfs4_purge_state_owners(server, &freeme); spin_lock(&clp->cl_lock); for (pos = rb_first(&server->state_owners); pos != NULL; @@ -1816,6 +1832,7 @@ restart: spin_unlock(&clp->cl_lock); } rcu_read_unlock(); + nfs4_free_state_owners(&freeme); return 0; }
[ Upstream commit 1c703b53e5bfb5c2205c30f0fb157ce271fd42fb ]
Some a4tech mice use the 'GenericDesktop.00b8' usage to inform whether the previous wheel report was horizontal or vertical. Before c01908a14bf73 ("HID: input: add mapping for "Toggle Display" key") this usage was being mapped to 'Relative.Misc'. After the patch it's simply ignored (usage->type == 0 & usage->code == 0). Which ultimately makes hid-a4tech ignore the WHEEL/HWHEEL selection event, as it has no usage->type.
We shouldn't rely on a mapping for that usage as it's nonstandard and doesn't really map to an input event. So we bypass the mapping and make sure the custom event handling properly handles both reports.
Fixes: c01908a14bf73 ("HID: input: add mapping for "Toggle Display" key") Signed-off-by: Nicolas Saenz Julienne nsaenzjulienne@suse.de Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-a4tech.c | 30 +++++++++++++++++++++++++++--- 1 file changed, 27 insertions(+), 3 deletions(-)
diff --git a/drivers/hid/hid-a4tech.c b/drivers/hid/hid-a4tech.c index 9428ea7cdf8a0..c52bd163abb3e 100644 --- a/drivers/hid/hid-a4tech.c +++ b/drivers/hid/hid-a4tech.c @@ -26,12 +26,36 @@ #define A4_2WHEEL_MOUSE_HACK_7 0x01 #define A4_2WHEEL_MOUSE_HACK_B8 0x02
+#define A4_WHEEL_ORIENTATION (HID_UP_GENDESK | 0x000000b8) + struct a4tech_sc { unsigned long quirks; unsigned int hw_wheel; __s32 delayed_value; };
+static int a4_input_mapping(struct hid_device *hdev, struct hid_input *hi, + struct hid_field *field, struct hid_usage *usage, + unsigned long **bit, int *max) +{ + struct a4tech_sc *a4 = hid_get_drvdata(hdev); + + if (a4->quirks & A4_2WHEEL_MOUSE_HACK_B8 && + usage->hid == A4_WHEEL_ORIENTATION) { + /* + * We do not want to have this usage mapped to anything as it's + * nonstandard and doesn't really behave like an HID report. + * It's only selecting the orientation (vertical/horizontal) of + * the previous mouse wheel report. The input_events will be + * generated once both reports are recorded in a4_event(). + */ + return -1; + } + + return 0; + +} + static int a4_input_mapped(struct hid_device *hdev, struct hid_input *hi, struct hid_field *field, struct hid_usage *usage, unsigned long **bit, int *max) @@ -53,8 +77,7 @@ static int a4_event(struct hid_device *hdev, struct hid_field *field, struct a4tech_sc *a4 = hid_get_drvdata(hdev); struct input_dev *input;
- if (!(hdev->claimed & HID_CLAIMED_INPUT) || !field->hidinput || - !usage->type) + if (!(hdev->claimed & HID_CLAIMED_INPUT) || !field->hidinput) return 0;
input = field->hidinput->input; @@ -65,7 +88,7 @@ static int a4_event(struct hid_device *hdev, struct hid_field *field, return 1; }
- if (usage->hid == 0x000100b8) { + if (usage->hid == A4_WHEEL_ORIENTATION) { input_event(input, EV_REL, value ? REL_HWHEEL : REL_WHEEL, a4->delayed_value); return 1; @@ -129,6 +152,7 @@ MODULE_DEVICE_TABLE(hid, a4_devices); static struct hid_driver a4_driver = { .name = "a4tech", .id_table = a4_devices, + .input_mapping = a4_input_mapping, .input_mapped = a4_input_mapped, .event = a4_event, .probe = a4_probe,
[ Upstream commit ee9d66182392695535cc9fccfcb40c16f72de2a9 ]
Fix kernel oops when mounting a encryptData CIFS share with CONFIG_DEBUG_VIRTUAL
Signed-off-by: Sebastien Tisserant stisserant@wallix.com Reviewed-by: Pavel Shilovsky pshilov@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/cifs/smb2ops.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c index 23326b0cd5628..58a502e622aa4 100644 --- a/fs/cifs/smb2ops.c +++ b/fs/cifs/smb2ops.c @@ -2168,7 +2168,15 @@ fill_transform_hdr(struct smb2_transform_hdr *tr_hdr, struct smb_rqst *old_rq) static inline void smb2_sg_set_buf(struct scatterlist *sg, const void *buf, unsigned int buflen) { - sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf)); + void *addr; + /* + * VMAP_STACK (at least) puts stack into the vmalloc address space + */ + if (is_vmalloc_addr(buf)) + addr = vmalloc_to_page(buf); + else + addr = virt_to_page(buf); + sg_set_page(sg, addr, buflen, offset_in_page(buf)); }
static struct scatterlist *
[ Upstream commit debea2cd3193ac868289e8893c3a719c265b0612 ]
A call to 'kfree_skb()' is missing in the error handling path of 'init_one()'. This is already present in 'remove_one()' but is missing here.
Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c index 79053d2ce7a36..338683e5ef1e8 100644 --- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c +++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c @@ -3270,7 +3270,7 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent) if (!adapter->regs) { dev_err(&pdev->dev, "cannot map device registers\n"); err = -ENOMEM; - goto out_free_adapter; + goto out_free_adapter_nofail; }
adapter->pdev = pdev; @@ -3390,6 +3390,9 @@ out_free_dev: if (adapter->port[i]) free_netdev(adapter->port[i]);
+out_free_adapter_nofail: + kfree_skb(adapter->nofail_skb); + out_free_adapter: kfree(adapter);
[ Upstream commit 1a2c070ae805910a853b4a14818481ed2e17c727 ]
If hip04_tx_reclaim is interrupted while it is running and then __napi_schedule continues to execute hip04_rx_poll->hip04_tx_reclaim, reentrancy occurs and oops is generated. So you need to mask the interrupt during the hip04_tx_reclaim run.
The kernel oops exception stack is as follows:
Unable to handle kernel NULL pointer dereference at virtual address 00000050 pgd = c0003000 [00000050] *pgd=80000000a04003, *pmd=00000000 Internal error: Oops: 206 [#1] SMP ARM Modules linked in: hip04_eth mtdblock mtd_blkdevs mtd ohci_platform ehci_platform ohci_hcd ehci_hcd vfat fat sd_mod usb_storage scsi_mod usbcore usb_common CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 4.4.185 #1 Hardware name: Hisilicon A15 task: c0a250e0 task.stack: c0a00000 PC is at hip04_tx_reclaim+0xe0/0x17c [hip04_eth] LR is at hip04_tx_reclaim+0x30/0x17c [hip04_eth] pc : [<bf30c3a4>] lr : [<bf30c2f4>] psr: 600e0313 sp : c0a01d88 ip : 00000000 fp : c0601f9c r10: 00000000 r9 : c3482380 r8 : 00000001 r7 : 00000000 r6 : 000000e1 r5 : c3482000 r4 : 0000000c r3 : f2209800 r2 : 00000000 r1 : 00000000 r0 : 00000000 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel Control: 32c5387d Table: 03d28c80 DAC: 55555555 Process swapper/0 (pid: 0, stack limit = 0xc0a00190) Stack: (0xc0a01d88 to 0xc0a02000) [<bf30c3a4>] (hip04_tx_reclaim [hip04_eth]) from [<bf30d2e0>] (hip04_rx_poll+0x88/0x368 [hip04_eth]) [<bf30d2e0>] (hip04_rx_poll [hip04_eth]) from [<c04c2d9c>] (net_rx_action+0x114/0x34c) [<c04c2d9c>] (net_rx_action) from [<c021eed8>] (__do_softirq+0x218/0x318) [<c021eed8>] (__do_softirq) from [<c021f284>] (irq_exit+0x88/0xac) [<c021f284>] (irq_exit) from [<c0240090>] (msa_irq_exit+0x11c/0x1d4) [<c0240090>] (msa_irq_exit) from [<c02677e0>] (__handle_domain_irq+0x110/0x148) [<c02677e0>] (__handle_domain_irq) from [<c0201588>] (gic_handle_irq+0xd4/0x118) [<c0201588>] (gic_handle_irq) from [<c0551700>] (__irq_svc+0x40/0x58) Exception stack(0xc0a01f30 to 0xc0a01f78) 1f20: c0ae8b40 00000000 00000000 00000000 1f40: 00000002 ffffe000 c0601f9c 00000000 ffffffff c0a2257c c0a22440 c0831a38 1f60: c0a01ec4 c0a01f80 c0203714 c0203718 600e0213 ffffffff [<c0551700>] (__irq_svc) from [<c0203718>] (arch_cpu_idle+0x20/0x3c) [<c0203718>] (arch_cpu_idle) from [<c025bfd8>] (cpu_startup_entry+0x244/0x29c) [<c025bfd8>] (cpu_startup_entry) from [<c054b0d8>] (rest_init+0xc8/0x10c) [<c054b0d8>] (rest_init) from [<c0800c58>] (start_kernel+0x468/0x514) Code: a40599e5 016086e2 018088e2 7660efe6 (503090e5) ---[ end trace 1db21d6d09c49d74 ]--- Kernel panic - not syncing: Fatal exception in interrupt CPU3: stopping CPU: 3 PID: 0 Comm: swapper/3 Tainted: G D O 4.4.185 #1
Signed-off-by: Jiangfeng Xiao xiaojiangfeng@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hip04_eth.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index c27054b8ce81b..60ef6d40e4896 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -497,6 +497,9 @@ static int hip04_rx_poll(struct napi_struct *napi, int budget) u16 len; u32 err;
+ /* clean up tx descriptors */ + tx_remaining = hip04_tx_reclaim(ndev, false); + while (cnt && !last) { buf = priv->rx_buf[priv->rx_head]; skb = build_skb(buf, priv->rx_buf_size); @@ -557,8 +560,7 @@ refill: } napi_complete_done(napi, rx); done: - /* clean up tx descriptors and start a new timer if necessary */ - tx_remaining = hip04_tx_reclaim(ndev, false); + /* start a new timer if necessary */ if (rx < budget && tx_remaining) hip04_start_tx_timer(priv);
[ Upstream commit f2243b82785942be519016067ee6c55a063bbfe2 ]
TX_DESC_NUM is 256, in tx_count, the maximum value of mod(TX_DESC_NUM - 1) is 254, the variable "count" in the hip04_mac_start_xmit function is never equal to (TX_DESC_NUM - 1), so hip04_mac_start_xmit never return NETDEV_TX_BUSY.
tx_count is modified to mod(TX_DESC_NUM) so that the maximum value of tx_count can reach (TX_DESC_NUM - 1), then hip04_mac_start_xmit can reurn NETDEV_TX_BUSY.
Signed-off-by: Jiangfeng Xiao xiaojiangfeng@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hip04_eth.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index 60ef6d40e4896..b04fb82d7fa3e 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -185,7 +185,7 @@ struct hip04_priv {
static inline unsigned int tx_count(unsigned int head, unsigned int tail) { - return (head - tail) % (TX_DESC_NUM - 1); + return (head - tail) % TX_DESC_NUM; }
static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex)
[ Upstream commit 96a50c0d907ac8f5c3d6b051031a19eb8a2b53e3 ]
On the arm64 platform, executing "ifconfig eth0 up" will fail, returning "ifconfig: SIOCSIFFLAGS: Input/output error."
ndev->dev is not initialized, dma_map_single->get_dma_ops-> dummy_dma_ops->__dummy_map_page will return DMA_ERROR_CODE directly, so when we use dma_map_single, the first parameter is to use the device of platform_device.
Signed-off-by: Jiangfeng Xiao xiaojiangfeng@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hip04_eth.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index b04fb82d7fa3e..1bfe9544b3c10 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -157,6 +157,7 @@ struct hip04_priv { unsigned int reg_inten;
struct napi_struct napi; + struct device *dev; struct net_device *ndev;
struct tx_desc *tx_desc; @@ -387,7 +388,7 @@ static int hip04_tx_reclaim(struct net_device *ndev, bool force) }
if (priv->tx_phys[tx_tail]) { - dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail], + dma_unmap_single(priv->dev, priv->tx_phys[tx_tail], priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE); priv->tx_phys[tx_tail] = 0; @@ -437,8 +438,8 @@ static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev) return NETDEV_TX_BUSY; }
- phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE); - if (dma_mapping_error(&ndev->dev, phys)) { + phys = dma_map_single(priv->dev, skb->data, skb->len, DMA_TO_DEVICE); + if (dma_mapping_error(priv->dev, phys)) { dev_kfree_skb(skb); return NETDEV_TX_OK; } @@ -508,7 +509,7 @@ static int hip04_rx_poll(struct napi_struct *napi, int budget) goto refill; }
- dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head], + dma_unmap_single(priv->dev, priv->rx_phys[priv->rx_head], RX_BUF_SIZE, DMA_FROM_DEVICE); priv->rx_phys[priv->rx_head] = 0;
@@ -537,9 +538,9 @@ refill: buf = netdev_alloc_frag(priv->rx_buf_size); if (!buf) goto done; - phys = dma_map_single(&ndev->dev, buf, + phys = dma_map_single(priv->dev, buf, RX_BUF_SIZE, DMA_FROM_DEVICE); - if (dma_mapping_error(&ndev->dev, phys)) + if (dma_mapping_error(priv->dev, phys)) goto done; priv->rx_buf[priv->rx_head] = buf; priv->rx_phys[priv->rx_head] = phys; @@ -642,9 +643,9 @@ static int hip04_mac_open(struct net_device *ndev) for (i = 0; i < RX_DESC_NUM; i++) { dma_addr_t phys;
- phys = dma_map_single(&ndev->dev, priv->rx_buf[i], + phys = dma_map_single(priv->dev, priv->rx_buf[i], RX_BUF_SIZE, DMA_FROM_DEVICE); - if (dma_mapping_error(&ndev->dev, phys)) + if (dma_mapping_error(priv->dev, phys)) return -EIO;
priv->rx_phys[i] = phys; @@ -678,7 +679,7 @@ static int hip04_mac_stop(struct net_device *ndev)
for (i = 0; i < RX_DESC_NUM; i++) { if (priv->rx_phys[i]) { - dma_unmap_single(&ndev->dev, priv->rx_phys[i], + dma_unmap_single(priv->dev, priv->rx_phys[i], RX_BUF_SIZE, DMA_FROM_DEVICE); priv->rx_phys[i] = 0; } @@ -822,6 +823,7 @@ static int hip04_mac_probe(struct platform_device *pdev) return -ENOMEM;
priv = netdev_priv(ndev); + priv->dev = d; priv->ndev = ndev; platform_set_drvdata(pdev, ndev); SET_NETDEV_DEV(ndev, &pdev->dev);
[ Upstream commit 2d7271501720038381d45fb3dcbe4831228fc8cc ]
For passthrough requests, libata-scsi takes what the user passes in as gospel. This can be problematic if the user fills in the CDB incorrectly. One example of that is in request sizes. For read/write commands, the CDB contains fields describing the transfer length of the request. These should match with the SG_IO header fields, but libata-scsi currently does no validation of that.
Check that the number of blocks in the CDB for passthrough requests matches what was mapped into the request. If the CDB asks for more data then the validated SG_IO header fields, error it.
Reported-by: Krishna Ram Prakash R krp@gtux.in Reviewed-by: Kees Cook keescook@chromium.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ata/libata-scsi.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+)
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index bf5777bc04d35..eb0c4ee205258 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -1804,6 +1804,21 @@ nothing_to_do: return 1; }
+static bool ata_check_nblocks(struct scsi_cmnd *scmd, u32 n_blocks) +{ + struct request *rq = scmd->request; + u32 req_blocks; + + if (!blk_rq_is_passthrough(rq)) + return true; + + req_blocks = blk_rq_bytes(rq) / scmd->device->sector_size; + if (n_blocks > req_blocks) + return false; + + return true; +} + /** * ata_scsi_rw_xlat - Translate SCSI r/w command into an ATA one * @qc: Storage for translated ATA taskfile @@ -1848,6 +1863,8 @@ static unsigned int ata_scsi_rw_xlat(struct ata_queued_cmd *qc) scsi_10_lba_len(cdb, &block, &n_block); if (cdb[1] & (1 << 3)) tf_flags |= ATA_TFLAG_FUA; + if (!ata_check_nblocks(scmd, n_block)) + goto invalid_fld; break; case READ_6: case WRITE_6: @@ -1862,6 +1879,8 @@ static unsigned int ata_scsi_rw_xlat(struct ata_queued_cmd *qc) */ if (!n_block) n_block = 256; + if (!ata_check_nblocks(scmd, n_block)) + goto invalid_fld; break; case READ_16: case WRITE_16: @@ -1872,6 +1891,8 @@ static unsigned int ata_scsi_rw_xlat(struct ata_queued_cmd *qc) scsi_16_lba_len(cdb, &block, &n_block); if (cdb[1] & (1 << 3)) tf_flags |= ATA_TFLAG_FUA; + if (!ata_check_nblocks(scmd, n_block)) + goto invalid_fld; break; default: DPRINTK("no-byte command\n");
[ Upstream commit 752ead44491e8c91e14d7079625c5916b30921c5 ]
Abort processing of a command if we run out of mapped data in the SG list. This should never happen, but a previous bug caused it to be possible. Play it safe and attempt to abort nicely if we don't have more SG segments left.
Reviewed-by: Kees Cook keescook@chromium.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ata/libata-sff.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c index cc2f2e35f4c2e..8c36ff0c2dd49 100644 --- a/drivers/ata/libata-sff.c +++ b/drivers/ata/libata-sff.c @@ -704,6 +704,10 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) unsigned int offset; unsigned char *buf;
+ if (!qc->cursg) { + qc->curbytes = qc->nbytes; + return; + } if (qc->curbytes == qc->nbytes - qc->sect_size) ap->hsm_task_state = HSM_ST_LAST;
@@ -729,6 +733,8 @@ static void ata_pio_sector(struct ata_queued_cmd *qc)
if (qc->cursg_ofs == qc->cursg->length) { qc->cursg = sg_next(qc->cursg); + if (!qc->cursg) + ap->hsm_task_state = HSM_ST_LAST; qc->cursg_ofs = 0; } }
[ Upstream commit 04f5bda84b0712d6f172556a7e8dca9ded5e73b9 ]
When building with W=1, warnings about missing prototypes are emitted:
CC arch/x86/lib/cpu.o arch/x86/lib/cpu.c:5:14: warning: no previous prototype for 'x86_family' [-Wmissing-prototypes] 5 | unsigned int x86_family(unsigned int sig) | ^~~~~~~~~~ arch/x86/lib/cpu.c:18:14: warning: no previous prototype for 'x86_model' [-Wmissing-prototypes] 18 | unsigned int x86_model(unsigned int sig) | ^~~~~~~~~ arch/x86/lib/cpu.c:33:14: warning: no previous prototype for 'x86_stepping' [-Wmissing-prototypes] 33 | unsigned int x86_stepping(unsigned int sig) | ^~~~~~~~~~~~
Add the proper include file so the prototypes are there.
Signed-off-by: Valdis Kletnieks valdis.kletnieks@vt.edu Signed-off-by: Thomas Gleixner tglx@linutronix.de Link: https://lkml.kernel.org/r/42513.1565234837@turing-police Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/lib/cpu.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/x86/lib/cpu.c b/arch/x86/lib/cpu.c index 2dd1fe13a37b3..19f707992db22 100644 --- a/arch/x86/lib/cpu.c +++ b/arch/x86/lib/cpu.c @@ -1,5 +1,6 @@ #include <linux/types.h> #include <linux/export.h> +#include <asm/cpu.h>
unsigned int x86_family(unsigned int sig) {
[ Upstream commit 6b7c3b86f0b63134b2ab56508921a0853ffa687a ]
Currently when too many retries have occurred there is a memory leak on the allocation for reply on the error return path. Fix this by kfree'ing reply before returning.
Addresses-Coverity: ("Resource leak") Fixes: a9cd9c044aa9 ("drm/vmwgfx: Add a check to handle host message failure") Signed-off-by: Colin Ian King colin.king@canonical.com Reviewed-by: Deepak Rawat drawat@vmware.com Signed-off-by: Deepak Rawat drawat@vmware.com Signed-off-by: Thomas Hellstrom thellstrom@vmware.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/vmwgfx/vmwgfx_msg.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_msg.c b/drivers/gpu/drm/vmwgfx/vmwgfx_msg.c index 97000996b8dc5..50cc060cc552a 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_msg.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_msg.c @@ -300,8 +300,10 @@ static int vmw_recv_msg(struct rpc_channel *channel, void **msg, break; }
- if (retries == RETRIES) + if (retries == RETRIES) { + kfree(reply); return -EINVAL; + }
*msg_len = reply_len; *msg = reply;
[ Upstream commit cf30ae726c011e0372fd4c2d588466c8b50a8907 ]
The buffer containing the string used to set cpumask is overwritten at the end of the string later in cpu_map__snprint_mask due to not enough memory space, when there is only one cpu.
And thus causes the following failure:
$ perf ftrace ls failed to reset ftrace $
This patch fixes the calculation of the cpumask string size.
Signed-off-by: He Zhe zhe.he@windriver.com Tested-by: Arnaldo Carvalho de Melo acme@redhat.com Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Alexey Budankov alexey.budankov@linux.intel.com Cc: Jiri Olsa jolsa@redhat.com Cc: Kan Liang kan.liang@linux.intel.com Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Stephane Eranian eranian@google.com Fixes: dc23103278c5 ("perf ftrace: Add support for -a and -C option") Link: http://lkml.kernel.org/r/1564734592-15624-1-git-send-email-zhe.he@windriver.... Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/builtin-ftrace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c index 25a42acabee18..13a33fb71a6da 100644 --- a/tools/perf/builtin-ftrace.c +++ b/tools/perf/builtin-ftrace.c @@ -162,7 +162,7 @@ static int set_tracing_cpumask(struct cpu_map *cpumap) int last_cpu;
last_cpu = cpu_map__cpu(cpumap, cpumap->nr - 1); - mask_size = (last_cpu + 3) / 4 + 1; + mask_size = last_cpu / 4 + 2; /* one more byte for EOS */ mask_size += last_cpu / 32; /* ',' is needed for every 32th cpus */
cpumask = malloc(mask_size);
[ Upstream commit 5f5e25f1c7933a6e1673515c0b1d5acd82fea1ed ]
cpu_map__snprint_mask() would write to illegal memory pointed by zalloc(0) when there is only one cpu.
This patch fixes the calculation and adds sanity check against the input parameters.
Signed-off-by: He Zhe zhe.he@windriver.com Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Alexey Budankov alexey.budankov@linux.intel.com Cc: Jiri Olsa jolsa@redhat.com Cc: Kan Liang kan.liang@linux.intel.com Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Stephane Eranian eranian@google.com Fixes: 4400ac8a9a90 ("perf cpumap: Introduce cpu_map__snprint_mask()") Link: http://lkml.kernel.org/r/1564734592-15624-2-git-send-email-zhe.he@windriver.... Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/cpumap.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c index 383674f448fcd..f93846edc1e0d 100644 --- a/tools/perf/util/cpumap.c +++ b/tools/perf/util/cpumap.c @@ -701,7 +701,10 @@ size_t cpu_map__snprint_mask(struct cpu_map *map, char *buf, size_t size) unsigned char *bitmap; int last_cpu = cpu_map__cpu(map, map->nr - 1);
- bitmap = zalloc((last_cpu + 7) / 8); + if (buf == NULL) + return 0; + + bitmap = zalloc(last_cpu / 8 + 1); if (bitmap == NULL) { buf[0] = '\0'; return 0;
[ Upstream commit 8e6e5bea2e34c61291d00cb3f47560341aa84bc3 ]
The events defined in pmu-events JSON are parsed and added into perf tool. For fixed counters, we handle the encodings between JSON and perf by using a static array fixed[].
But the fixed[] has missed an important event "cpu_clk_unhalted.core".
For example, on the Tremont platform,
[root@localhost ~]# perf stat -e cpu_clk_unhalted.core -a event syntax error: 'cpu_clk_unhalted.core' ___ parser error
With this patch, the event cpu_clk_unhalted.core can be parsed.
[root@localhost perf]# ./perf stat -e cpu_clk_unhalted.core -a -vvv ------------------------------------------------------------ perf_event_attr: type 4 size 112 config 0x3c sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ------------------------------------------------------------ ...
Signed-off-by: Jin Yao yao.jin@linux.intel.com Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Jin Yao yao.jin@intel.com Cc: Jiri Olsa jolsa@kernel.org Cc: Kan Liang kan.liang@linux.intel.com Cc: Peter Zijlstra peterz@infradead.org Link: http://lkml.kernel.org/r/20190729072755.2166-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/pmu-events/jevents.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/tools/perf/pmu-events/jevents.c b/tools/perf/pmu-events/jevents.c index d51dc9ca8861a..94a7cabe9b824 100644 --- a/tools/perf/pmu-events/jevents.c +++ b/tools/perf/pmu-events/jevents.c @@ -346,6 +346,7 @@ static struct fixed { { "inst_retired.any_p", "event=0xc0" }, { "cpu_clk_unhalted.ref", "event=0x0,umask=0x03" }, { "cpu_clk_unhalted.thread", "event=0x3c" }, + { "cpu_clk_unhalted.core", "event=0x3c" }, { "cpu_clk_unhalted.thread_any", "event=0x3c,any=1" }, { NULL, NULL}, };
[ Upstream commit c096397c78f766db972f923433031f2dec01cae0 ]
selftests kvm test cases need pre-required kernel configs for the test to get pass.
Signed-off-by: Naresh Kamboju naresh.kamboju@linaro.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/kvm/config | 3 +++ 1 file changed, 3 insertions(+) create mode 100644 tools/testing/selftests/kvm/config
diff --git a/tools/testing/selftests/kvm/config b/tools/testing/selftests/kvm/config new file mode 100644 index 0000000000000..63ed533f73d6e --- /dev/null +++ b/tools/testing/selftests/kvm/config @@ -0,0 +1,3 @@ +CONFIG_KVM=y +CONFIG_KVM_INTEL=y +CONFIG_KVM_AMD=y
From: Aaron Armstrong Skomra skomra@gmail.com
commit fcf887e7caaa813eea821d11bf2b7619a37df37a upstream.
The EKR ring claims a range of 0 to 71 but actually reports values 1 to 72. The ring is used in relative mode so this change should not affect users.
Signed-off-by: Aaron Armstrong Skomra aaron.skomra@wacom.com Fixes: 72b236d60218f ("HID: wacom: Add support for Express Key Remote.") Cc: stable@vger.kernel.org # v4.3+ Reviewed-by: Ping Cheng ping.cheng@wacom.com Reviewed-by: Jason Gerecke jason.gerecke@wacom.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hid/wacom_wac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/hid/wacom_wac.c +++ b/drivers/hid/wacom_wac.c @@ -1061,7 +1061,7 @@ static int wacom_remote_irq(struct wacom input_report_key(input, BTN_BASE2, (data[11] & 0x02));
if (data[12] & 0x80) - input_report_abs(input, ABS_WHEEL, (data[12] & 0x7f)); + input_report_abs(input, ABS_WHEEL, (data[12] & 0x7f) - 1); else input_report_abs(input, ABS_WHEEL, 0);
From: Jason Gerecke jason.gerecke@wacom.com
commit b72fb1dcd2ea9d29417711cb302cef3006fa8d5a upstream.
Distance values reported by 2nd-gen Intuos tablets are on an inverted scale (0 == far, 63 == near). We need to change them over to a normal scale before reporting to userspace or else userspace drivers and applications can get confused.
Ref: https://github.com/linuxwacom/input-wacom/issues/98 Fixes: eda01dab53 ("HID: wacom: Add four new Intuos devices") Signed-off-by: Jason Gerecke jason.gerecke@wacom.com Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hid/wacom_wac.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/hid/wacom_wac.c +++ b/drivers/hid/wacom_wac.c @@ -848,6 +848,8 @@ static int wacom_intuos_general(struct w y >>= 1; distance >>= 1; } + if (features->type == INTUOSHT2) + distance = features->distance_max - distance; input_report_abs(input, ABS_X, x); input_report_abs(input, ABS_Y, y); input_report_abs(input, ABS_DISTANCE, distance);
From: Mikulas Patocka mpatocka@redhat.com
commit cf3591ef832915892f2499b7e54b51d4c578b28c upstream.
Revert the commit bd293d071ffe65e645b4d8104f9d8fe15ea13862. The proper fix has been made available with commit d0a255e795ab ("loop: set PF_MEMALLOC_NOIO for the worker thread").
Note that the fix offered by commit bd293d071ffe doesn't really prevent the deadlock from occuring - if we look at the stacktrace reported by Junxiao Bi, we see that it hangs in bit_wait_io and not on the mutex - i.e. it has already successfully taken the mutex. Changing the mutex from mutex_lock to mutex_trylock won't help with deadlocks that happen afterwards.
PID: 474 TASK: ffff8813e11f4600 CPU: 10 COMMAND: "kswapd0" #0 [ffff8813dedfb938] __schedule at ffffffff8173f405 #1 [ffff8813dedfb990] schedule at ffffffff8173fa27 #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186 #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8 #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81 #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio] #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio] #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio] #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778 #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f #13 [ffff8813dedfbec0] kthread at ffffffff810a8428 #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org Fixes: bd293d071ffe ("dm bufio: fix deadlock with loop device") Depends-on: d0a255e795ab ("loop: set PF_MEMALLOC_NOIO for the worker thread") Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm-bufio.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/md/dm-bufio.c +++ b/drivers/md/dm-bufio.c @@ -1630,7 +1630,9 @@ dm_bufio_shrink_scan(struct shrinker *sh unsigned long freed;
c = container_of(shrink, struct dm_bufio_client, shrinker); - if (!dm_bufio_trylock(c)) + if (sc->gfp_mask & __GFP_FS) + dm_bufio_lock(c); + else if (!dm_bufio_trylock(c)) return SHRINK_STOP;
freed = __scan(c, sc->nr_to_scan, sc->gfp_mask);
From: Jeff Layton jlayton@kernel.org
commit 28a282616f56990547b9dcd5c6fbd2001344664c upstream.
When ceph_mdsc_do_request returns an error, we can't assume that the filelock_reply pointer will be set. Only try to fetch fields out of the r_reply_info when it returns success.
Cc: stable@vger.kernel.org Reported-by: Hector Martin hector@marcansoft.com Signed-off-by: Jeff Layton jlayton@kernel.org Reviewed-by: "Yan, Zheng" zyan@redhat.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ceph/locks.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/fs/ceph/locks.c +++ b/fs/ceph/locks.c @@ -78,8 +78,7 @@ static int ceph_lock_message(u8 lock_typ req->r_wait_for_completion = ceph_lock_wait_for_completion;
err = ceph_mdsc_do_request(mdsc, inode, req); - - if (operation == CEPH_MDS_OP_GETFILELOCK) { + if (!err && operation == CEPH_MDS_OP_GETFILELOCK) { fl->fl_pid = -le64_to_cpu(req->r_reply_info.filelock_reply->pid); if (CEPH_LOCK_SHARED == req->r_reply_info.filelock_reply->type) fl->fl_type = F_RDLCK;
From: Ilya Dryomov idryomov@gmail.com
commit a561372405cf6bc6f14239b3a9e57bb39f2788b0 upstream.
We can't rely on ->peer_features in calc_target() because it may be called both when the OSD session is established and open and when it's not. ->peer_features is not valid unless the OSD session is open. If this happens on a PG split (pg_num increase), that could mean we don't resend a request that should have been resent, hanging the client indefinitely.
In userspace this was fixed by looking at require_osd_release and get_xinfo[osd].features fields of the osdmap. However these fields belong to the OSD section of the osdmap, which the kernel doesn't decode (only the client section is decoded).
Instead, let's drop this feature check. It effectively checks for luminous, so only pre-luminous OSDs would be affected in that on a PG split the kernel might resend a request that should not have been resent. Duplicates can occur in other scenarios, so both sides should already be prepared for them: see dup/replay logic on the OSD side and retry_attempt check on the client side.
Cc: stable@vger.kernel.org Fixes: 7de030d6b10a ("libceph: resend on PG splits if OSD has RESEND_ON_SPLIT") Link: https://tracker.ceph.com/issues/41162 Reported-by: Jerry Lee leisurelysw24@gmail.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Tested-by: Jerry Lee leisurelysw24@gmail.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/ceph/osd_client.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
--- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -1330,7 +1330,7 @@ static enum calc_target_result calc_targ struct ceph_osds up, acting; bool force_resend = false; bool unpaused = false; - bool legacy_change; + bool legacy_change = false; bool split = false; bool sort_bitwise = ceph_osdmap_flag(osdc, CEPH_OSDMAP_SORTBITWISE); bool recovery_deletes = ceph_osdmap_flag(osdc, @@ -1426,15 +1426,14 @@ static enum calc_target_result calc_targ t->osd = acting.primary; }
- if (unpaused || legacy_change || force_resend || - (split && con && CEPH_HAVE_FEATURE(con->peer_features, - RESEND_ON_SPLIT))) + if (unpaused || legacy_change || force_resend || split) ct_res = CALC_TARGET_NEED_RESEND; else ct_res = CALC_TARGET_NO_ACTION;
out: - dout("%s t %p -> ct_res %d osd %d\n", __func__, t, ct_res, t->osd); + dout("%s t %p -> %d%d%d%d ct_res %d osd%d\n", __func__, t, unpaused, + legacy_change, force_resend, split, ct_res, t->osd); return ct_res; }
From: Lyude Paul lyude@redhat.com
commit c358ebf59634f06d8ed176da651ec150df3c8686 upstream.
While I had thought I had fixed this issue in:
commit 342406e4fbba ("drm/nouveau/i2c: Disable i2c bus access after ->fini()")
It turns out that while I did fix the error messages I was seeing on my P50 when trying to access i2c busses with the GPU in runtime suspend, I accidentally had missed one important detail that was mentioned on the bug report this commit was supposed to fix: that the CPU would only lock up when trying to access i2c busses _on connected devices_ _while the GPU is not in runtime suspend_. Whoops. That definitely explains why I was not able to get my machine to hang with i2c bus interactions until now, as plugging my P50 into it's dock with an HDMI monitor connected allowed me to finally reproduce this locally.
Now that I have managed to reproduce this issue properly, it looks like the problem is much simpler then it looks. It turns out that some connected devices, such as MST laptop docks, will actually ACK i2c reads even if no data was actually read:
[ 275.063043] nouveau 0000:01:00.0: i2c: aux 000a: 1: 0000004c 1 [ 275.063447] nouveau 0000:01:00.0: i2c: aux 000a: 00 01101000 10040000 [ 275.063759] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000001 [ 275.064024] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000 [ 275.064285] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000 [ 275.064594] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000
Because we don't handle the situation of i2c ack without any data, we end up entering an infinite loop in nvkm_i2c_aux_i2c_xfer() since the value of cnt always remains at 0. This finally properly explains how this could result in a CPU hang like the ones observed in the aforementioned commit.
So, fix this by retrying transactions if no data is written or received, and give up and fail the transaction if we continue to not write or receive any data after 32 retries.
Signed-off-by: Lyude Paul lyude@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Ben Skeggs bskeggs@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c | 24 +++++++++++++++++------- 1 file changed, 17 insertions(+), 7 deletions(-)
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c @@ -40,8 +40,7 @@ nvkm_i2c_aux_i2c_xfer(struct i2c_adapter u8 *ptr = msg->buf;
while (remaining) { - u8 cnt = (remaining > 16) ? 16 : remaining; - u8 cmd; + u8 cnt, retries, cmd;
if (msg->flags & I2C_M_RD) cmd = 1; @@ -51,10 +50,19 @@ nvkm_i2c_aux_i2c_xfer(struct i2c_adapter if (mcnt || remaining > 16) cmd |= 4; /* MOT */
- ret = aux->func->xfer(aux, true, cmd, msg->addr, ptr, &cnt); - if (ret < 0) { - nvkm_i2c_aux_release(aux); - return ret; + for (retries = 0, cnt = 0; + retries < 32 && !cnt; + retries++) { + cnt = min_t(u8, remaining, 16); + ret = aux->func->xfer(aux, true, cmd, + msg->addr, ptr, &cnt); + if (ret < 0) + goto out; + } + if (!cnt) { + AUX_TRACE(aux, "no data after 32 retries"); + ret = -EIO; + goto out; }
ptr += cnt; @@ -64,8 +72,10 @@ nvkm_i2c_aux_i2c_xfer(struct i2c_adapter msg++; }
+ ret = num; +out: nvkm_i2c_aux_release(aux); - return num; + return ret; }
static u32
From: Bartosz Golaszewski bgolaszewski@baylibre.com
commit 2c60e6b5c9241b24b8b523fefd3e44fb85622cda upstream.
If the driver doesn't support open-drain/source config options, we emulate this behavior when setting the direction by calling gpiod_direction_input() if the default value is 0 (open-source) or 1 (open-drain), thus not actively driving the line in those cases.
This however clears the FLAG_IS_OUT bit for the GPIO line descriptor and makes the LINEINFO ioctl() incorrectly report this line's mode as 'input' to user-space.
This commit modifies the ioctl() to always set the GPIOLINE_FLAG_IS_OUT bit in the lineinfo structure's flags field. Since it's impossible to use the input mode and open-drain/source options at the same time, we can be sure the reported information will be correct.
Fixes: 521a2ad6f862 ("gpio: add userspace ABI for GPIO line information") Cc: stable stable@vger.kernel.org Signed-off-by: Bartosz Golaszewski bgolaszewski@baylibre.com Link: https://lore.kernel.org/r/20190806114151.17652-1-brgl@bgdev.pl Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpio/gpiolib.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -971,9 +971,11 @@ static long gpio_ioctl(struct file *filp if (test_bit(FLAG_ACTIVE_LOW, &desc->flags)) lineinfo.flags |= GPIOLINE_FLAG_ACTIVE_LOW; if (test_bit(FLAG_OPEN_DRAIN, &desc->flags)) - lineinfo.flags |= GPIOLINE_FLAG_OPEN_DRAIN; + lineinfo.flags |= (GPIOLINE_FLAG_OPEN_DRAIN | + GPIOLINE_FLAG_IS_OUT); if (test_bit(FLAG_OPEN_SOURCE, &desc->flags)) - lineinfo.flags |= GPIOLINE_FLAG_OPEN_SOURCE; + lineinfo.flags |= (GPIOLINE_FLAG_OPEN_SOURCE | + GPIOLINE_FLAG_IS_OUT);
if (copy_to_user(ip, &lineinfo, sizeof(lineinfo))) return -EFAULT;
From: Oleg Nesterov oleg@redhat.com
commit 46d0b24c5ee10a15dfb25e20642f5a5ed59c5003 upstream.
userfaultfd_release() should clear vm_flags/vm_userfaultfd_ctx even if mm->core_state != NULL.
Otherwise a page fault can see userfaultfd_missing() == T and use an already freed userfaultfd_ctx.
Link: http://lkml.kernel.org/r/20190820160237.GB4983@redhat.com Fixes: 04f5866e41fb ("coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping") Signed-off-by: Oleg Nesterov oleg@redhat.com Reported-by: Kefeng Wang wangkefeng.wang@huawei.com Reviewed-by: Andrea Arcangeli aarcange@redhat.com Tested-by: Kefeng Wang wangkefeng.wang@huawei.com Cc: Peter Xu peterx@redhat.com Cc: Mike Rapoport rppt@linux.ibm.com Cc: Jann Horn jannh@google.com Cc: Jason Gunthorpe jgg@mellanox.com Cc: Michal Hocko mhocko@suse.com Cc: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/userfaultfd.c | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-)
--- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -854,6 +854,7 @@ static int userfaultfd_release(struct in /* len == 0 means wake all */ struct userfaultfd_wake_range range = { .len = 0, }; unsigned long new_flags; + bool still_valid;
ACCESS_ONCE(ctx->released) = true;
@@ -869,8 +870,7 @@ static int userfaultfd_release(struct in * taking the mmap_sem for writing. */ down_write(&mm->mmap_sem); - if (!mmget_still_valid(mm)) - goto skip_mm; + still_valid = mmget_still_valid(mm); prev = NULL; for (vma = mm->mmap; vma; vma = vma->vm_next) { cond_resched(); @@ -881,19 +881,20 @@ static int userfaultfd_release(struct in continue; } new_flags = vma->vm_flags & ~(VM_UFFD_MISSING | VM_UFFD_WP); - prev = vma_merge(mm, prev, vma->vm_start, vma->vm_end, - new_flags, vma->anon_vma, - vma->vm_file, vma->vm_pgoff, - vma_policy(vma), - NULL_VM_UFFD_CTX); - if (prev) - vma = prev; - else - prev = vma; + if (still_valid) { + prev = vma_merge(mm, prev, vma->vm_start, vma->vm_end, + new_flags, vma->anon_vma, + vma->vm_file, vma->vm_pgoff, + vma_policy(vma), + NULL_VM_UFFD_CTX); + if (prev) + vma = prev; + else + prev = vma; + } vma->vm_flags = new_flags; vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; } -skip_mm: up_write(&mm->mmap_sem); mmput(mm); wakeup:
From: Sean Christopherson sean.j.christopherson@intel.com
commit b63f20a778c88b6a04458ed6ffc69da953d3a109 upstream.
Use 'lea' instead of 'add' when adjusting %rsp in CALL_NOSPEC so as to avoid clobbering flags.
KVM's emulator makes indirect calls into a jump table of sorts, where the destination of the CALL_NOSPEC is a small blob of code that performs fast emulation by executing the target instruction with fixed operands.
adcb_al_dl: 0x000339f8 <+0>: adc %dl,%al 0x000339fa <+2>: ret
A major motiviation for doing fast emulation is to leverage the CPU to handle consumption and manipulation of arithmetic flags, i.e. RFLAGS is both an input and output to the target of CALL_NOSPEC. Clobbering flags results in all sorts of incorrect emulation, e.g. Jcc instructions often take the wrong path. Sans the nops...
asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n" 0x0003595a <+58>: mov 0xc0(%ebx),%eax 0x00035960 <+64>: mov 0x60(%ebx),%edx 0x00035963 <+67>: mov 0x90(%ebx),%ecx 0x00035969 <+73>: push %edi 0x0003596a <+74>: popf 0x0003596b <+75>: call *%esi 0x000359a0 <+128>: pushf 0x000359a1 <+129>: pop %edi 0x000359a2 <+130>: mov %eax,0xc0(%ebx) 0x000359b1 <+145>: mov %edx,0x60(%ebx)
ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK); 0x000359a8 <+136>: mov -0x10(%ebp),%eax 0x000359ab <+139>: and $0x8d5,%edi 0x000359b4 <+148>: and $0xfffff72a,%eax 0x000359b9 <+153>: or %eax,%edi 0x000359bd <+157>: mov %edi,0x4(%ebx)
For the most part this has gone unnoticed as emulation of guest code that can trigger fast emulation is effectively limited to MMIO when running on modern hardware, and MMIO is rarely, if ever, accessed by instructions that affect or consume flags.
Breakage is almost instantaneous when running with unrestricted guest disabled, in which case KVM must emulate all instructions when the guest has invalid state, e.g. when the guest is in Big Real Mode during early BIOS.
Fixes: 776b043848fd2 ("x86/retpoline: Add initial retpoline support") Fixes: 1a29b5b7f347a ("KVM: x86: Make indirect calls in emulator speculation safe") Signed-off-by: Sean Christopherson sean.j.christopherson@intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190822211122.27579-1-sean.j.christopherson@intel... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/nospec-branch.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -202,7 +202,7 @@ " lfence;\n" \ " jmp 902b;\n" \ " .align 16\n" \ - "903: addl $4, %%esp;\n" \ + "903: lea 4(%%esp), %%esp;\n" \ " pushl %[thunk_target];\n" \ " ret;\n" \ " .align 16\n" \
From: Thomas Gleixner tglx@linutronix.de
commit f897e60a12f0b9146357780d317879bce2a877dc upstream.
Some newer machines do not advertise legacy timers. The kernel can handle that situation if the TSC and the CPU frequency are enumerated by CPUID or MSRs and the CPU supports TSC deadline timer. If the CPU does not support TSC deadline timer the local APIC timer frequency has to be known as well.
Some Ryzens machines do not advertize legacy timers, but there is no reliable way to determine the bus frequency which feeds the local APIC timer when the machine allows overclocking of that frequency.
As there is no legacy timer the local APIC timer calibration crashes due to a NULL pointer dereference when accessing the not installed global clock event device.
Switch the calibration loop to a non interrupt based one, which polls either TSC (if frequency is known) or jiffies. The latter requires a global clockevent. As the machines which do not have a global clockevent installed have a known TSC frequency this is a non issue. For older machines where TSC frequency is not known, there is no known case where the legacy timers do not exist as that would have been reported long ago.
Reported-by: Daniel Drake drake@endlessm.com Reported-by: Jiri Slaby jslaby@suse.cz Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Daniel Drake drake@endlessm.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908091443030.21433@nanos.tec.linu... Link: http://bugzilla.opensuse.org/show_bug.cgi?id=1142926#c12 Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/apic/apic.c | 68 ++++++++++++++++++++++++++++++++++---------- 1 file changed, 53 insertions(+), 15 deletions(-)
--- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -723,7 +723,7 @@ static __initdata unsigned long lapic_ca static __initdata unsigned long lapic_cal_j1, lapic_cal_j2;
/* - * Temporary interrupt handler. + * Temporary interrupt handler and polled calibration function. */ static void __init lapic_cal_handler(struct clock_event_device *dev) { @@ -807,7 +807,8 @@ calibrate_by_pmtimer(long deltapm, long static int __init calibrate_APIC_clock(void) { struct clock_event_device *levt = this_cpu_ptr(&lapic_events); - void (*real_handler)(struct clock_event_device *dev); + u64 tsc_perj = 0, tsc_start = 0; + unsigned long jif_start; unsigned long deltaj; long delta, deltatsc; int pm_referenced = 0; @@ -838,29 +839,65 @@ static int __init calibrate_APIC_clock(v apic_printk(APIC_VERBOSE, "Using local APIC timer interrupts.\n" "calibrating APIC timer ...\n");
+ /* + * There are platforms w/o global clockevent devices. Instead of + * making the calibration conditional on that, use a polling based + * approach everywhere. + */ local_irq_disable();
- /* Replace the global interrupt handler */ - real_handler = global_clock_event->event_handler; - global_clock_event->event_handler = lapic_cal_handler; - /* * Setup the APIC counter to maximum. There is no way the lapic * can underflow in the 100ms detection time frame */ __setup_APIC_LVTT(0xffffffff, 0, 0);
- /* Let the interrupts run */ + /* + * Methods to terminate the calibration loop: + * 1) Global clockevent if available (jiffies) + * 2) TSC if available and frequency is known + */ + jif_start = READ_ONCE(jiffies); + + if (tsc_khz) { + tsc_start = rdtsc(); + tsc_perj = div_u64((u64)tsc_khz * 1000, HZ); + } + + /* + * Enable interrupts so the tick can fire, if a global + * clockevent device is available + */ local_irq_enable();
- while (lapic_cal_loops <= LAPIC_CAL_LOOPS) - cpu_relax(); + while (lapic_cal_loops <= LAPIC_CAL_LOOPS) { + /* Wait for a tick to elapse */ + while (1) { + if (tsc_khz) { + u64 tsc_now = rdtsc(); + if ((tsc_now - tsc_start) >= tsc_perj) { + tsc_start += tsc_perj; + break; + } + } else { + unsigned long jif_now = READ_ONCE(jiffies); + + if (time_after(jif_now, jif_start)) { + jif_start = jif_now; + break; + } + } + cpu_relax(); + } + + /* Invoke the calibration routine */ + local_irq_disable(); + lapic_cal_handler(NULL); + local_irq_enable(); + }
local_irq_disable();
- /* Restore the real event handler */ - global_clock_event->event_handler = real_handler; - /* Build delta t1-t2 as apic timer counts down */ delta = lapic_cal_t1 - lapic_cal_t2; apic_printk(APIC_VERBOSE, "... lapic delta = %ld\n", delta); @@ -912,10 +949,11 @@ static int __init calibrate_APIC_clock(v levt->features &= ~CLOCK_EVT_FEAT_DUMMY;
/* - * PM timer calibration failed or not turned on - * so lets try APIC timer based calibration + * PM timer calibration failed or not turned on so lets try APIC + * timer based calibration, if a global clockevent device is + * available. */ - if (!pm_referenced) { + if (!pm_referenced && global_clock_event) { apic_printk(APIC_VERBOSE, "... verify APIC timer\n");
/*
From: Tom Lendacky thomas.lendacky@amd.com
commit c49a0a80137c7ca7d6ced4c812c9e07a949f6f24 upstream.
There have been reports of RDRAND issues after resuming from suspend on some AMD family 15h and family 16h systems. This issue stems from a BIOS not performing the proper steps during resume to ensure RDRAND continues to function properly.
RDRAND support is indicated by CPUID Fn00000001_ECX[30]. This bit can be reset by clearing MSR C001_1004[62]. Any software that checks for RDRAND support using CPUID, including the kernel, will believe that RDRAND is not supported.
Update the CPU initialization to clear the RDRAND CPUID bit for any family 15h and 16h processor that supports RDRAND. If it is known that the family 15h or family 16h system does not have an RDRAND resume issue or that the system will not be placed in suspend, the "rdrand=force" kernel parameter can be used to stop the clearing of the RDRAND CPUID bit.
Additionally, update the suspend and resume path to save and restore the MSR C001_1004 value to ensure that the RDRAND CPUID setting remains in place after resuming from suspend.
Note, that clearing the RDRAND CPUID bit does not prevent a processor that normally supports the RDRAND instruction from executing it. So any code that determined the support based on family and model won't #UD.
Signed-off-by: Tom Lendacky thomas.lendacky@amd.com Signed-off-by: Borislav Petkov bp@suse.de Cc: Andrew Cooper andrew.cooper3@citrix.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Chen Yu yu.c.chen@intel.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Ingo Molnar mingo@redhat.com Cc: Jonathan Corbet corbet@lwn.net Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Juergen Gross jgross@suse.com Cc: Kees Cook keescook@chromium.org Cc: "linux-doc@vger.kernel.org" linux-doc@vger.kernel.org Cc: "linux-pm@vger.kernel.org" linux-pm@vger.kernel.org Cc: Nathan Chancellor natechancellor@gmail.com Cc: Paolo Bonzini pbonzini@redhat.com Cc: Pavel Machek pavel@ucw.cz Cc: "Rafael J. Wysocki" rjw@rjwysocki.net Cc: stable@vger.kernel.org Cc: Thomas Gleixner tglx@linutronix.de Cc: "x86@kernel.org" x86@kernel.org Link: https://lkml.kernel.org/r/7543af91666f491547bd86cebb1e17c66824ab9f.156622994... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- Documentation/admin-guide/kernel-parameters.txt | 7 + arch/x86/include/asm/msr-index.h | 1 arch/x86/kernel/cpu/amd.c | 66 ++++++++++++++++++ arch/x86/power/cpu.c | 86 ++++++++++++++++++++---- 4 files changed, 147 insertions(+), 13 deletions(-)
--- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3788,6 +3788,13 @@ Run specified binary instead of /init from the ramdisk, used for early userspace startup. See initrd.
+ rdrand= [X86] + force - Override the decision by the kernel to hide the + advertisement of RDRAND support (this affects + certain AMD processors because of buggy BIOS + support, specifically around the suspend/resume + path). + rdt= [HW,X86,RDT] Turn on/off individual RDT features. List is: cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, mba. --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -334,6 +334,7 @@ #define MSR_AMD64_PATCH_LEVEL 0x0000008b #define MSR_AMD64_TSC_RATIO 0xc0000104 #define MSR_AMD64_NB_CFG 0xc001001f +#define MSR_AMD64_CPUID_FN_1 0xc0011004 #define MSR_AMD64_PATCH_LOADER 0xc0010020 #define MSR_AMD64_OSVW_ID_LENGTH 0xc0010140 #define MSR_AMD64_OSVW_STATUS 0xc0010141 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -772,6 +772,64 @@ static void init_amd_ln(struct cpuinfo_x msr_set_bit(MSR_AMD64_DE_CFG, 31); }
+static bool rdrand_force; + +static int __init rdrand_cmdline(char *str) +{ + if (!str) + return -EINVAL; + + if (!strcmp(str, "force")) + rdrand_force = true; + else + return -EINVAL; + + return 0; +} +early_param("rdrand", rdrand_cmdline); + +static void clear_rdrand_cpuid_bit(struct cpuinfo_x86 *c) +{ + /* + * Saving of the MSR used to hide the RDRAND support during + * suspend/resume is done by arch/x86/power/cpu.c, which is + * dependent on CONFIG_PM_SLEEP. + */ + if (!IS_ENABLED(CONFIG_PM_SLEEP)) + return; + + /* + * The nordrand option can clear X86_FEATURE_RDRAND, so check for + * RDRAND support using the CPUID function directly. + */ + if (!(cpuid_ecx(1) & BIT(30)) || rdrand_force) + return; + + msr_clear_bit(MSR_AMD64_CPUID_FN_1, 62); + + /* + * Verify that the CPUID change has occurred in case the kernel is + * running virtualized and the hypervisor doesn't support the MSR. + */ + if (cpuid_ecx(1) & BIT(30)) { + pr_info_once("BIOS may not properly restore RDRAND after suspend, but hypervisor does not support hiding RDRAND via CPUID.\n"); + return; + } + + clear_cpu_cap(c, X86_FEATURE_RDRAND); + pr_info_once("BIOS may not properly restore RDRAND after suspend, hiding RDRAND via CPUID. Use rdrand=force to reenable.\n"); +} + +static void init_amd_jg(struct cpuinfo_x86 *c) +{ + /* + * Some BIOS implementations do not restore proper RDRAND support + * across suspend and resume. Check on whether to hide the RDRAND + * instruction support via CPUID. + */ + clear_rdrand_cpuid_bit(c); +} + static void init_amd_bd(struct cpuinfo_x86 *c) { u64 value; @@ -786,6 +844,13 @@ static void init_amd_bd(struct cpuinfo_x wrmsrl_safe(MSR_F15H_IC_CFG, value); } } + + /* + * Some BIOS implementations do not restore proper RDRAND support + * across suspend and resume. Check on whether to hide the RDRAND + * instruction support via CPUID. + */ + clear_rdrand_cpuid_bit(c); }
static void init_amd_zn(struct cpuinfo_x86 *c) @@ -828,6 +893,7 @@ static void init_amd(struct cpuinfo_x86 case 0x10: init_amd_gh(c); break; case 0x12: init_amd_ln(c); break; case 0x15: init_amd_bd(c); break; + case 0x16: init_amd_jg(c); break; case 0x17: init_amd_zn(c); break; }
--- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -13,6 +13,7 @@ #include <linux/smp.h> #include <linux/perf_event.h> #include <linux/tboot.h> +#include <linux/dmi.h>
#include <asm/pgtable.h> #include <asm/proto.h> @@ -24,7 +25,7 @@ #include <asm/debugreg.h> #include <asm/cpu.h> #include <asm/mmu_context.h> -#include <linux/dmi.h> +#include <asm/cpu_device_id.h>
#ifdef CONFIG_X86_32 __visible unsigned long saved_context_ebx; @@ -398,15 +399,14 @@ static int __init bsp_pm_check_init(void
core_initcall(bsp_pm_check_init);
-static int msr_init_context(const u32 *msr_id, const int total_num) +static int msr_build_context(const u32 *msr_id, const int num) { - int i = 0; + struct saved_msrs *saved_msrs = &saved_context.saved_msrs; struct saved_msr *msr_array; + int total_num; + int i, j;
- if (saved_context.saved_msrs.array || saved_context.saved_msrs.num > 0) { - pr_err("x86/pm: MSR quirk already applied, please check your DMI match table.\n"); - return -EINVAL; - } + total_num = saved_msrs->num + num;
msr_array = kmalloc_array(total_num, sizeof(struct saved_msr), GFP_KERNEL); if (!msr_array) { @@ -414,19 +414,30 @@ static int msr_init_context(const u32 *m return -ENOMEM; }
- for (i = 0; i < total_num; i++) { - msr_array[i].info.msr_no = msr_id[i]; + if (saved_msrs->array) { + /* + * Multiple callbacks can invoke this function, so copy any + * MSR save requests from previous invocations. + */ + memcpy(msr_array, saved_msrs->array, + sizeof(struct saved_msr) * saved_msrs->num); + + kfree(saved_msrs->array); + } + + for (i = saved_msrs->num, j = 0; i < total_num; i++, j++) { + msr_array[i].info.msr_no = msr_id[j]; msr_array[i].valid = false; msr_array[i].info.reg.q = 0; } - saved_context.saved_msrs.num = total_num; - saved_context.saved_msrs.array = msr_array; + saved_msrs->num = total_num; + saved_msrs->array = msr_array;
return 0; }
/* - * The following section is a quirk framework for problematic BIOSen: + * The following sections are a quirk framework for problematic BIOSen: * Sometimes MSRs are modified by the BIOSen after suspended to * RAM, this might cause unexpected behavior after wakeup. * Thus we save/restore these specified MSRs across suspend/resume @@ -441,7 +452,7 @@ static int msr_initialize_bdw(const stru u32 bdw_msr_id[] = { MSR_IA32_THERM_CONTROL };
pr_info("x86/pm: %s detected, MSR saving is needed during suspending.\n", d->ident); - return msr_init_context(bdw_msr_id, ARRAY_SIZE(bdw_msr_id)); + return msr_build_context(bdw_msr_id, ARRAY_SIZE(bdw_msr_id)); }
static const struct dmi_system_id msr_save_dmi_table[] = { @@ -456,9 +467,58 @@ static const struct dmi_system_id msr_sa {} };
+static int msr_save_cpuid_features(const struct x86_cpu_id *c) +{ + u32 cpuid_msr_id[] = { + MSR_AMD64_CPUID_FN_1, + }; + + pr_info("x86/pm: family %#hx cpu detected, MSR saving is needed during suspending.\n", + c->family); + + return msr_build_context(cpuid_msr_id, ARRAY_SIZE(cpuid_msr_id)); +} + +static const struct x86_cpu_id msr_save_cpu_table[] = { + { + .vendor = X86_VENDOR_AMD, + .family = 0x15, + .model = X86_MODEL_ANY, + .feature = X86_FEATURE_ANY, + .driver_data = (kernel_ulong_t)msr_save_cpuid_features, + }, + { + .vendor = X86_VENDOR_AMD, + .family = 0x16, + .model = X86_MODEL_ANY, + .feature = X86_FEATURE_ANY, + .driver_data = (kernel_ulong_t)msr_save_cpuid_features, + }, + {} +}; + +typedef int (*pm_cpu_match_t)(const struct x86_cpu_id *); +static int pm_cpu_check(const struct x86_cpu_id *c) +{ + const struct x86_cpu_id *m; + int ret = 0; + + m = x86_match_cpu(msr_save_cpu_table); + if (m) { + pm_cpu_match_t fn; + + fn = (pm_cpu_match_t)m->driver_data; + ret = fn(m); + } + + return ret; +} + static int pm_check_save_msr(void) { dmi_check_system(msr_save_dmi_table); + pm_cpu_check(msr_save_cpu_table); + return 0; }
From: John Hubbard jhubbard@nvidia.com
commit a90118c445cc7f07781de26a9684d4ec58bfcfd1 upstream.
Recent gcc compilers (gcc 9.1) generate warnings about an out of bounds memset, if the memset goes accross several fields of a struct. This generated a couple of warnings on x86_64 builds in sanitize_boot_params().
Fix this by explicitly saving the fields in struct boot_params that are intended to be preserved, and zeroing all the rest.
[ tglx: Tagged for stable as it breaks the warning free build there as well ]
Suggested-by: Thomas Gleixner tglx@linutronix.de Suggested-by: H. Peter Anvin hpa@zytor.com Signed-off-by: John Hubbard jhubbard@nvidia.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190731054627.5627-2-jhubbard@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/bootparam_utils.h | 60 +++++++++++++++++++++++++-------- 1 file changed, 47 insertions(+), 13 deletions(-)
--- a/arch/x86/include/asm/bootparam_utils.h +++ b/arch/x86/include/asm/bootparam_utils.h @@ -18,6 +18,20 @@ * Note: efi_info is commonly left uninitialized, but that field has a * private magic, so it is better to leave it unchanged. */ + +#define sizeof_mbr(type, member) ({ sizeof(((type *)0)->member); }) + +#define BOOT_PARAM_PRESERVE(struct_member) \ + { \ + .start = offsetof(struct boot_params, struct_member), \ + .len = sizeof_mbr(struct boot_params, struct_member), \ + } + +struct boot_params_to_save { + unsigned int start; + unsigned int len; +}; + static void sanitize_boot_params(struct boot_params *boot_params) { /* @@ -36,19 +50,39 @@ static void sanitize_boot_params(struct */ if (boot_params->sentinel) { /* fields in boot_params are left uninitialized, clear them */ - memset(&boot_params->ext_ramdisk_image, 0, - (char *)&boot_params->efi_info - - (char *)&boot_params->ext_ramdisk_image); - memset(&boot_params->kbd_status, 0, - (char *)&boot_params->hdr - - (char *)&boot_params->kbd_status); - memset(&boot_params->_pad7[0], 0, - (char *)&boot_params->edd_mbr_sig_buffer[0] - - (char *)&boot_params->_pad7[0]); - memset(&boot_params->_pad8[0], 0, - (char *)&boot_params->eddbuf[0] - - (char *)&boot_params->_pad8[0]); - memset(&boot_params->_pad9[0], 0, sizeof(boot_params->_pad9)); + static struct boot_params scratch; + char *bp_base = (char *)boot_params; + char *save_base = (char *)&scratch; + int i; + + const struct boot_params_to_save to_save[] = { + BOOT_PARAM_PRESERVE(screen_info), + BOOT_PARAM_PRESERVE(apm_bios_info), + BOOT_PARAM_PRESERVE(tboot_addr), + BOOT_PARAM_PRESERVE(ist_info), + BOOT_PARAM_PRESERVE(hd0_info), + BOOT_PARAM_PRESERVE(hd1_info), + BOOT_PARAM_PRESERVE(sys_desc_table), + BOOT_PARAM_PRESERVE(olpc_ofw_header), + BOOT_PARAM_PRESERVE(efi_info), + BOOT_PARAM_PRESERVE(alt_mem_k), + BOOT_PARAM_PRESERVE(scratch), + BOOT_PARAM_PRESERVE(e820_entries), + BOOT_PARAM_PRESERVE(eddbuf_entries), + BOOT_PARAM_PRESERVE(edd_mbr_sig_buf_entries), + BOOT_PARAM_PRESERVE(edd_mbr_sig_buffer), + BOOT_PARAM_PRESERVE(e820_table), + BOOT_PARAM_PRESERVE(eddbuf), + }; + + memset(&scratch, 0, sizeof(scratch)); + + for (i = 0; i < ARRAY_SIZE(to_save); i++) { + memcpy(save_base + to_save[i].start, + bp_base + to_save[i].start, to_save[i].len); + } + + memcpy(boot_params, save_base, sizeof(*boot_params)); } }
From: John Hubbard jhubbard@nvidia.com
commit 7846f58fba964af7cb8cf77d4d13c33254725211 upstream.
commit a90118c445cc ("x86/boot: Save fields explicitly, zero out everything else") had two errors:
* It preserved boot_params.acpi_rsdp_addr, and * It failed to preserve boot_params.hdr
Therefore, zero out acpi_rsdp_addr, and preserve hdr.
Fixes: a90118c445cc ("x86/boot: Save fields explicitly, zero out everything else") Reported-by: Neil MacLeod neil@nmacleod.com Suggested-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: John Hubbard jhubbard@nvidia.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Neil MacLeod neil@nmacleod.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190821192513.20126-1-jhubbard@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/bootparam_utils.h | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/include/asm/bootparam_utils.h +++ b/arch/x86/include/asm/bootparam_utils.h @@ -71,6 +71,7 @@ static void sanitize_boot_params(struct BOOT_PARAM_PRESERVE(eddbuf_entries), BOOT_PARAM_PRESERVE(edd_mbr_sig_buf_entries), BOOT_PARAM_PRESERVE(edd_mbr_sig_buffer), + BOOT_PARAM_PRESERVE(hdr), BOOT_PARAM_PRESERVE(e820_table), BOOT_PARAM_PRESERVE(eddbuf), };
From: Dmitry Fomichev dmitry.fomichev@wdc.com
commit d1fef41465f0e8cae0693fb184caa6bfafb6cd16 upstream.
This patch fixes a problem in dm-kcopyd that may leave jobs in complete queue indefinitely in the event of backing storage failure.
This behavior has been observed while running 100% write file fio workload against an XFS volume created on top of a dm-zoned target device. If the underlying storage of dm-zoned goes to offline state under I/O, kcopyd sometimes never issues the end copy callback and dm-zoned reclaim work hangs indefinitely waiting for that completion.
This behavior was traced down to the error handling code in process_jobs() function that places the failed job to complete_jobs queue, but doesn't wake up the job handler. In case of backing device failure, all outstanding jobs may end up going to complete_jobs queue via this code path and then stay there forever because there are no more successful I/O jobs to wake up the job handler.
This patch adds a wake() call to always wake up kcopyd job wait queue for all I/O jobs that fail before dm_io() gets called for that job.
The patch also sets the write error status in all sub jobs that are failed because their master job has failed.
Fixes: b73c67c2cbb00 ("dm kcopyd: add sequential write feature") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Fomichev dmitry.fomichev@wdc.com Reviewed-by: Damien Le Moal damien.lemoal@wdc.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm-kcopyd.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/md/dm-kcopyd.c +++ b/drivers/md/dm-kcopyd.c @@ -545,8 +545,10 @@ static int run_io_job(struct kcopyd_job * no point in continuing. */ if (test_bit(DM_KCOPYD_WRITE_SEQ, &job->flags) && - job->master_job->write_err) + job->master_job->write_err) { + job->write_err = job->master_job->write_err; return -EIO; + }
io_job_start(job->kc->throttle);
@@ -598,6 +600,7 @@ static int process_jobs(struct list_head else job->read_err = 1; push(&kc->complete_jobs, job); + wake(kc); break; }
From: ZhangXiaoxu zhangxiaoxu5@huawei.com
commit e4f9d6013820d1eba1432d51dd1c5795759aa77f upstream.
When btree_split_beneath() splits a node to two new children, it will allocate two blocks: left and right. If right block's allocation failed, the left block will be unlocked and marked dirty. If this happened, the left block'ss content is zero, because it wasn't initialized with the btree struct before the attempot to allocate the right block. Upon return, when flushing the left block to disk, the validator will fail when check this block. Then a BUG_ON is raised.
Fix this by completely initializing the left block before allocating and initializing the right block.
Fixes: 4dcb8b57df359 ("dm btree: fix leak of bufio-backed block in btree_split_beneath error path") Cc: stable@vger.kernel.org Signed-off-by: ZhangXiaoxu zhangxiaoxu5@huawei.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/persistent-data/dm-btree.c | 31 ++++++++++++++++--------------- 1 file changed, 16 insertions(+), 15 deletions(-)
--- a/drivers/md/persistent-data/dm-btree.c +++ b/drivers/md/persistent-data/dm-btree.c @@ -628,39 +628,40 @@ static int btree_split_beneath(struct sh
new_parent = shadow_current(s);
+ pn = dm_block_data(new_parent); + size = le32_to_cpu(pn->header.flags) & INTERNAL_NODE ? + sizeof(__le64) : s->info->value_type.size; + + /* create & init the left block */ r = new_block(s->info, &left); if (r < 0) return r;
+ ln = dm_block_data(left); + nr_left = le32_to_cpu(pn->header.nr_entries) / 2; + + ln->header.flags = pn->header.flags; + ln->header.nr_entries = cpu_to_le32(nr_left); + ln->header.max_entries = pn->header.max_entries; + ln->header.value_size = pn->header.value_size; + memcpy(ln->keys, pn->keys, nr_left * sizeof(pn->keys[0])); + memcpy(value_ptr(ln, 0), value_ptr(pn, 0), nr_left * size); + + /* create & init the right block */ r = new_block(s->info, &right); if (r < 0) { unlock_block(s->info, left); return r; }
- pn = dm_block_data(new_parent); - ln = dm_block_data(left); rn = dm_block_data(right); - - nr_left = le32_to_cpu(pn->header.nr_entries) / 2; nr_right = le32_to_cpu(pn->header.nr_entries) - nr_left;
- ln->header.flags = pn->header.flags; - ln->header.nr_entries = cpu_to_le32(nr_left); - ln->header.max_entries = pn->header.max_entries; - ln->header.value_size = pn->header.value_size; - rn->header.flags = pn->header.flags; rn->header.nr_entries = cpu_to_le32(nr_right); rn->header.max_entries = pn->header.max_entries; rn->header.value_size = pn->header.value_size; - - memcpy(ln->keys, pn->keys, nr_left * sizeof(pn->keys[0])); memcpy(rn->keys, pn->keys + nr_left, nr_right * sizeof(pn->keys[0])); - - size = le32_to_cpu(pn->header.flags) & INTERNAL_NODE ? - sizeof(__le64) : s->info->value_type.size; - memcpy(value_ptr(ln, 0), value_ptr(pn, 0), nr_left * size); memcpy(value_ptr(rn, 0), value_ptr(pn, nr_left), nr_right * size);
From: ZhangXiaoxu zhangxiaoxu5@huawei.com
commit ae148243d3f0816b37477106c05a2ec7d5f32614 upstream.
In commit 6096d91af0b6 ("dm space map metadata: fix occasional leak of a metadata block on resize"), we refactor the commit logic to a new function 'apply_bops'. But when that logic was replaced in out() the return value was not stored. This may lead out() returning a wrong value to the caller.
Fixes: 6096d91af0b6 ("dm space map metadata: fix occasional leak of a metadata block on resize") Cc: stable@vger.kernel.org Signed-off-by: ZhangXiaoxu zhangxiaoxu5@huawei.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/persistent-data/dm-space-map-metadata.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/md/persistent-data/dm-space-map-metadata.c +++ b/drivers/md/persistent-data/dm-space-map-metadata.c @@ -248,7 +248,7 @@ static int out(struct sm_metadata *smm) }
if (smm->recursion_count == 1) - apply_bops(smm); + r = apply_bops(smm);
smm->recursion_count--;
From: Mikulas Patocka mpatocka@redhat.com
commit 1cfd5d3399e87167b7f9157ef99daa0e959f395d upstream.
If the sector number is too high, dm_table_find_target() should return a pointer to a zeroed dm_target structure (the caller should test it with dm_target_is_valid).
However, for some table sizes, the code in dm_table_find_target() that performs btree lookup will access out of bound memory structures.
Fix this bug by testing the sector number at the beginning of dm_table_find_target(). Also, add an "inline" keyword to the function dm_table_get_size() because this is a hot path.
Fixes: 512875bd9661 ("dm: table detect io beyond device") Cc: stable@vger.kernel.org Reported-by: Zhang Tao kontais@zoho.com Signed-off-by: Mikulas Patocka mpatocka@redhat.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm-table.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/md/dm-table.c +++ b/drivers/md/dm-table.c @@ -1308,7 +1308,7 @@ void dm_table_event(struct dm_table *t) } EXPORT_SYMBOL(dm_table_event);
-sector_t dm_table_get_size(struct dm_table *t) +inline sector_t dm_table_get_size(struct dm_table *t) { return t->num_targets ? (t->highs[t->num_targets - 1] + 1) : 0; } @@ -1333,6 +1333,9 @@ struct dm_target *dm_table_find_target(s unsigned int l, n = 0, k = 0; sector_t *node;
+ if (unlikely(sector >= dm_table_get_size(t))) + return &t->targets[t->num_targets]; + for (l = 0; l < t->depth; l++) { n = get_child(n, k); node = get_node(t, l, n);
From: Dmitry Fomichev dmitry.fomichev@wdc.com
commit b234c6d7a703661b5045c5bf569b7c99d2edbf88 upstream.
There are several places in reclaim code where errors are not propagated to the main function, dmz_reclaim(). This function is responsible for unlocking zones that might be still locked at the end of any failed reclaim iterations. As the result, some device zones may be left permanently locked for reclaim, degrading target's capability to reclaim zones.
This patch fixes these issues as follows -
Make sure that dmz_reclaim_buf(), dmz_reclaim_seq_data() and dmz_reclaim_rnd_data() return error codes to the caller.
dmz_reclaim() function is renamed to dmz_do_reclaim() to avoid clashing with "struct dmz_reclaim" and is modified to return the error to the caller.
dmz_get_zone_for_reclaim() now returns an error instead of NULL pointer and reclaim code checks for that error.
Error logging/debug messages are added where necessary.
Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Fomichev dmitry.fomichev@wdc.com Reviewed-by: Damien Le Moal damien.lemoal@wdc.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm-zoned-metadata.c | 4 ++-- drivers/md/dm-zoned-reclaim.c | 28 +++++++++++++++++++--------- 2 files changed, 21 insertions(+), 11 deletions(-)
--- a/drivers/md/dm-zoned-metadata.c +++ b/drivers/md/dm-zoned-metadata.c @@ -1534,7 +1534,7 @@ static struct dm_zone *dmz_get_rnd_zone_ struct dm_zone *zone;
if (list_empty(&zmd->map_rnd_list)) - return NULL; + return ERR_PTR(-EBUSY);
list_for_each_entry(zone, &zmd->map_rnd_list, link) { if (dmz_is_buf(zone)) @@ -1545,7 +1545,7 @@ static struct dm_zone *dmz_get_rnd_zone_ return dzone; }
- return NULL; + return ERR_PTR(-EBUSY); }
/* --- a/drivers/md/dm-zoned-reclaim.c +++ b/drivers/md/dm-zoned-reclaim.c @@ -217,7 +217,7 @@ static int dmz_reclaim_buf(struct dmz_re
dmz_unlock_flush(zmd);
- return 0; + return ret; }
/* @@ -261,7 +261,7 @@ static int dmz_reclaim_seq_data(struct d
dmz_unlock_flush(zmd);
- return 0; + return ret; }
/* @@ -314,7 +314,7 @@ static int dmz_reclaim_rnd_data(struct d
dmz_unlock_flush(zmd);
- return 0; + return ret; }
/* @@ -336,7 +336,7 @@ static void dmz_reclaim_empty(struct dmz /* * Find a candidate zone for reclaim and process it. */ -static void dmz_reclaim(struct dmz_reclaim *zrc) +static int dmz_do_reclaim(struct dmz_reclaim *zrc) { struct dmz_metadata *zmd = zrc->metadata; struct dm_zone *dzone; @@ -346,8 +346,8 @@ static void dmz_reclaim(struct dmz_recla
/* Get a data zone */ dzone = dmz_get_zone_for_reclaim(zmd); - if (!dzone) - return; + if (IS_ERR(dzone)) + return PTR_ERR(dzone);
start = jiffies;
@@ -393,13 +393,20 @@ static void dmz_reclaim(struct dmz_recla out: if (ret) { dmz_unlock_zone_reclaim(dzone); - return; + return ret; }
- (void) dmz_flush_metadata(zrc->metadata); + ret = dmz_flush_metadata(zrc->metadata); + if (ret) { + dmz_dev_debug(zrc->dev, + "Metadata flush for zone %u failed, err %d\n", + dmz_id(zmd, rzone), ret); + return ret; + }
dmz_dev_debug(zrc->dev, "Reclaimed zone %u in %u ms", dmz_id(zmd, rzone), jiffies_to_msecs(jiffies - start)); + return 0; }
/* @@ -444,6 +451,7 @@ static void dmz_reclaim_work(struct work struct dmz_metadata *zmd = zrc->metadata; unsigned int nr_rnd, nr_unmap_rnd; unsigned int p_unmap_rnd; + int ret;
if (!dmz_should_reclaim(zrc)) { mod_delayed_work(zrc->wq, &zrc->work, DMZ_IDLE_PERIOD); @@ -473,7 +481,9 @@ static void dmz_reclaim_work(struct work (dmz_target_idle(zrc) ? "Idle" : "Busy"), p_unmap_rnd, nr_unmap_rnd, nr_rnd);
- dmz_reclaim(zrc); + ret = dmz_do_reclaim(zrc); + if (ret) + dmz_dev_debug(zrc->dev, "Reclaim error %d\n", ret);
dmz_schedule_reclaim(zrc); }
From: Dmitry Fomichev dmitry.fomichev@wdc.com
commit d7428c50118e739e672656c28d2b26b09375d4e0 upstream.
Some errors are ignored in the I/O path during queueing chunks for processing by chunk works. Since at least these errors are transient in nature, it should be possible to retry the failed incoming commands.
The fix -
Errors that can happen while queueing chunks are carried upwards to the main mapping function and it now returns DM_MAPIO_REQUEUE for any incoming requests that can not be properly queued.
Error logging/debug messages are added where needed.
Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Fomichev dmitry.fomichev@wdc.com Reviewed-by: Damien Le Moal damien.lemoal@wdc.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm-zoned-target.c | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-)
--- a/drivers/md/dm-zoned-target.c +++ b/drivers/md/dm-zoned-target.c @@ -513,22 +513,24 @@ static void dmz_flush_work(struct work_s * Get a chunk work and start it to process a new BIO. * If the BIO chunk has no work yet, create one. */ -static void dmz_queue_chunk_work(struct dmz_target *dmz, struct bio *bio) +static int dmz_queue_chunk_work(struct dmz_target *dmz, struct bio *bio) { unsigned int chunk = dmz_bio_chunk(dmz->dev, bio); struct dm_chunk_work *cw; + int ret = 0;
mutex_lock(&dmz->chunk_lock);
/* Get the BIO chunk work. If one is not active yet, create one */ cw = radix_tree_lookup(&dmz->chunk_rxtree, chunk); if (!cw) { - int ret;
/* Create a new chunk work */ cw = kmalloc(sizeof(struct dm_chunk_work), GFP_NOIO); - if (!cw) + if (unlikely(!cw)) { + ret = -ENOMEM; goto out; + }
INIT_WORK(&cw->work, dmz_chunk_work); atomic_set(&cw->refcount, 0); @@ -539,7 +541,6 @@ static void dmz_queue_chunk_work(struct ret = radix_tree_insert(&dmz->chunk_rxtree, chunk, cw); if (unlikely(ret)) { kfree(cw); - cw = NULL; goto out; } } @@ -547,10 +548,12 @@ static void dmz_queue_chunk_work(struct bio_list_add(&cw->bio_list, bio); dmz_get_chunk_work(cw);
+ dmz_reclaim_bio_acc(dmz->reclaim); if (queue_work(dmz->chunk_wq, &cw->work)) dmz_get_chunk_work(cw); out: mutex_unlock(&dmz->chunk_lock); + return ret; }
/* @@ -564,6 +567,7 @@ static int dmz_map(struct dm_target *ti, sector_t sector = bio->bi_iter.bi_sector; unsigned int nr_sectors = bio_sectors(bio); sector_t chunk_sector; + int ret;
dmz_dev_debug(dev, "BIO op %d sector %llu + %u => chunk %llu, block %llu, %u blocks", bio_op(bio), (unsigned long long)sector, nr_sectors, @@ -601,8 +605,14 @@ static int dmz_map(struct dm_target *ti, dm_accept_partial_bio(bio, dev->zone_nr_sectors - chunk_sector);
/* Now ready to handle this BIO */ - dmz_reclaim_bio_acc(dmz->reclaim); - dmz_queue_chunk_work(dmz, bio); + ret = dmz_queue_chunk_work(dmz, bio); + if (ret) { + dmz_dev_debug(dmz->dev, + "BIO op %d, can't process chunk %llu, err %i\n", + bio_op(bio), (u64)dmz_bio_chunk(dmz->dev, bio), + ret); + return DM_MAPIO_REQUEUE; + }
return DM_MAPIO_SUBMITTED; }
From: Dmitry Fomichev dmitry.fomichev@wdc.com
commit 75d66ffb48efb30f2dd42f041ba8b39c5b2bd115 upstream.
dm-zoned is observed to lock up or livelock in case of hardware failure or some misconfiguration of the backing zoned device.
This patch adds a new dm-zoned target function that checks the status of the backing device. If the request queue of the backing device is found to be in dying state or the SCSI backing device enters offline state, the health check code sets a dm-zoned target flag prompting all further incoming I/O to be rejected. In order to detect backing device failures timely, this new function is called in the request mapping path, at the beginning of every reclaim run and before performing any metadata I/O.
The proper way out of this situation is to do
dmsetup remove <dm-zoned target>
and recreate the target when the problem with the backing device is resolved.
Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Fomichev dmitry.fomichev@wdc.com Reviewed-by: Damien Le Moal damien.lemoal@wdc.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm-zoned-metadata.c | 51 ++++++++++++++++++++++++++++++++--------- drivers/md/dm-zoned-reclaim.c | 18 ++++++++++++-- drivers/md/dm-zoned-target.c | 45 ++++++++++++++++++++++++++++++++++-- drivers/md/dm-zoned.h | 10 ++++++++ 4 files changed, 110 insertions(+), 14 deletions(-)
--- a/drivers/md/dm-zoned-metadata.c +++ b/drivers/md/dm-zoned-metadata.c @@ -401,15 +401,18 @@ static struct dmz_mblock *dmz_get_mblock sector_t block = zmd->sb[zmd->mblk_primary].block + mblk_no; struct bio *bio;
+ if (dmz_bdev_is_dying(zmd->dev)) + return ERR_PTR(-EIO); + /* Get a new block and a BIO to read it */ mblk = dmz_alloc_mblock(zmd, mblk_no); if (!mblk) - return NULL; + return ERR_PTR(-ENOMEM);
bio = bio_alloc(GFP_NOIO, 1); if (!bio) { dmz_free_mblock(zmd, mblk); - return NULL; + return ERR_PTR(-ENOMEM); }
spin_lock(&zmd->mblk_lock); @@ -540,8 +543,8 @@ static struct dmz_mblock *dmz_get_mblock if (!mblk) { /* Cache miss: read the block from disk */ mblk = dmz_get_mblock_slow(zmd, mblk_no); - if (!mblk) - return ERR_PTR(-ENOMEM); + if (IS_ERR(mblk)) + return mblk; }
/* Wait for on-going read I/O and check for error */ @@ -569,16 +572,19 @@ static void dmz_dirty_mblock(struct dmz_ /* * Issue a metadata block write BIO. */ -static void dmz_write_mblock(struct dmz_metadata *zmd, struct dmz_mblock *mblk, - unsigned int set) +static int dmz_write_mblock(struct dmz_metadata *zmd, struct dmz_mblock *mblk, + unsigned int set) { sector_t block = zmd->sb[set].block + mblk->no; struct bio *bio;
+ if (dmz_bdev_is_dying(zmd->dev)) + return -EIO; + bio = bio_alloc(GFP_NOIO, 1); if (!bio) { set_bit(DMZ_META_ERROR, &mblk->state); - return; + return -ENOMEM; }
set_bit(DMZ_META_WRITING, &mblk->state); @@ -590,6 +596,8 @@ static void dmz_write_mblock(struct dmz_ bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_META | REQ_PRIO); bio_add_page(bio, mblk->page, DMZ_BLOCK_SIZE, 0); submit_bio(bio); + + return 0; }
/* @@ -601,6 +609,9 @@ static int dmz_rdwr_block(struct dmz_met struct bio *bio; int ret;
+ if (dmz_bdev_is_dying(zmd->dev)) + return -EIO; + bio = bio_alloc(GFP_NOIO, 1); if (!bio) return -ENOMEM; @@ -658,22 +669,29 @@ static int dmz_write_dirty_mblocks(struc { struct dmz_mblock *mblk; struct blk_plug plug; - int ret = 0; + int ret = 0, nr_mblks_submitted = 0;
/* Issue writes */ blk_start_plug(&plug); - list_for_each_entry(mblk, write_list, link) - dmz_write_mblock(zmd, mblk, set); + list_for_each_entry(mblk, write_list, link) { + ret = dmz_write_mblock(zmd, mblk, set); + if (ret) + break; + nr_mblks_submitted++; + } blk_finish_plug(&plug);
/* Wait for completion */ list_for_each_entry(mblk, write_list, link) { + if (!nr_mblks_submitted) + break; wait_on_bit_io(&mblk->state, DMZ_META_WRITING, TASK_UNINTERRUPTIBLE); if (test_bit(DMZ_META_ERROR, &mblk->state)) { clear_bit(DMZ_META_ERROR, &mblk->state); ret = -EIO; } + nr_mblks_submitted--; }
/* Flush drive cache (this will also sync data) */ @@ -735,6 +753,11 @@ int dmz_flush_metadata(struct dmz_metada */ dmz_lock_flush(zmd);
+ if (dmz_bdev_is_dying(zmd->dev)) { + ret = -EIO; + goto out; + } + /* Get dirty blocks */ spin_lock(&zmd->mblk_lock); list_splice_init(&zmd->mblk_dirty_list, &write_list); @@ -1623,6 +1646,10 @@ again: /* Alloate a random zone */ dzone = dmz_alloc_zone(zmd, DMZ_ALLOC_RND); if (!dzone) { + if (dmz_bdev_is_dying(zmd->dev)) { + dzone = ERR_PTR(-EIO); + goto out; + } dmz_wait_for_free_zones(zmd); goto again; } @@ -1720,6 +1747,10 @@ again: /* Alloate a random zone */ bzone = dmz_alloc_zone(zmd, DMZ_ALLOC_RND); if (!bzone) { + if (dmz_bdev_is_dying(zmd->dev)) { + bzone = ERR_PTR(-EIO); + goto out; + } dmz_wait_for_free_zones(zmd); goto again; } --- a/drivers/md/dm-zoned-reclaim.c +++ b/drivers/md/dm-zoned-reclaim.c @@ -37,7 +37,7 @@ enum { /* * Number of seconds of target BIO inactivity to consider the target idle. */ -#define DMZ_IDLE_PERIOD (10UL * HZ) +#define DMZ_IDLE_PERIOD (10UL * HZ)
/* * Percentage of unmapped (free) random zones below which reclaim starts @@ -134,6 +134,9 @@ static int dmz_reclaim_copy(struct dmz_r set_bit(DM_KCOPYD_WRITE_SEQ, &flags);
while (block < end_block) { + if (dev->flags & DMZ_BDEV_DYING) + return -EIO; + /* Get a valid region from the source zone */ ret = dmz_first_valid_block(zmd, src_zone, &block); if (ret <= 0) @@ -453,6 +456,9 @@ static void dmz_reclaim_work(struct work unsigned int p_unmap_rnd; int ret;
+ if (dmz_bdev_is_dying(zrc->dev)) + return; + if (!dmz_should_reclaim(zrc)) { mod_delayed_work(zrc->wq, &zrc->work, DMZ_IDLE_PERIOD); return; @@ -482,8 +488,16 @@ static void dmz_reclaim_work(struct work p_unmap_rnd, nr_unmap_rnd, nr_rnd);
ret = dmz_do_reclaim(zrc); - if (ret) + if (ret) { dmz_dev_debug(zrc->dev, "Reclaim error %d\n", ret); + if (ret == -EIO) + /* + * LLD might be performing some error handling sequence + * at the underlying device. To not interfere, do not + * attempt to schedule the next reclaim run immediately. + */ + return; + }
dmz_schedule_reclaim(zrc); } --- a/drivers/md/dm-zoned-target.c +++ b/drivers/md/dm-zoned-target.c @@ -133,6 +133,8 @@ static int dmz_submit_bio(struct dmz_tar
atomic_inc(&bioctx->ref); generic_make_request(clone); + if (clone->bi_status == BLK_STS_IOERR) + return -EIO;
if (bio_op(bio) == REQ_OP_WRITE && dmz_is_seq(zone)) zone->wp_block += nr_blocks; @@ -277,8 +279,8 @@ static int dmz_handle_buffered_write(str
/* Get the buffer zone. One will be allocated if needed */ bzone = dmz_get_chunk_buffer(zmd, zone); - if (!bzone) - return -ENOSPC; + if (IS_ERR(bzone)) + return PTR_ERR(bzone);
if (dmz_is_readonly(bzone)) return -EROFS; @@ -389,6 +391,11 @@ static void dmz_handle_bio(struct dmz_ta
dmz_lock_metadata(zmd);
+ if (dmz->dev->flags & DMZ_BDEV_DYING) { + ret = -EIO; + goto out; + } + /* * Get the data zone mapping the chunk. There may be no * mapping for read and discard. If a mapping is obtained, @@ -493,6 +500,8 @@ static void dmz_flush_work(struct work_s
/* Flush dirty metadata blocks */ ret = dmz_flush_metadata(dmz->metadata); + if (ret) + dmz_dev_debug(dmz->dev, "Metadata flush failed, rc=%d\n", ret);
/* Process queued flush requests */ while (1) { @@ -557,6 +566,32 @@ out: }
/* + * Check the backing device availability. If it's on the way out, + * start failing I/O. Reclaim and metadata components also call this + * function to cleanly abort operation in the event of such failure. + */ +bool dmz_bdev_is_dying(struct dmz_dev *dmz_dev) +{ + struct gendisk *disk; + + if (!(dmz_dev->flags & DMZ_BDEV_DYING)) { + disk = dmz_dev->bdev->bd_disk; + if (blk_queue_dying(bdev_get_queue(dmz_dev->bdev))) { + dmz_dev_warn(dmz_dev, "Backing device queue dying"); + dmz_dev->flags |= DMZ_BDEV_DYING; + } else if (disk->fops->check_events) { + if (disk->fops->check_events(disk, 0) & + DISK_EVENT_MEDIA_CHANGE) { + dmz_dev_warn(dmz_dev, "Backing device offline"); + dmz_dev->flags |= DMZ_BDEV_DYING; + } + } + } + + return dmz_dev->flags & DMZ_BDEV_DYING; +} + +/* * Process a new BIO. */ static int dmz_map(struct dm_target *ti, struct bio *bio) @@ -569,6 +604,9 @@ static int dmz_map(struct dm_target *ti, sector_t chunk_sector; int ret;
+ if (dmz_bdev_is_dying(dmz->dev)) + return DM_MAPIO_KILL; + dmz_dev_debug(dev, "BIO op %d sector %llu + %u => chunk %llu, block %llu, %u blocks", bio_op(bio), (unsigned long long)sector, nr_sectors, (unsigned long long)dmz_bio_chunk(dmz->dev, bio), @@ -865,6 +903,9 @@ static int dmz_prepare_ioctl(struct dm_t { struct dmz_target *dmz = ti->private;
+ if (dmz_bdev_is_dying(dmz->dev)) + return -ENODEV; + *bdev = dmz->dev->bdev;
return 0; --- a/drivers/md/dm-zoned.h +++ b/drivers/md/dm-zoned.h @@ -56,6 +56,8 @@ struct dmz_dev {
unsigned int nr_zones;
+ unsigned int flags; + sector_t zone_nr_sectors; unsigned int zone_nr_sectors_shift;
@@ -67,6 +69,9 @@ struct dmz_dev { (dev)->zone_nr_sectors_shift) #define dmz_chunk_block(dev, b) ((b) & ((dev)->zone_nr_blocks - 1))
+/* Device flags. */ +#define DMZ_BDEV_DYING (1 << 0) + /* * Zone descriptor. */ @@ -245,4 +250,9 @@ void dmz_resume_reclaim(struct dmz_recla void dmz_reclaim_bio_acc(struct dmz_reclaim *zrc); void dmz_schedule_reclaim(struct dmz_reclaim *zrc);
+/* + * Functions defined in dm-zoned-target.c + */ +bool dmz_bdev_is_dying(struct dmz_dev *dmz_dev); + #endif /* DM_ZONED_H */
From: Michael Kelley mikelley@microsoft.com
commit d0ff14fdc987303aeeb7de6f1bd72c3749ae2a9b upstream.
If alloc_descs() fails before irq_sysfs_init() has run, free_desc() in the cleanup path will call kobject_del() even though the kobject has not been added with kobject_add().
Fix this by making the call to kobject_del() conditional on whether irq_sysfs_init() has run.
This problem surfaced because commit aa30f47cf666 ("kobject: Add support for default attribute groups to kobj_type") makes kobject_del() stricter about pairing with kobject_add(). If the pairing is incorrrect, a WARNING and backtrace occur in sysfs_remove_group() because there is no parent.
[ tglx: Add a comment to the code and make it work with CONFIG_SYSFS=n ]
Fixes: ecb3f394c5db ("genirq: Expose interrupt information through sysfs") Signed-off-by: Michael Kelley mikelley@microsoft.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1564703564-4116-1-git-send-email-mikelley@microsof... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/irq/irqdesc.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)
--- a/kernel/irq/irqdesc.c +++ b/kernel/irq/irqdesc.c @@ -277,6 +277,18 @@ static void irq_sysfs_add(int irq, struc } }
+static void irq_sysfs_del(struct irq_desc *desc) +{ + /* + * If irq_sysfs_init() has not yet been invoked (early boot), then + * irq_kobj_base is NULL and the descriptor was never added. + * kobject_del() complains about a object with no parent, so make + * it conditional. + */ + if (irq_kobj_base) + kobject_del(&desc->kobj); +} + static int __init irq_sysfs_init(void) { struct irq_desc *desc; @@ -307,6 +319,7 @@ static struct kobj_type irq_kobj_type = };
static void irq_sysfs_add(int irq, struct irq_desc *desc) {} +static void irq_sysfs_del(struct irq_desc *desc) {}
#endif /* CONFIG_SYSFS */
@@ -420,7 +433,7 @@ static void free_desc(unsigned int irq) * The sysfs entry must be serialized against a concurrent * irq_sysfs_init() as well. */ - kobject_del(&desc->kobj); + irq_sysfs_del(desc); delete_irq_desc(irq);
/*
From: Vlastimil Babka vbabka@suse.cz
commit f7da677bc6e72033f0981b9d58b5c5d409fa641e upstream.
THP splitting path is missing the split_page_owner() call that split_page() has.
As a result, split THP pages are wrongly reported in the page_owner file as order-9 pages. Furthermore when the former head page is freed, the remaining former tail pages are not listed in the page_owner file at all. This patch fixes that by adding the split_page_owner() call into __split_huge_page().
Link: http://lkml.kernel.org/r/20190820131828.22684-2-vbabka@suse.cz Fixes: a9627bc5e34e ("mm/page_owner: introduce split_page_owner and replace manual handling") Reported-by: Kirill A. Shutemov kirill@shutemov.name Signed-off-by: Vlastimil Babka vbabka@suse.cz Cc: Michal Hocko mhocko@kernel.org Cc: Mel Gorman mgorman@techsingularity.net Cc: Matthew Wilcox willy@infradead.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/huge_memory.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -33,6 +33,7 @@ #include <linux/page_idle.h> #include <linux/shmem_fs.h> #include <linux/oom.h> +#include <linux/page_owner.h>
#include <asm/tlb.h> #include <asm/pgalloc.h> @@ -2387,6 +2388,9 @@ static void __split_huge_page(struct pag }
ClearPageCompound(head); + + split_page_owner(head, HPAGE_PMD_ORDER); + /* See comment in __split_huge_page_tail() */ if (PageAnon(head)) { /* Additional pin to radix tree of swap cache */
From: Henry Burns henryburns@google.com
commit 1a87aa03597efa9641e92875b883c94c7f872ccb upstream.
In zs_page_migrate() we call putback_zspage() after we have finished migrating all pages in this zspage. However, the return value is ignored. If a zs_free() races in between zs_page_isolate() and zs_page_migrate(), freeing the last object in the zspage, putback_zspage() will leave the page in ZS_EMPTY for potentially an unbounded amount of time.
To fix this, we need to do the same thing as zs_page_putback() does: schedule free_work to occur.
To avoid duplicated code, move the sequence to a new putback_zspage_deferred() function which both zs_page_migrate() and zs_page_putback() call.
Link: http://lkml.kernel.org/r/20190809181751.219326-1-henryburns@google.com Fixes: 48b4800a1c6a ("zsmalloc: page migration support") Signed-off-by: Henry Burns henryburns@google.com Reviewed-by: Sergey Senozhatsky sergey.senozhatsky@gmail.com Cc: Henry Burns henrywolfeburns@gmail.com Cc: Minchan Kim minchan@kernel.org Cc: Shakeel Butt shakeelb@google.com Cc: Jonathan Adams jwadams@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/zsmalloc.c | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-)
--- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -1878,6 +1878,18 @@ static void dec_zspage_isolation(struct zspage->isolated--; }
+static void putback_zspage_deferred(struct zs_pool *pool, + struct size_class *class, + struct zspage *zspage) +{ + enum fullness_group fg; + + fg = putback_zspage(class, zspage); + if (fg == ZS_EMPTY) + schedule_work(&pool->free_work); + +} + static void replace_sub_page(struct size_class *class, struct zspage *zspage, struct page *newpage, struct page *oldpage) { @@ -2047,7 +2059,7 @@ int zs_page_migrate(struct address_space * the list if @page is final isolated subpage in the zspage. */ if (!is_zspage_isolated(zspage)) - putback_zspage(class, zspage); + putback_zspage_deferred(pool, class, zspage);
reset_page(page); put_page(page); @@ -2093,14 +2105,13 @@ void zs_page_putback(struct page *page) spin_lock(&class->lock); dec_zspage_isolation(zspage); if (!is_zspage_isolated(zspage)) { - fg = putback_zspage(class, zspage); /* * Due to page_lock, we cannot free zspage immediately * so let's defer. */ - if (fg == ZS_EMPTY) - schedule_work(&pool->free_work); + putback_zspage_deferred(pool, class, zspage); } + spin_unlock(&class->lock); }
From: Henry Burns henryburns@google.com
commit 701d678599d0c1623aaf4139c03eea260a75b027 upstream.
In zs_destroy_pool() we call flush_work(&pool->free_work). However, we have no guarantee that migration isn't happening in the background at that time.
Since migration can't directly free pages, it relies on free_work being scheduled to free the pages. But there's nothing preventing an in-progress migrate from queuing the work *after* zs_unregister_migration() has called flush_work(). Which would mean pages still pointing at the inode when we free it.
Since we know at destroy time all objects should be free, no new migrations can come in (since zs_page_isolate() fails for fully-free zspages). This means it is sufficient to track a "# isolated zspages" count by class, and have the destroy logic ensure all such pages have drained before proceeding. Keeping that state under the class spinlock keeps the logic straightforward.
In this case a memory leak could lead to an eventual crash if compaction hits the leaked page. This crash would only occur if people are changing their zswap backend at runtime (which eventually starts destruction).
Link: http://lkml.kernel.org/r/20190809181751.219326-2-henryburns@google.com Fixes: 48b4800a1c6a ("zsmalloc: page migration support") Signed-off-by: Henry Burns henryburns@google.com Reviewed-by: Sergey Senozhatsky sergey.senozhatsky@gmail.com Cc: Henry Burns henrywolfeburns@gmail.com Cc: Minchan Kim minchan@kernel.org Cc: Shakeel Butt shakeelb@google.com Cc: Jonathan Adams jwadams@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/zsmalloc.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 59 insertions(+), 2 deletions(-)
--- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -52,6 +52,7 @@ #include <linux/zpool.h> #include <linux/mount.h> #include <linux/migrate.h> +#include <linux/wait.h> #include <linux/pagemap.h>
#define ZSPAGE_MAGIC 0x58 @@ -267,6 +268,10 @@ struct zs_pool { #ifdef CONFIG_COMPACTION struct inode *inode; struct work_struct free_work; + /* A wait queue for when migration races with async_free_zspage() */ + struct wait_queue_head migration_wait; + atomic_long_t isolated_pages; + bool destroying; #endif };
@@ -1890,6 +1895,19 @@ static void putback_zspage_deferred(stru
}
+static inline void zs_pool_dec_isolated(struct zs_pool *pool) +{ + VM_BUG_ON(atomic_long_read(&pool->isolated_pages) <= 0); + atomic_long_dec(&pool->isolated_pages); + /* + * There's no possibility of racing, since wait_for_isolated_drain() + * checks the isolated count under &class->lock after enqueuing + * on migration_wait. + */ + if (atomic_long_read(&pool->isolated_pages) == 0 && pool->destroying) + wake_up_all(&pool->migration_wait); +} + static void replace_sub_page(struct size_class *class, struct zspage *zspage, struct page *newpage, struct page *oldpage) { @@ -1959,6 +1977,7 @@ bool zs_page_isolate(struct page *page, */ if (!list_empty(&zspage->list) && !is_zspage_isolated(zspage)) { get_zspage_mapping(zspage, &class_idx, &fullness); + atomic_long_inc(&pool->isolated_pages); remove_zspage(class, zspage, fullness); }
@@ -2058,8 +2077,16 @@ int zs_page_migrate(struct address_space * Page migration is done so let's putback isolated zspage to * the list if @page is final isolated subpage in the zspage. */ - if (!is_zspage_isolated(zspage)) + if (!is_zspage_isolated(zspage)) { + /* + * We cannot race with zs_destroy_pool() here because we wait + * for isolation to hit zero before we start destroying. + * Also, we ensure that everyone can see pool->destroying before + * we start waiting. + */ putback_zspage_deferred(pool, class, zspage); + zs_pool_dec_isolated(pool); + }
reset_page(page); put_page(page); @@ -2110,8 +2137,8 @@ void zs_page_putback(struct page *page) * so let's defer. */ putback_zspage_deferred(pool, class, zspage); + zs_pool_dec_isolated(pool); } - spin_unlock(&class->lock); }
@@ -2134,8 +2161,36 @@ static int zs_register_migration(struct return 0; }
+static bool pool_isolated_are_drained(struct zs_pool *pool) +{ + return atomic_long_read(&pool->isolated_pages) == 0; +} + +/* Function for resolving migration */ +static void wait_for_isolated_drain(struct zs_pool *pool) +{ + + /* + * We're in the process of destroying the pool, so there are no + * active allocations. zs_page_isolate() fails for completely free + * zspages, so we need only wait for the zs_pool's isolated + * count to hit zero. + */ + wait_event(pool->migration_wait, + pool_isolated_are_drained(pool)); +} + static void zs_unregister_migration(struct zs_pool *pool) { + pool->destroying = true; + /* + * We need a memory barrier here to ensure global visibility of + * pool->destroying. Thus pool->isolated pages will either be 0 in which + * case we don't care, or it will be > 0 and pool->destroying will + * ensure that we wake up once isolation hits 0. + */ + smp_mb(); + wait_for_isolated_drain(pool); /* This can block */ flush_work(&pool->free_work); iput(pool->inode); } @@ -2376,6 +2431,8 @@ struct zs_pool *zs_create_pool(const cha if (!pool->name) goto err;
+ init_waitqueue_head(&pool->migration_wait); + if (create_cache(pool)) goto err;
From: Darrick J. Wong darrick.wong@oracle.com
commit 1fb254aa983bf190cfd685d40c64a480a9bafaee upstream.
Benjamin Moody reported to Debian that XFS partially wedges when a chgrp fails on account of being out of disk quota. I ran his reproducer script:
# adduser dummy # adduser dummy plugdev
# dd if=/dev/zero bs=1M count=100 of=test.img # mkfs.xfs test.img # mount -t xfs -o gquota test.img /mnt # mkdir -p /mnt/dummy # chown -c dummy /mnt/dummy # xfs_quota -xc 'limit -g bsoft=100k bhard=100k plugdev' /mnt
(and then as user dummy)
$ dd if=/dev/urandom bs=1M count=50 of=/mnt/dummy/foo $ chgrp plugdev /mnt/dummy/foo
and saw:
================================================ WARNING: lock held when returning to user space! 5.3.0-rc5 #rc5 Tainted: G W ------------------------------------------------ chgrp/47006 is leaving the kernel with locks still held! 1 lock held by chgrp/47006: #0: 000000006664ea2d (&xfs_nondir_ilock_class){++++}, at: xfs_ilock+0xd2/0x290 [xfs]
...which is clearly caused by xfs_setattr_nonsize failing to unlock the ILOCK after the xfs_qm_vop_chown_reserve call fails. Add the missing unlock.
Reported-by: benjamin.moody@gmail.com Fixes: 253f4911f297 ("xfs: better xfs_trans_alloc interface") Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Dave Chinner dchinner@redhat.com Tested-by: Salvatore Bonaccorso carnil@debian.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/xfs/xfs_iops.c | 1 + 1 file changed, 1 insertion(+)
--- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -789,6 +789,7 @@ xfs_setattr_nonsize(
out_cancel: xfs_trans_cancel(tp); + xfs_iunlock(ip, XFS_ILOCK_EXCL); out_dqrele: xfs_qm_dqrele(udqp); xfs_qm_dqrele(gdqp);
[ Upstream commit e0702d90b79d430b0ccc276ead4f88440bb51352 ]
This function is supposed to return error pointers so it matches the dmz_get_rnd_zone_for_reclaim() function. The current code could lead to a NULL dereference in dmz_do_reclaim()
Fixes: b234c6d7a703 ("dm zoned: improve error handling in reclaim") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Reviewed-by: Dmitry Fomichev dmitry.fomichev@wdc.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/dm-zoned-metadata.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c index ccf17eb6adaa2..b322821a6323b 100644 --- a/drivers/md/dm-zoned-metadata.c +++ b/drivers/md/dm-zoned-metadata.c @@ -1579,7 +1579,7 @@ static struct dm_zone *dmz_get_seq_zone_for_reclaim(struct dmz_metadata *zmd) struct dm_zone *zone;
if (list_empty(&zmd->map_seq_list)) - return NULL; + return ERR_PTR(-EBUSY);
list_for_each_entry(zone, &zmd->map_seq_list, link) { if (!zone->bzone) @@ -1588,7 +1588,7 @@ static struct dm_zone *dmz_get_seq_zone_for_reclaim(struct dmz_metadata *zmd) return zone; }
- return NULL; + return ERR_PTR(-EBUSY); }
/*
From: Alastair D'Silva alastair@d-silva.org
The upstream commit: 22e9c88d486a ("powerpc/64: reuse PPC32 static inline flush_dcache_range()") has a similar effect, but since it is a rewrite of the assembler to C, is too invasive for stable. This patch is a minimal fix to address the issue in assembler.
This patch applies cleanly to v5.2, v4.19 & v4.14.
When calling flush_(inval_)dcache_range with a size >4GB, we were masking off the upper 32 bits, so we would incorrectly flush a range smaller than intended.
This patch replaces the 32 bit shifts with 64 bit ones, so that the full size is accounted for.
Signed-off-by: Alastair D'Silva alastair@d-silva.org Acked-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/kernel/misc_64.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/powerpc/kernel/misc_64.S +++ b/arch/powerpc/kernel/misc_64.S @@ -134,7 +134,7 @@ _GLOBAL_TOC(flush_dcache_range) subf r8,r6,r4 /* compute length */ add r8,r8,r5 /* ensure we get enough */ lwz r9,DCACHEL1LOGBLOCKSIZE(r10) /* Get log-2 of dcache block size */ - srw. r8,r8,r9 /* compute line count */ + srd. r8,r8,r9 /* compute line count */ beqlr /* nothing to do? */ mtctr r8 0: dcbst 0,r6 @@ -190,7 +190,7 @@ _GLOBAL(flush_inval_dcache_range) subf r8,r6,r4 /* compute length */ add r8,r8,r5 /* ensure we get enough */ lwz r9,DCACHEL1LOGBLOCKSIZE(r10)/* Get log-2 of dcache block size */ - srw. r8,r8,r9 /* compute line count */ + srd. r8,r8,r9 /* compute line count */ beqlr /* nothing to do? */ sync isync
On Tue, Aug 27, 2019 at 09:50:05AM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.141 release. There are 62 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu 29 Aug 2019 07:25:02 AM UTC. Anything received after that time might be too late.
Build results: total: 172 pass: 172 fail: 0 Qemu test results: total: 372 pass: 372 fail: 0
Guenter
On 8/27/19 1:50 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.141 release. There are 62 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu 29 Aug 2019 07:25:02 AM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.141-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks, -- Shuah
stable-rc/linux-4.14.y boot: 125 boots: 2 failed, 112 passed with 10 offline, 1 untried/unknown (v4.14.140-63-g4e1a19d20000)
Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.14.y/kernel/v4.14... Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.14.y/kernel/v4.14.140-63...
Tree: stable-rc Branch: linux-4.14.y Git Describe: v4.14.140-63-g4e1a19d20000 Git Commit: 4e1a19d20000330e767ddc9c64317d7d576a8f31 Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git Tested: 64 unique boards, 24 SoC families, 14 builds out of 201
Boot Failures Detected:
arm64: defconfig: gcc-8: meson-gxl-s905x-nexbox-a95x: 1 failed lab
arm: sama5_defconfig: gcc-8: at91-sama5d4_xplained: 1 failed lab
Offline Platforms:
arm64:
defconfig: gcc-8 apq8016-sbc: 1 offline lab meson-gxbb-odroidc2: 1 offline lab
arm:
tegra_defconfig: gcc-8 tegra20-iris-512: 1 offline lab
multi_v7_defconfig: gcc-8 qcom-apq8064-cm-qs600: 1 offline lab qcom-apq8064-ifc6410: 1 offline lab sun5i-r8-chip: 1 offline lab tegra20-iris-512: 1 offline lab
qcom_defconfig: gcc-8 qcom-apq8064-cm-qs600: 1 offline lab qcom-apq8064-ifc6410: 1 offline lab
sunxi_defconfig: gcc-8 sun5i-r8-chip: 1 offline lab
--- For more info write to info@kernelci.org
On Tue, 27 Aug 2019 at 13:22, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.14.141 release. There are 62 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu 29 Aug 2019 07:25:02 AM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.141-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Summary ------------------------------------------------------------------------
kernel: 4.14.141-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.14.y git commit: 4e1a19d20000330e767ddc9c64317d7d576a8f31 git describe: v4.14.140-63-g4e1a19d20000 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.140-6...
No regressions (compared to build v4.14.140)
No fixes (compared to build v4.14.140)
Ran 23744 total tests in the following environments and test suites.
Environments -------------- - dragonboard-410c - arm64 - hi6220-hikey - arm64 - i386 - juno-r2 - arm64 - qemu_arm - qemu_arm64 - qemu_i386 - qemu_x86_64 - x15 - arm - x86_64
Test Suites ----------- * build * install-android-platform-tools-r2600 * kselftest * libhugetlbfs * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-cpuhotplug-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * perf * spectre-meltdown-checker-test * v4l2-compliance * ltp-fs-tests * network-basic-tests * ltp-open-posix-tests * kvm-unit-tests * ssuite * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none
linux-stable-mirror@lists.linaro.org