After commit 74d905d2d38a devices requiring the workaround
for edge triggered interrupts stopped working.
This is because the "data" state container defaults to
*not* using the workaround, but the workaround gets used
*before* the check of whether it is needed or not. This
semantic is not obvious from just looking on the patch,
but related to the program flow.
The hardware needs the quirk to be used before even
proceeding to check if the quirk is needed.
This patch makes the quirk be used until we determine
it is *not* needed.
Cc: Andre <andre.muller(a)web.de>
Cc: Nick Dyer <nick.dyer(a)itdev.co.uk>
Cc: Jiada Wang <jiada_wang(a)mentor.com>
Cc: stable(a)vger.kernel.org
Fixes: 74d905d2d38a ("Input: atmel_mxt_ts - only read messages in mxt_acquire_irq() when necessary")
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
---
drivers/input/touchscreen/atmel_mxt_ts.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/input/touchscreen/atmel_mxt_ts.c b/drivers/input/touchscreen/atmel_mxt_ts.c
index e34984388791..f25b2f6038a7 100644
--- a/drivers/input/touchscreen/atmel_mxt_ts.c
+++ b/drivers/input/touchscreen/atmel_mxt_ts.c
@@ -1297,8 +1297,6 @@ static int mxt_check_retrigen(struct mxt_data *data)
int val;
struct irq_data *irqd;
- data->use_retrigen_workaround = false;
-
irqd = irq_get_irq_data(data->irq);
if (!irqd)
return -EINVAL;
@@ -1313,8 +1311,10 @@ static int mxt_check_retrigen(struct mxt_data *data)
if (error)
return error;
- if (val & MXT_COMMS_RETRIGEN)
+ if (val & MXT_COMMS_RETRIGEN) {
+ data->use_retrigen_workaround = false;
return 0;
+ }
}
dev_warn(&client->dev, "Enabling RETRIGEN workaround\n");
@@ -3117,6 +3117,7 @@ static int mxt_probe(struct i2c_client *client, const struct i2c_device_id *id)
data = devm_kzalloc(&client->dev, sizeof(struct mxt_data), GFP_KERNEL);
if (!data)
return -ENOMEM;
+ data->use_retrigen_workaround = true;
snprintf(data->phys, sizeof(data->phys), "i2c-%u-%04x/input0",
client->adapter->nr, client->addr);
--
2.26.2
First six of the backport address numerous problems troubling HDAudio
configuration users for Skylake driver. Upstream series:
"ASoC: Intel: Skylake: Fix HDaudio and Dmic" [1] provides the
explanation and reasoning behind it. These have been initialy pushed
into v5.7-rc1 via: "sound updates for 5.7-rc1" [2] by Takashi.
Last two patches are from: "Add support for different DMIC
configurations" [3] which focuses on HDAudio with DMIC configuration.
Patch: "ASoC: Intel: Skylake: Add alternative topology binary name"
of the mentioned series has already been merged to v5.4.y -stable and
thus it's not included here.
Fixes target mainly Skylake and Kabylake based platforms, released
in 2015-2016 period.
[1]: https://lore.kernel.org/alsa-devel/20200305145314.32579-1-cezary.rojewski@i…
[2]: https://lore.kernel.org/lkml/s5htv22uso8.wl-tiwai@suse.de/
[3]: https://lore.kernel.org/alsa-devel/20200427132727.24942-1-mateusz.gorski@li…
Cezary Rojewski (6):
ASoC: Intel: Skylake: Remove superfluous chip initialization
ASoC: Intel: Skylake: Select hda configuration permissively
ASoC: Intel: Skylake: Enable codec wakeup during chip init
ASoC: Intel: Skylake: Shield against no-NHLT configurations
ASoC: Intel: Allow for ROM init retry on CNL platforms
ASoC: Intel: Skylake: Await purge request ack on CNL
Mateusz Gorski (2):
ASoC: Intel: Multiple I/O PCM format support for pipe
ASoC: Intel: Skylake: Automatic DMIC format configuration according to
information from NHLT
include/uapi/sound/skl-tplg-interface.h | 2 +
sound/soc/intel/skylake/bxt-sst.c | 3 -
sound/soc/intel/skylake/cnl-sst.c | 35 ++++--
sound/soc/intel/skylake/skl-nhlt.c | 3 +-
sound/soc/intel/skylake/skl-sst-dsp.h | 2 +
sound/soc/intel/skylake/skl-topology.c | 159 +++++++++++++++++++++++-
sound/soc/intel/skylake/skl-topology.h | 1 +
sound/soc/intel/skylake/skl.c | 29 ++---
8 files changed, 204 insertions(+), 30 deletions(-)
--
2.17.1
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 184eead057cc7e803558269babc1f2cfb9113ad1 Mon Sep 17 00:00:00 2001
From: Alan Stern <stern(a)rowland.harvard.edu>
Date: Thu, 19 Nov 2020 12:00:40 -0500
Subject: [PATCH] USB: core: Fix regression in Hercules audio card
Commit 3e4f8e21c4f2 ("USB: core: fix check for duplicate endpoints")
aimed to make the USB stack more reliable by detecting and skipping
over endpoints that are duplicated between interfaces. This caused a
regression for a Hercules audio card (reported as Bugzilla #208357),
which contains such non-compliant duplications. Although the
duplications are harmless, skipping the valid endpoints prevented the
device from working.
This patch fixes the regression by adding ENDPOINT_IGNORE quirks for
the Hercules card, telling the kernel to ignore the invalid duplicate
endpoints and thereby allowing the valid endpoints to be used as
intended.
Fixes: 3e4f8e21c4f2 ("USB: core: fix check for duplicate endpoints")
CC: <stable(a)vger.kernel.org>
Reported-by: Alexander Chalikiopoulos <bugzilla.kernel.org(a)mrtoasted.com>
Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu>
Link: https://lore.kernel.org/r/20201119170040.GA576844@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c
index f536ea9fe945..fad31ccd1fa8 100644
--- a/drivers/usb/core/quirks.c
+++ b/drivers/usb/core/quirks.c
@@ -348,6 +348,10 @@ static const struct usb_device_id usb_quirk_list[] = {
/* Guillemot Webcam Hercules Dualpix Exchange*/
{ USB_DEVICE(0x06f8, 0x3005), .driver_info = USB_QUIRK_RESET_RESUME },
+ /* Guillemot Hercules DJ Console audio card (BZ 208357) */
+ { USB_DEVICE(0x06f8, 0xb000), .driver_info =
+ USB_QUIRK_ENDPOINT_IGNORE },
+
/* Midiman M-Audio Keystation 88es */
{ USB_DEVICE(0x0763, 0x0192), .driver_info = USB_QUIRK_RESET_RESUME },
@@ -525,6 +529,8 @@ static const struct usb_device_id usb_amd_resume_quirk_list[] = {
* Matched for devices with USB_QUIRK_ENDPOINT_IGNORE.
*/
static const struct usb_device_id usb_endpoint_ignore[] = {
+ { USB_DEVICE_INTERFACE_NUMBER(0x06f8, 0xb000, 5), .driver_info = 0x01 },
+ { USB_DEVICE_INTERFACE_NUMBER(0x06f8, 0xb000, 5), .driver_info = 0x81 },
{ USB_DEVICE_INTERFACE_NUMBER(0x0926, 0x0202, 1), .driver_info = 0x85 },
{ USB_DEVICE_INTERFACE_NUMBER(0x0926, 0x0208, 1), .driver_info = 0x85 },
{ }
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f5c042b23f7429e5c2ac987b01a31c69059a978b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micha=C5=82=20Miros=C5=82aw?= <mirq-linux(a)rere.qmqm.pl>
Date: Fri, 13 Nov 2020 01:20:28 +0100
Subject: [PATCH] regulator: workaround self-referent regulators
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Workaround regulators whose supply name happens to be the same as its
own name. This fixes boards that used to work before the early supply
resolving was removed. The error message is left in place so that
offending drivers can be detected.
Fixes: aea6cb99703e ("regulator: resolve supply after creating regulator")
Cc: stable(a)vger.kernel.org
Reported-by: Ahmad Fatoum <a.fatoum(a)pengutronix.de>
Signed-off-by: Michał Mirosław <mirq-linux(a)rere.qmqm.pl>
Tested-by: Ahmad Fatoum <a.fatoum(a)pengutronix.de> # stpmic1
Link: https://lore.kernel.org/r/d703acde2a93100c3c7a81059d716c50ad1b1f52.16052266…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index d3b96efa20fd..42bbd99a36ac 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -1844,7 +1844,10 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
if (r == rdev) {
dev_err(dev, "Supply for %s (%s) resolved to itself\n",
rdev->desc->name, rdev->supply_name);
- return -EINVAL;
+ if (!have_full_constraints())
+ return -EINVAL;
+ r = dummy_regulator_rdev;
+ get_device(&r->dev);
}
/*
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fd8d9db3559a29fd737bcdb7c4fcbe1940caae34 Mon Sep 17 00:00:00 2001
From: Xiaochen Shen <xiaochen.shen(a)intel.com>
Date: Sat, 31 Oct 2020 03:10:53 +0800
Subject: [PATCH] x86/resctrl: Remove superfluous kernfs_get() calls to prevent
refcount leak
Willem reported growing of kernfs_node_cache entries in slabtop when
repeatedly creating and removing resctrl subdirectories as well as when
repeatedly mounting and unmounting the resctrl filesystem.
On resource group (control as well as monitoring) creation via a mkdir
an extra kernfs_node reference is obtained to ensure that the rdtgroup
structure remains accessible for the rdtgroup_kn_unlock() calls where it
is removed on deletion. The kernfs_node reference count is dropped by
kernfs_put() in rdtgroup_kn_unlock().
With the above explaining the need for one kernfs_get()/kernfs_put()
pair in resctrl there are more places where a kernfs_node reference is
obtained without a corresponding release. The excessive amount of
reference count on kernfs nodes will never be dropped to 0 and the
kernfs nodes will never be freed in the call paths of rmdir and umount.
It leads to reference count leak and kernfs_node_cache memory leak.
Remove the superfluous kernfs_get() calls and expand the existing
comments surrounding the remaining kernfs_get()/kernfs_put() pair that
remains in use.
Superfluous kernfs_get() calls are removed from two areas:
(1) In call paths of mount and mkdir, when kernfs nodes for "info",
"mon_groups" and "mon_data" directories and sub-directories are
created, the reference count of newly created kernfs node is set to 1.
But after kernfs_create_dir() returns, superfluous kernfs_get() are
called to take an additional reference.
(2) kernfs_get() calls in rmdir call paths.
Fixes: 17eafd076291 ("x86/intel_rdt: Split resource group removal in two")
Fixes: 4af4a88e0c92 ("x86/intel_rdt/cqm: Add mount,umount support")
Fixes: f3cbeacaa06e ("x86/intel_rdt/cqm: Add rmdir support")
Fixes: d89b7379015f ("x86/intel_rdt/cqm: Add mon_data")
Fixes: c7d9aac61311 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring")
Fixes: 5dc1d5c6bac2 ("x86/intel_rdt: Simplify info and base file lists")
Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system")
Fixes: 4e978d06dedb ("x86/intel_rdt: Add "info" files to resctrl file system")
Reported-by: Willem de Bruijn <willemb(a)google.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen(a)intel.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre(a)intel.com>
Tested-by: Willem de Bruijn <willemb(a)google.com>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/1604085053-31639-1-git-send-email-xiaochen.shen@i…
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index af323e2e3100..2ab1266a5f14 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1769,7 +1769,6 @@ static int rdtgroup_mkdir_info_resdir(struct rdt_resource *r, char *name,
if (IS_ERR(kn_subdir))
return PTR_ERR(kn_subdir);
- kernfs_get(kn_subdir);
ret = rdtgroup_kn_set_ugid(kn_subdir);
if (ret)
return ret;
@@ -1792,7 +1791,6 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
kn_info = kernfs_create_dir(parent_kn, "info", parent_kn->mode, NULL);
if (IS_ERR(kn_info))
return PTR_ERR(kn_info);
- kernfs_get(kn_info);
ret = rdtgroup_add_files(kn_info, RF_TOP_INFO);
if (ret)
@@ -1813,12 +1811,6 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
goto out_destroy;
}
- /*
- * This extra ref will be put in kernfs_remove() and guarantees
- * that @rdtgrp->kn is always accessible.
- */
- kernfs_get(kn_info);
-
ret = rdtgroup_kn_set_ugid(kn_info);
if (ret)
goto out_destroy;
@@ -1847,12 +1839,6 @@ mongroup_create_dir(struct kernfs_node *parent_kn, struct rdtgroup *prgrp,
if (dest_kn)
*dest_kn = kn;
- /*
- * This extra ref will be put in kernfs_remove() and guarantees
- * that @rdtgrp->kn is always accessible.
- */
- kernfs_get(kn);
-
ret = rdtgroup_kn_set_ugid(kn);
if (ret)
goto out_destroy;
@@ -2139,13 +2125,11 @@ static int rdt_get_tree(struct fs_context *fc)
&kn_mongrp);
if (ret < 0)
goto out_info;
- kernfs_get(kn_mongrp);
ret = mkdir_mondata_all(rdtgroup_default.kn,
&rdtgroup_default, &kn_mondata);
if (ret < 0)
goto out_mongrp;
- kernfs_get(kn_mondata);
rdtgroup_default.mon.mon_data_kn = kn_mondata;
}
@@ -2499,11 +2483,6 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
if (IS_ERR(kn))
return PTR_ERR(kn);
- /*
- * This extra ref will be put in kernfs_remove() and guarantees
- * that kn is always accessible.
- */
- kernfs_get(kn);
ret = rdtgroup_kn_set_ugid(kn);
if (ret)
goto out_destroy;
@@ -2838,8 +2817,8 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
/*
* kernfs_remove() will drop the reference count on "kn" which
* will free it. But we still need it to stick around for the
- * rdtgroup_kn_unlock(kn} call below. Take one extra reference
- * here, which will be dropped inside rdtgroup_kn_unlock().
+ * rdtgroup_kn_unlock(kn) call. Take one extra reference here,
+ * which will be dropped inside rdtgroup_kn_unlock().
*/
kernfs_get(kn);
@@ -3049,11 +3028,6 @@ static int rdtgroup_rmdir_mon(struct kernfs_node *kn, struct rdtgroup *rdtgrp,
WARN_ON(list_empty(&prdtgrp->mon.crdtgrp_list));
list_del(&rdtgrp->mon.crdtgrp_list);
- /*
- * one extra hold on this, will drop when we kfree(rdtgrp)
- * in rdtgroup_kn_unlock()
- */
- kernfs_get(kn);
kernfs_remove(rdtgrp->kn);
return 0;
@@ -3065,11 +3039,6 @@ static int rdtgroup_ctrl_remove(struct kernfs_node *kn,
rdtgrp->flags = RDT_DELETED;
list_del(&rdtgrp->rdtgroup_list);
- /*
- * one extra hold on this, will drop when we kfree(rdtgrp)
- * in rdtgroup_kn_unlock()
- */
- kernfs_get(kn);
kernfs_remove(rdtgrp->kn);
return 0;
}
Description of issue:
Patch with commit id "c1a6c5ac4278d406c112cc2f038e6e506feadff9"
has modified ioctl path 'timeout' variable type to u8 from
unsigned long. With this maximum timeout value that the driver
can support is 255 seconds. But for some commands application is
providing the higher timeout value (512 seconds as default), so
it will be round off to zero value. Hence timeout is observed
immediately and the IOCTL request fails.
Fix:
Inorder to accommodate higher timeout value change datatype back to
unsigned long.
Cc: <stable(a)vger.kernel.org> #v4.18+
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani(a)broadcom.com>
---
drivers/scsi/mpt3sas/mpt3sas_ctl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/mpt3sas/mpt3sas_ctl.c b/drivers/scsi/mpt3sas/mpt3sas_ctl.c
index b21aa55..c8a0ce1 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_ctl.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_ctl.c
@@ -664,7 +664,7 @@ _ctl_do_mpt_command(struct MPT3SAS_ADAPTER *ioc, struct mpt3_ioctl_command karg,
Mpi26NVMeEncapsulatedRequest_t *nvme_encap_request = NULL;
struct _pcie_device *pcie_device = NULL;
u16 smid;
- u8 timeout;
+ unsigned long timeout;
u8 issue_reset;
u32 sz, sz_arg;
void *psge;
--
2.27.0
From: "DingHua Ma" <dinghua.ma.sz(a)gmail.com>
When I use the axp20x chip to power my SDIO device on the 5.4 kernel,
the output voltage of DLDO2 is wrong. After comparing the register
manual and source code of the chip, I found that the mask bit of the
driver register of the port was wrong. I fixed this error by modifying
the mask register of the source code. This error seems to be a copy
error of the macro when writing the code. Now the voltage output of
the DLDO2 port of axp20x is correct. My development environment is
Allwinner A40I of arm architecture, and the kernel version is 5.4.
Signed-off-by: DingHua Ma <dinghua.ma.sz(a)gmail.com>
Reviewed-by: Chen-Yu Tsai <wens(a)csie.org>
Cc: <stable(a)vger.kernel.org>
Fixes: db4a555f7c4c ("regulator: axp20x: use defines for masks")
---
Changes since v2:
- Modify topic description
---
Changes since v1:
- More accurate description for this patch
---
drivers/regulator/axp20x-regulator.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/regulator/axp20x-regulator.c b/drivers/regulator/axp20x-regulator.c
index cd1224182ad7..90cb8445f721 100644
--- a/drivers/regulator/axp20x-regulator.c
+++ b/drivers/regulator/axp20x-regulator.c
@@ -594,7 +594,7 @@ static const struct regulator_desc axp22x_regulators[] = {
AXP22X_DLDO1_V_OUT, AXP22X_DLDO1_V_OUT_MASK,
AXP22X_PWR_OUT_CTRL2, AXP22X_PWR_OUT_DLDO1_MASK),
AXP_DESC(AXP22X, DLDO2, "dldo2", "dldoin", 700, 3300, 100,
- AXP22X_DLDO2_V_OUT, AXP22X_PWR_OUT_DLDO2_MASK,
+ AXP22X_DLDO2_V_OUT, AXP22X_DLDO2_V_OUT_MASK,
AXP22X_PWR_OUT_CTRL2, AXP22X_PWR_OUT_DLDO2_MASK),
AXP_DESC(AXP22X, DLDO3, "dldo3", "dldoin", 700, 3300, 100,
AXP22X_DLDO3_V_OUT, AXP22X_DLDO3_V_OUT_MASK,
--
2.25.1
After commit 74d905d2d38a devices requiring the workaround
for edge triggered interrupts stopped working.
This is because the "data" state container defaults to
*not* using the workaround, but the workaround gets used
*before* the check of whether it is needed or not. This
semantic is not obvious from just looking on the patch,
but related to the program flow.
The hardware needs the quirk to be used before even
proceeding to check if the quirk is needed.
This patch makes the quirk be used until we determine
it is *not* needed. It is determined as not needed when
we either have a level-triggered interrupt or the
firmware claims that it has enabled retrigging.
Cc: Andre Müller <andre.muller(a)web.de>
Cc: Nick Dyer <nick.dyer(a)itdev.co.uk>
Cc: Jiada Wang <jiada_wang(a)mentor.com>
Cc: stable(a)vger.kernel.org
Reported-by: Andre Müller <andre.muller(a)web.de>
Fixes: 74d905d2d38a ("Input: atmel_mxt_ts - only read messages in mxt_acquire_irq() when necessary")
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
---
ChangeLog v1->v2:
- Explicitly disable the retrig workaround also if the
IRQ descriptor says we have a level triggered interrupt.
- Drop the second explicit assigning of "true" to the
use_retrigen_workaround bool, it is already enabled.
- Augment debug text to say that we leave it enabled
rather than enable it.
---
drivers/input/touchscreen/atmel_mxt_ts.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/drivers/input/touchscreen/atmel_mxt_ts.c b/drivers/input/touchscreen/atmel_mxt_ts.c
index e34984388791..c822db8dbd02 100644
--- a/drivers/input/touchscreen/atmel_mxt_ts.c
+++ b/drivers/input/touchscreen/atmel_mxt_ts.c
@@ -1297,14 +1297,18 @@ static int mxt_check_retrigen(struct mxt_data *data)
int val;
struct irq_data *irqd;
- data->use_retrigen_workaround = false;
-
irqd = irq_get_irq_data(data->irq);
if (!irqd)
return -EINVAL;
- if (irqd_is_level_type(irqd))
+ if (irqd_is_level_type(irqd)) {
+ /*
+ * We don't need the workaround if we have level trigged
+ * interrupts. This will just work fine.
+ */
+ data->use_retrigen_workaround = false;
return 0;
+ }
if (data->T18_address) {
error = __mxt_read_reg(client,
@@ -1313,12 +1317,13 @@ static int mxt_check_retrigen(struct mxt_data *data)
if (error)
return error;
- if (val & MXT_COMMS_RETRIGEN)
+ if (val & MXT_COMMS_RETRIGEN) {
+ data->use_retrigen_workaround = false;
return 0;
+ }
}
- dev_warn(&client->dev, "Enabling RETRIGEN workaround\n");
- data->use_retrigen_workaround = true;
+ dev_warn(&client->dev, "Leaving RETRIGEN workaround enabled\n");
return 0;
}
@@ -3117,6 +3122,7 @@ static int mxt_probe(struct i2c_client *client, const struct i2c_device_id *id)
data = devm_kzalloc(&client->dev, sizeof(struct mxt_data), GFP_KERNEL);
if (!data)
return -ENOMEM;
+ data->use_retrigen_workaround = true;
snprintf(data->phys, sizeof(data->phys), "i2c-%u-%04x/input0",
client->adapter->nr, client->addr);
--
2.26.2
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 184eead057cc7e803558269babc1f2cfb9113ad1 Mon Sep 17 00:00:00 2001
From: Alan Stern <stern(a)rowland.harvard.edu>
Date: Thu, 19 Nov 2020 12:00:40 -0500
Subject: [PATCH] USB: core: Fix regression in Hercules audio card
Commit 3e4f8e21c4f2 ("USB: core: fix check for duplicate endpoints")
aimed to make the USB stack more reliable by detecting and skipping
over endpoints that are duplicated between interfaces. This caused a
regression for a Hercules audio card (reported as Bugzilla #208357),
which contains such non-compliant duplications. Although the
duplications are harmless, skipping the valid endpoints prevented the
device from working.
This patch fixes the regression by adding ENDPOINT_IGNORE quirks for
the Hercules card, telling the kernel to ignore the invalid duplicate
endpoints and thereby allowing the valid endpoints to be used as
intended.
Fixes: 3e4f8e21c4f2 ("USB: core: fix check for duplicate endpoints")
CC: <stable(a)vger.kernel.org>
Reported-by: Alexander Chalikiopoulos <bugzilla.kernel.org(a)mrtoasted.com>
Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu>
Link: https://lore.kernel.org/r/20201119170040.GA576844@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c
index f536ea9fe945..fad31ccd1fa8 100644
--- a/drivers/usb/core/quirks.c
+++ b/drivers/usb/core/quirks.c
@@ -348,6 +348,10 @@ static const struct usb_device_id usb_quirk_list[] = {
/* Guillemot Webcam Hercules Dualpix Exchange*/
{ USB_DEVICE(0x06f8, 0x3005), .driver_info = USB_QUIRK_RESET_RESUME },
+ /* Guillemot Hercules DJ Console audio card (BZ 208357) */
+ { USB_DEVICE(0x06f8, 0xb000), .driver_info =
+ USB_QUIRK_ENDPOINT_IGNORE },
+
/* Midiman M-Audio Keystation 88es */
{ USB_DEVICE(0x0763, 0x0192), .driver_info = USB_QUIRK_RESET_RESUME },
@@ -525,6 +529,8 @@ static const struct usb_device_id usb_amd_resume_quirk_list[] = {
* Matched for devices with USB_QUIRK_ENDPOINT_IGNORE.
*/
static const struct usb_device_id usb_endpoint_ignore[] = {
+ { USB_DEVICE_INTERFACE_NUMBER(0x06f8, 0xb000, 5), .driver_info = 0x01 },
+ { USB_DEVICE_INTERFACE_NUMBER(0x06f8, 0xb000, 5), .driver_info = 0x81 },
{ USB_DEVICE_INTERFACE_NUMBER(0x0926, 0x0202, 1), .driver_info = 0x85 },
{ USB_DEVICE_INTERFACE_NUMBER(0x0926, 0x0208, 1), .driver_info = 0x85 },
{ }
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 758999246965eeb8b253d47e72f7bfe508804b16 Mon Sep 17 00:00:00 2001
From: Xiaochen Shen <xiaochen.shen(a)intel.com>
Date: Sat, 31 Oct 2020 03:11:28 +0800
Subject: [PATCH] x86/resctrl: Add necessary kernfs_put() calls to prevent
refcount leak
On resource group creation via a mkdir an extra kernfs_node reference is
obtained by kernfs_get() to ensure that the rdtgroup structure remains
accessible for the rdtgroup_kn_unlock() calls where it is removed on
deletion. Currently the extra kernfs_node reference count is only
dropped by kernfs_put() in rdtgroup_kn_unlock() while the rdtgroup
structure is removed in a few other locations that lack the matching
reference drop.
In call paths of rmdir and umount, when a control group is removed,
kernfs_remove() is called to remove the whole kernfs nodes tree of the
control group (including the kernfs nodes trees of all child monitoring
groups), and then rdtgroup structure is freed by kfree(). The rdtgroup
structures of all child monitoring groups under the control group are
freed by kfree() in free_all_child_rdtgrp().
Before calling kfree() to free the rdtgroup structures, the kernfs node
of the control group itself as well as the kernfs nodes of all child
monitoring groups still take the extra references which will never be
dropped to 0 and the kernfs nodes will never be freed. It leads to
reference count leak and kernfs_node_cache memory leak.
For example, reference count leak is observed in these two cases:
(1) mount -t resctrl resctrl /sys/fs/resctrl
mkdir /sys/fs/resctrl/c1
mkdir /sys/fs/resctrl/c1/mon_groups/m1
umount /sys/fs/resctrl
(2) mkdir /sys/fs/resctrl/c1
mkdir /sys/fs/resctrl/c1/mon_groups/m1
rmdir /sys/fs/resctrl/c1
The same reference count leak issue also exists in the error exit paths
of mkdir in mkdir_rdt_prepare() and rdtgroup_mkdir_ctrl_mon().
Fix this issue by following changes to make sure the extra kernfs_node
reference on rdtgroup is dropped before freeing the rdtgroup structure.
(1) Introduce rdtgroup removal helper rdtgroup_remove() to wrap up
kernfs_put() and kfree().
(2) Call rdtgroup_remove() in rdtgroup removal path where the rdtgroup
structure is about to be freed by kfree().
(3) Call rdtgroup_remove() or kernfs_put() as appropriate in the error
exit paths of mkdir where an extra reference is taken by kernfs_get().
Fixes: f3cbeacaa06e ("x86/intel_rdt/cqm: Add rmdir support")
Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system")
Reported-by: Willem de Bruijn <willemb(a)google.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen(a)intel.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/1604085088-31707-1-git-send-email-xiaochen.shen@i…
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 2ab1266a5f14..6f4ca4bea625 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -507,6 +507,24 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
return ret ?: nbytes;
}
+/**
+ * rdtgroup_remove - the helper to remove resource group safely
+ * @rdtgrp: resource group to remove
+ *
+ * On resource group creation via a mkdir, an extra kernfs_node reference is
+ * taken to ensure that the rdtgroup structure remains accessible for the
+ * rdtgroup_kn_unlock() calls where it is removed.
+ *
+ * Drop the extra reference here, then free the rdtgroup structure.
+ *
+ * Return: void
+ */
+static void rdtgroup_remove(struct rdtgroup *rdtgrp)
+{
+ kernfs_put(rdtgrp->kn);
+ kfree(rdtgrp);
+}
+
struct task_move_callback {
struct callback_head work;
struct rdtgroup *rdtgrp;
@@ -529,7 +547,7 @@ static void move_myself(struct callback_head *head)
(rdtgrp->flags & RDT_DELETED)) {
current->closid = 0;
current->rmid = 0;
- kfree(rdtgrp);
+ rdtgroup_remove(rdtgrp);
}
if (unlikely(current->flags & PF_EXITING))
@@ -2065,8 +2083,7 @@ void rdtgroup_kn_unlock(struct kernfs_node *kn)
rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)
rdtgroup_pseudo_lock_remove(rdtgrp);
kernfs_unbreak_active_protection(kn);
- kernfs_put(rdtgrp->kn);
- kfree(rdtgrp);
+ rdtgroup_remove(rdtgrp);
} else {
kernfs_unbreak_active_protection(kn);
}
@@ -2341,7 +2358,7 @@ static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
if (atomic_read(&sentry->waitcount) != 0)
sentry->flags = RDT_DELETED;
else
- kfree(sentry);
+ rdtgroup_remove(sentry);
}
}
@@ -2383,7 +2400,7 @@ static void rmdir_all_sub(void)
if (atomic_read(&rdtgrp->waitcount) != 0)
rdtgrp->flags = RDT_DELETED;
else
- kfree(rdtgrp);
+ rdtgroup_remove(rdtgrp);
}
/* Notify online CPUs to update per cpu storage and PQR_ASSOC MSR */
update_closid_rmid(cpu_online_mask, &rdtgroup_default);
@@ -2818,7 +2835,7 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
* kernfs_remove() will drop the reference count on "kn" which
* will free it. But we still need it to stick around for the
* rdtgroup_kn_unlock(kn) call. Take one extra reference here,
- * which will be dropped inside rdtgroup_kn_unlock().
+ * which will be dropped by kernfs_put() in rdtgroup_remove().
*/
kernfs_get(kn);
@@ -2859,6 +2876,7 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
out_idfree:
free_rmid(rdtgrp->mon.rmid);
out_destroy:
+ kernfs_put(rdtgrp->kn);
kernfs_remove(rdtgrp->kn);
out_free_rgrp:
kfree(rdtgrp);
@@ -2871,7 +2889,7 @@ static void mkdir_rdt_prepare_clean(struct rdtgroup *rgrp)
{
kernfs_remove(rgrp->kn);
free_rmid(rgrp->mon.rmid);
- kfree(rgrp);
+ rdtgroup_remove(rgrp);
}
/*
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4b639e254d3d4f15ee4ff2b890a447204cfbeea9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micha=C5=82=20Miros=C5=82aw?= <mirq-linux(a)rere.qmqm.pl>
Date: Fri, 13 Nov 2020 01:20:28 +0100
Subject: [PATCH] regulator: avoid resolve_supply() infinite recursion
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When a regulator's name equals its supply's name the
regulator_resolve_supply() recurses indefinitely. Add a check
so that debugging the problem is easier. The "fixed" commit
just exposed the problem.
Fixes: aea6cb99703e ("regulator: resolve supply after creating regulator")
Cc: stable(a)vger.kernel.org
Reported-by: Ahmad Fatoum <a.fatoum(a)pengutronix.de>
Signed-off-by: Michał Mirosław <mirq-linux(a)rere.qmqm.pl>
Tested-by: Ahmad Fatoum <a.fatoum(a)pengutronix.de> # stpmic1
Link: https://lore.kernel.org/r/c6171057cfc0896f950c4d8cb82df0f9f1b89ad9.16052266…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 84dd0aea97c5..d3b96efa20fd 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -1841,6 +1841,12 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
}
}
+ if (r == rdev) {
+ dev_err(dev, "Supply for %s (%s) resolved to itself\n",
+ rdev->desc->name, rdev->supply_name);
+ return -EINVAL;
+ }
+
/*
* If the supply's parent device is not the same as the
* regulator's parent device, then ensure the parent device
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 758999246965eeb8b253d47e72f7bfe508804b16 Mon Sep 17 00:00:00 2001
From: Xiaochen Shen <xiaochen.shen(a)intel.com>
Date: Sat, 31 Oct 2020 03:11:28 +0800
Subject: [PATCH] x86/resctrl: Add necessary kernfs_put() calls to prevent
refcount leak
On resource group creation via a mkdir an extra kernfs_node reference is
obtained by kernfs_get() to ensure that the rdtgroup structure remains
accessible for the rdtgroup_kn_unlock() calls where it is removed on
deletion. Currently the extra kernfs_node reference count is only
dropped by kernfs_put() in rdtgroup_kn_unlock() while the rdtgroup
structure is removed in a few other locations that lack the matching
reference drop.
In call paths of rmdir and umount, when a control group is removed,
kernfs_remove() is called to remove the whole kernfs nodes tree of the
control group (including the kernfs nodes trees of all child monitoring
groups), and then rdtgroup structure is freed by kfree(). The rdtgroup
structures of all child monitoring groups under the control group are
freed by kfree() in free_all_child_rdtgrp().
Before calling kfree() to free the rdtgroup structures, the kernfs node
of the control group itself as well as the kernfs nodes of all child
monitoring groups still take the extra references which will never be
dropped to 0 and the kernfs nodes will never be freed. It leads to
reference count leak and kernfs_node_cache memory leak.
For example, reference count leak is observed in these two cases:
(1) mount -t resctrl resctrl /sys/fs/resctrl
mkdir /sys/fs/resctrl/c1
mkdir /sys/fs/resctrl/c1/mon_groups/m1
umount /sys/fs/resctrl
(2) mkdir /sys/fs/resctrl/c1
mkdir /sys/fs/resctrl/c1/mon_groups/m1
rmdir /sys/fs/resctrl/c1
The same reference count leak issue also exists in the error exit paths
of mkdir in mkdir_rdt_prepare() and rdtgroup_mkdir_ctrl_mon().
Fix this issue by following changes to make sure the extra kernfs_node
reference on rdtgroup is dropped before freeing the rdtgroup structure.
(1) Introduce rdtgroup removal helper rdtgroup_remove() to wrap up
kernfs_put() and kfree().
(2) Call rdtgroup_remove() in rdtgroup removal path where the rdtgroup
structure is about to be freed by kfree().
(3) Call rdtgroup_remove() or kernfs_put() as appropriate in the error
exit paths of mkdir where an extra reference is taken by kernfs_get().
Fixes: f3cbeacaa06e ("x86/intel_rdt/cqm: Add rmdir support")
Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system")
Reported-by: Willem de Bruijn <willemb(a)google.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen(a)intel.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/1604085088-31707-1-git-send-email-xiaochen.shen@i…
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 2ab1266a5f14..6f4ca4bea625 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -507,6 +507,24 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
return ret ?: nbytes;
}
+/**
+ * rdtgroup_remove - the helper to remove resource group safely
+ * @rdtgrp: resource group to remove
+ *
+ * On resource group creation via a mkdir, an extra kernfs_node reference is
+ * taken to ensure that the rdtgroup structure remains accessible for the
+ * rdtgroup_kn_unlock() calls where it is removed.
+ *
+ * Drop the extra reference here, then free the rdtgroup structure.
+ *
+ * Return: void
+ */
+static void rdtgroup_remove(struct rdtgroup *rdtgrp)
+{
+ kernfs_put(rdtgrp->kn);
+ kfree(rdtgrp);
+}
+
struct task_move_callback {
struct callback_head work;
struct rdtgroup *rdtgrp;
@@ -529,7 +547,7 @@ static void move_myself(struct callback_head *head)
(rdtgrp->flags & RDT_DELETED)) {
current->closid = 0;
current->rmid = 0;
- kfree(rdtgrp);
+ rdtgroup_remove(rdtgrp);
}
if (unlikely(current->flags & PF_EXITING))
@@ -2065,8 +2083,7 @@ void rdtgroup_kn_unlock(struct kernfs_node *kn)
rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)
rdtgroup_pseudo_lock_remove(rdtgrp);
kernfs_unbreak_active_protection(kn);
- kernfs_put(rdtgrp->kn);
- kfree(rdtgrp);
+ rdtgroup_remove(rdtgrp);
} else {
kernfs_unbreak_active_protection(kn);
}
@@ -2341,7 +2358,7 @@ static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
if (atomic_read(&sentry->waitcount) != 0)
sentry->flags = RDT_DELETED;
else
- kfree(sentry);
+ rdtgroup_remove(sentry);
}
}
@@ -2383,7 +2400,7 @@ static void rmdir_all_sub(void)
if (atomic_read(&rdtgrp->waitcount) != 0)
rdtgrp->flags = RDT_DELETED;
else
- kfree(rdtgrp);
+ rdtgroup_remove(rdtgrp);
}
/* Notify online CPUs to update per cpu storage and PQR_ASSOC MSR */
update_closid_rmid(cpu_online_mask, &rdtgroup_default);
@@ -2818,7 +2835,7 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
* kernfs_remove() will drop the reference count on "kn" which
* will free it. But we still need it to stick around for the
* rdtgroup_kn_unlock(kn) call. Take one extra reference here,
- * which will be dropped inside rdtgroup_kn_unlock().
+ * which will be dropped by kernfs_put() in rdtgroup_remove().
*/
kernfs_get(kn);
@@ -2859,6 +2876,7 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
out_idfree:
free_rmid(rdtgrp->mon.rmid);
out_destroy:
+ kernfs_put(rdtgrp->kn);
kernfs_remove(rdtgrp->kn);
out_free_rgrp:
kfree(rdtgrp);
@@ -2871,7 +2889,7 @@ static void mkdir_rdt_prepare_clean(struct rdtgroup *rgrp)
{
kernfs_remove(rgrp->kn);
free_rmid(rgrp->mon.rmid);
- kfree(rgrp);
+ rdtgroup_remove(rgrp);
}
/*
Commit a408e4a86b36b ("ima: open a new file instance if no read
permissions") already introduced a second open to measure a file when the
original file descriptor does not allow it. However, it didn't remove the
existing method of changing the mode of the original file descriptor, which
is still necessary if the current process does not have enough privileges
to open a new one.
Changing the mode isn't really an option, as the filesystem might need to
do preliminary steps to make the read possible. Thus, this patch removes
the code and keeps the second open as the only option to measure a file
when it is unreadable with the original file descriptor.
Cc: <stable(a)vger.kernel.org> # 4.20.x: 0014cc04e8ec0 ima: Set file->f_mode
Cc: <stable(a)vger.kernel.org> # 4.20.x
Fixes: 2fe5d6def1672 ("ima: integrity appraisal extension")
Signed-off-by: Roberto Sassu <roberto.sassu(a)huawei.com>
---
security/integrity/ima/ima_crypto.c | 20 +++++---------------
1 file changed, 5 insertions(+), 15 deletions(-)
diff --git a/security/integrity/ima/ima_crypto.c b/security/integrity/ima/ima_crypto.c
index 21989fa0c107..f6a7e9643b54 100644
--- a/security/integrity/ima/ima_crypto.c
+++ b/security/integrity/ima/ima_crypto.c
@@ -537,7 +537,7 @@ int ima_calc_file_hash(struct file *file, struct ima_digest_data *hash)
loff_t i_size;
int rc;
struct file *f = file;
- bool new_file_instance = false, modified_mode = false;
+ bool new_file_instance = false;
/*
* For consistency, fail file's opened with the O_DIRECT flag on
@@ -555,18 +555,10 @@ int ima_calc_file_hash(struct file *file, struct ima_digest_data *hash)
O_TRUNC | O_CREAT | O_NOCTTY | O_EXCL);
flags |= O_RDONLY;
f = dentry_open(&file->f_path, flags, file->f_cred);
- if (IS_ERR(f)) {
- /*
- * Cannot open the file again, lets modify f_mode
- * of original and continue
- */
- pr_info_ratelimited("Unable to reopen file for reading.\n");
- f = file;
- f->f_mode |= FMODE_READ;
- modified_mode = true;
- } else {
- new_file_instance = true;
- }
+ if (IS_ERR(f))
+ return PTR_ERR(f);
+
+ new_file_instance = true;
}
i_size = i_size_read(file_inode(f));
@@ -581,8 +573,6 @@ int ima_calc_file_hash(struct file *file, struct ima_digest_data *hash)
out:
if (new_file_instance)
fput(f);
- else if (modified_mode)
- f->f_mode &= ~FMODE_READ;
return rc;
}
--
2.27.GIT
Commit 0f0581b24bd0 ("spi: fsl: Convert to use CS GPIO descriptors")
broke the use of the SPISEL_BOOT signal as a chip select on the
MPC8309.
pdata->max_chipselect, which becomes master->num_chipselect, must be
initialized to take into account the possibility that there's one more
chip select in use than the number of GPIO chip selects.
Cc: stable(a)vger.kernel.org # v5.4+
Cc: Christophe Leroy <christophe.leroy(a)c-s.fr>
Cc: Linus Walleij <linus.walleij(a)linaro.org>
Fixes: 0f0581b24bd0 ("spi: fsl: Convert to use CS GPIO descriptors")
Signed-off-by: Rasmus Villemoes <rasmus.villemoes(a)prevas.dk>
---
Longer-term, it might be nicer to introduce a very trivial "gpiochip"
with nr_gpios=1, which doesn't really implement neither the
"general-purpose" nor the "input" part of the gpio acronym. That's how
this ended up being handled in U-Boot, for example:
https://github.com/u-boot/u-boot/commit/3fb22bc2f825ea1b3326edc5b32d711a759…
But for -stable, this is much simpler.
drivers/spi/spi-fsl-spi.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/spi/spi-fsl-spi.c b/drivers/spi/spi-fsl-spi.c
index 299e9870cf58..9494257e1c33 100644
--- a/drivers/spi/spi-fsl-spi.c
+++ b/drivers/spi/spi-fsl-spi.c
@@ -716,10 +716,11 @@ static int of_fsl_spi_probe(struct platform_device *ofdev)
type = fsl_spi_get_type(&ofdev->dev);
if (type == TYPE_FSL) {
struct fsl_spi_platform_data *pdata = dev_get_platdata(dev);
+ bool spisel_boot = false;
#if IS_ENABLED(CONFIG_FSL_SOC)
struct mpc8xxx_spi_probe_info *pinfo = to_of_pinfo(pdata);
- bool spisel_boot = of_property_read_bool(np, "fsl,spisel_boot");
+ spisel_boot = of_property_read_bool(np, "fsl,spisel_boot");
if (spisel_boot) {
pinfo->immr_spi_cs = ioremap(get_immrbase() + IMMR_SPI_CS_OFFSET, 4);
if (!pinfo->immr_spi_cs)
@@ -734,10 +735,14 @@ static int of_fsl_spi_probe(struct platform_device *ofdev)
* supported on the GRLIB variant.
*/
ret = gpiod_count(dev, "cs");
- if (ret <= 0)
+ if (ret < 0)
+ ret = 0;
+ if (ret == 0 && !spisel_boot) {
pdata->max_chipselect = 1;
- else
+ } else {
+ pdata->max_chipselect = ret + spisel_boot;
pdata->cs_control = fsl_spi_cs_control;
+ }
}
ret = of_address_to_resource(np, 0, &mem);
--
2.23.0
FW has to configure devices' StreamIDs so that SMMU is able to lookup
context and do proper translation later on. For Armada 7040 & 8040 and
publicly available FW, most of the devices are configured properly,
but some like ap_sdhci0, PCIe, NIC still remain unassigned which
results in SMMU faults about unmatched StreamID (assuming
ARM_SMMU_DISABLE_BYPASS_BY_DEFAUL=y).
Since there is dependency on custom FW let SMMU be disabled by default.
People who still willing to use SMMU need to enable manually and
use ARM_SMMU_DISABLE_BYPASS_BY_DEFAUL=n (or via kernel command line)
with extra caution.
Fixes: 83a3545d9c37 ("arm64: dts: marvell: add SMMU support")
Cc: <stable(a)vger.kernel.org> # 5.9+
Signed-off-by: Tomasz Nowicki <tn(a)semihalf.com>
---
arch/arm64/boot/dts/marvell/armada-7040.dtsi | 4 ----
arch/arm64/boot/dts/marvell/armada-8040.dtsi | 4 ----
2 files changed, 8 deletions(-)
diff --git a/arch/arm64/boot/dts/marvell/armada-7040.dtsi b/arch/arm64/boot/dts/marvell/armada-7040.dtsi
index 7a3198cd7a07..2f440711d21d 100644
--- a/arch/arm64/boot/dts/marvell/armada-7040.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-7040.dtsi
@@ -15,10 +15,6 @@ / {
"marvell,armada-ap806";
};
-&smmu {
- status = "okay";
-};
-
&cp0_pcie0 {
iommu-map =
<0x0 &smmu 0x480 0x20>,
diff --git a/arch/arm64/boot/dts/marvell/armada-8040.dtsi b/arch/arm64/boot/dts/marvell/armada-8040.dtsi
index 79e8ce59baa8..22c2d6ebf381 100644
--- a/arch/arm64/boot/dts/marvell/armada-8040.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-8040.dtsi
@@ -15,10 +15,6 @@ / {
"marvell,armada-ap806";
};
-&smmu {
- status = "okay";
-};
-
&cp0_pcie0 {
iommu-map =
<0x0 &smmu 0x480 0x20>,
--
2.25.1
The following commit has been merged into the irq/urgent branch of tip:
Commit-ID: bb4c6910c8b41623104c2e64a30615682689a54d
Gitweb: https://git.kernel.org/tip/bb4c6910c8b41623104c2e64a30615682689a54d
Author: Laurent Vivier <lvivier(a)redhat.com>
AuthorDate: Thu, 26 Nov 2020 09:28:51 +01:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Mon, 30 Nov 2020 12:21:31 +01:00
genirq/irqdomain: Add an irq_create_mapping_affinity() function
There is currently no way to convey the affinity of an interrupt
via irq_create_mapping(), which creates issues for devices that
expect that affinity to be managed by the kernel.
In order to sort this out, rename irq_create_mapping() to
irq_create_mapping_affinity() with an additional affinity parameter that
can be passed down to irq_domain_alloc_descs().
irq_create_mapping() is re-implemented as a wrapper around
irq_create_mapping_affinity().
No functional change.
Fixes: e75eafb9b039 ("genirq/msi: Switch to new irq spreading infrastructure")
Signed-off-by: Laurent Vivier <lvivier(a)redhat.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Greg Kurz <groug(a)kaod.org>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20201126082852.1178497-2-lvivier@redhat.com
---
include/linux/irqdomain.h | 12 ++++++++++--
kernel/irq/irqdomain.c | 13 ++++++++-----
2 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index 71535e8..ea5a337 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -384,11 +384,19 @@ extern void irq_domain_associate_many(struct irq_domain *domain,
extern void irq_domain_disassociate(struct irq_domain *domain,
unsigned int irq);
-extern unsigned int irq_create_mapping(struct irq_domain *host,
- irq_hw_number_t hwirq);
+extern unsigned int irq_create_mapping_affinity(struct irq_domain *host,
+ irq_hw_number_t hwirq,
+ const struct irq_affinity_desc *affinity);
extern unsigned int irq_create_fwspec_mapping(struct irq_fwspec *fwspec);
extern void irq_dispose_mapping(unsigned int virq);
+static inline unsigned int irq_create_mapping(struct irq_domain *host,
+ irq_hw_number_t hwirq)
+{
+ return irq_create_mapping_affinity(host, hwirq, NULL);
+}
+
+
/**
* irq_linear_revmap() - Find a linux irq from a hw irq number.
* @domain: domain owning this hardware interrupt
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index cf8b374..e4ca696 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -624,17 +624,19 @@ unsigned int irq_create_direct_mapping(struct irq_domain *domain)
EXPORT_SYMBOL_GPL(irq_create_direct_mapping);
/**
- * irq_create_mapping() - Map a hardware interrupt into linux irq space
+ * irq_create_mapping_affinity() - Map a hardware interrupt into linux irq space
* @domain: domain owning this hardware interrupt or NULL for default domain
* @hwirq: hardware irq number in that domain space
+ * @affinity: irq affinity
*
* Only one mapping per hardware interrupt is permitted. Returns a linux
* irq number.
* If the sense/trigger is to be specified, set_irq_type() should be called
* on the number returned from that call.
*/
-unsigned int irq_create_mapping(struct irq_domain *domain,
- irq_hw_number_t hwirq)
+unsigned int irq_create_mapping_affinity(struct irq_domain *domain,
+ irq_hw_number_t hwirq,
+ const struct irq_affinity_desc *affinity)
{
struct device_node *of_node;
int virq;
@@ -660,7 +662,8 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
}
/* Allocate a virtual interrupt number */
- virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node), NULL);
+ virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node),
+ affinity);
if (virq <= 0) {
pr_debug("-> virq allocation failed\n");
return 0;
@@ -676,7 +679,7 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
return virq;
}
-EXPORT_SYMBOL_GPL(irq_create_mapping);
+EXPORT_SYMBOL_GPL(irq_create_mapping_affinity);
/**
* irq_create_strict_mappings() - Map a range of hw irqs to fixed linux irqs
The following commit has been merged into the irq/urgent branch of tip:
Commit-ID: 9ea69a55b3b9a71cded9726af591949c1138f235
Gitweb: https://git.kernel.org/tip/9ea69a55b3b9a71cded9726af591949c1138f235
Author: Laurent Vivier <lvivier(a)redhat.com>
AuthorDate: Thu, 26 Nov 2020 09:28:52 +01:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Mon, 30 Nov 2020 12:22:04 +01:00
powerpc/pseries: Pass MSI affinity to irq_create_mapping()
With virtio multiqueue, normally each queue IRQ is mapped to a CPU.
Commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity") exposed
an existing shortcoming of the arch code by moving virtio_scsi to
the automatic IRQ affinity assignment.
The affinity is correctly computed in msi_desc but this is not applied
to the system IRQs.
It appears the affinity is correctly passed to rtas_setup_msi_irqs() but
lost at this point and never passed to irq_domain_alloc_descs()
(see commit 06ee6d571f0e ("genirq: Add affinity hint to irq allocation"))
because irq_create_mapping() doesn't take an affinity parameter.
Use the new irq_create_mapping_affinity() function, which allows to forward
the affinity setting from rtas_setup_msi_irqs() to irq_domain_alloc_descs().
With this change, the virtqueues are correctly dispatched between the CPUs
on pseries.
Fixes: e75eafb9b039 ("genirq/msi: Switch to new irq spreading infrastructure")
Signed-off-by: Laurent Vivier <lvivier(a)redhat.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Greg Kurz <groug(a)kaod.org>
Acked-by: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20201126082852.1178497-3-lvivier@redhat.com
---
arch/powerpc/platforms/pseries/msi.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index 133f6ad..b3ac245 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -458,7 +458,8 @@ again:
return hwirq;
}
- virq = irq_create_mapping(NULL, hwirq);
+ virq = irq_create_mapping_affinity(NULL, hwirq,
+ entry->affinity);
if (!virq) {
pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
In commit d8bb6718c4db ("arm64: Make debug exception handlers visible
from RCU") debug_exception_enter and exit were added to deal with better
tracking of RCU state - however, in addition to that, but not mentioned
in the commit log, a preempt_disable/enable pair were also added.
Based on the comment (being removed here) it would seem that the pair
were not added to address a specific problem, but just as a proactive,
preventative measure - as in "seemed like a good idea at the time".
The problem is that do_debug_exception() eventually calls out to
generic kernel code like do_force_sig_info() which takes non-raw locks
and on -rt enabled kernels, results in things looking like the following,
since on -rt kernels, it is noticed that preemption is still disabled.
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:975
in_atomic(): 1, irqs_disabled(): 0, pid: 35658, name: gdbtest
Preemption disabled at:
[<ffff000010081578>] do_debug_exception+0x38/0x1a4
Call trace:
dump_backtrace+0x0/0x138
show_stack+0x24/0x30
dump_stack+0x94/0xbc
___might_sleep+0x13c/0x168
rt_spin_lock+0x40/0x80
do_force_sig_info+0x30/0xe0
force_sig_fault+0x64/0x90
arm64_force_sig_fault+0x50/0x80
send_user_sigtrap+0x50/0x80
brk_handler+0x98/0xc8
do_debug_exception+0x70/0x1a4
el0_dbg+0x18/0x20
The reproducer was basically an automated gdb test that set a breakpoint
on a simple "hello world" program and then quit gdb once the breakpoint
was hit - i.e. "(gdb) A debugging session is active. Quit anyway? "
Fixes: d8bb6718c4db ("arm64: Make debug exception handlers visible from RCU")
Cc: stable(a)vger.kernel.org
Cc: Naresh Kamboju <naresh.kamboju(a)linaro.org>
Cc: Paul E. McKenney <paulmck(a)linux.ibm.com>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Will Deacon <will(a)kernel.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker(a)windriver.com>
---
arch/arm64/mm/fault.c | 11 -----------
1 file changed, 11 deletions(-)
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 8afb238ff335..4d83ec991b33 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -788,13 +788,6 @@ void __init hook_debug_fault_code(int nr,
debug_fault_info[nr].name = name;
}
-/*
- * In debug exception context, we explicitly disable preemption despite
- * having interrupts disabled.
- * This serves two purposes: it makes it much less likely that we would
- * accidentally schedule in exception context and it will force a warning
- * if we somehow manage to schedule by accident.
- */
static void debug_exception_enter(struct pt_regs *regs)
{
/*
@@ -816,8 +809,6 @@ static void debug_exception_enter(struct pt_regs *regs)
rcu_nmi_enter();
}
- preempt_disable();
-
/* This code is a bit fragile. Test it. */
RCU_LOCKDEP_WARN(!rcu_is_watching(), "exception_enter didn't work");
}
@@ -825,8 +816,6 @@ NOKPROBE_SYMBOL(debug_exception_enter);
static void debug_exception_exit(struct pt_regs *regs)
{
- preempt_enable_no_resched();
-
if (!user_mode(regs))
rcu_nmi_exit();
--
2.7.4
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 184eead057cc7e803558269babc1f2cfb9113ad1 Mon Sep 17 00:00:00 2001
From: Alan Stern <stern(a)rowland.harvard.edu>
Date: Thu, 19 Nov 2020 12:00:40 -0500
Subject: [PATCH] USB: core: Fix regression in Hercules audio card
Commit 3e4f8e21c4f2 ("USB: core: fix check for duplicate endpoints")
aimed to make the USB stack more reliable by detecting and skipping
over endpoints that are duplicated between interfaces. This caused a
regression for a Hercules audio card (reported as Bugzilla #208357),
which contains such non-compliant duplications. Although the
duplications are harmless, skipping the valid endpoints prevented the
device from working.
This patch fixes the regression by adding ENDPOINT_IGNORE quirks for
the Hercules card, telling the kernel to ignore the invalid duplicate
endpoints and thereby allowing the valid endpoints to be used as
intended.
Fixes: 3e4f8e21c4f2 ("USB: core: fix check for duplicate endpoints")
CC: <stable(a)vger.kernel.org>
Reported-by: Alexander Chalikiopoulos <bugzilla.kernel.org(a)mrtoasted.com>
Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu>
Link: https://lore.kernel.org/r/20201119170040.GA576844@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c
index f536ea9fe945..fad31ccd1fa8 100644
--- a/drivers/usb/core/quirks.c
+++ b/drivers/usb/core/quirks.c
@@ -348,6 +348,10 @@ static const struct usb_device_id usb_quirk_list[] = {
/* Guillemot Webcam Hercules Dualpix Exchange*/
{ USB_DEVICE(0x06f8, 0x3005), .driver_info = USB_QUIRK_RESET_RESUME },
+ /* Guillemot Hercules DJ Console audio card (BZ 208357) */
+ { USB_DEVICE(0x06f8, 0xb000), .driver_info =
+ USB_QUIRK_ENDPOINT_IGNORE },
+
/* Midiman M-Audio Keystation 88es */
{ USB_DEVICE(0x0763, 0x0192), .driver_info = USB_QUIRK_RESET_RESUME },
@@ -525,6 +529,8 @@ static const struct usb_device_id usb_amd_resume_quirk_list[] = {
* Matched for devices with USB_QUIRK_ENDPOINT_IGNORE.
*/
static const struct usb_device_id usb_endpoint_ignore[] = {
+ { USB_DEVICE_INTERFACE_NUMBER(0x06f8, 0xb000, 5), .driver_info = 0x01 },
+ { USB_DEVICE_INTERFACE_NUMBER(0x06f8, 0xb000, 5), .driver_info = 0x81 },
{ USB_DEVICE_INTERFACE_NUMBER(0x0926, 0x0202, 1), .driver_info = 0x85 },
{ USB_DEVICE_INTERFACE_NUMBER(0x0926, 0x0208, 1), .driver_info = 0x85 },
{ }
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e7694cb6998379341fd9bf3bd62b48c4e6a79385 Mon Sep 17 00:00:00 2001
From: Zhang Qilong <zhangqilong3(a)huawei.com>
Date: Tue, 17 Nov 2020 10:16:28 +0800
Subject: [PATCH] usb: gadget: f_midi: Fix memleak in f_midi_alloc
In the error path, if midi is not null, we should
free the midi->id if necessary to prevent memleak.
Fixes: b85e9de9e818d ("usb: gadget: f_midi: convert to new function interface with backward compatibility")
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Signed-off-by: Zhang Qilong <zhangqilong3(a)huawei.com>
Link: https://lore.kernel.org/r/20201117021629.1470544-2-zhangqilong3@huawei.com
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/gadget/function/f_midi.c b/drivers/usb/gadget/function/f_midi.c
index 85cb15734aa8..19d97940eeb9 100644
--- a/drivers/usb/gadget/function/f_midi.c
+++ b/drivers/usb/gadget/function/f_midi.c
@@ -1315,7 +1315,7 @@ static struct usb_function *f_midi_alloc(struct usb_function_instance *fi)
midi->id = kstrdup(opts->id, GFP_KERNEL);
if (opts->id && !midi->id) {
status = -ENOMEM;
- goto setup_fail;
+ goto midi_free;
}
midi->in_ports = opts->in_ports;
midi->out_ports = opts->out_ports;
@@ -1327,7 +1327,7 @@ static struct usb_function *f_midi_alloc(struct usb_function_instance *fi)
status = kfifo_alloc(&midi->in_req_fifo, midi->qlen, GFP_KERNEL);
if (status)
- goto setup_fail;
+ goto midi_free;
spin_lock_init(&midi->transmit_lock);
@@ -1343,9 +1343,13 @@ static struct usb_function *f_midi_alloc(struct usb_function_instance *fi)
return &midi->func;
+midi_free:
+ if (midi)
+ kfree(midi->id);
+ kfree(midi);
setup_fail:
mutex_unlock(&opts->lock);
- kfree(midi);
+
return ERR_PTR(status);
}
Hi Greg and Sasha,
Please apply commit d853b3406903 ("spi: bcm2835aux: Restore err
assignment in bcm2835aux_spi_probe") to linux-5.4.y and linux-5.9.y as a
fix for commit e13ee6cc4781 ("spi: bcm2835aux: Fix use-after-free on
unbind"). I did not realize that commit was tagged for stable so I did
not tag my fix accordingly, sorry for not noticing sooner.
Cheers,
Nathan
With the current implementation the following race can happen:
* blk_pre_runtime_suspend() calls blk_freeze_queue_start() and
blk_mq_unfreeze_queue().
* blk_queue_enter() calls blk_queue_pm_only() and that function returns
true.
* blk_queue_enter() calls blk_pm_request_resume() and that function does
not call pm_request_resume() because the queue runtime status is
RPM_ACTIVE.
* blk_pre_runtime_suspend() changes the queue status into RPM_SUSPENDING.
Fix this race by changing the queue runtime status into RPM_SUSPENDING
before switching q_usage_counter to atomic mode.
Acked-by: Alan Stern <stern(a)rowland.harvard.edu>
Acked-by: Stanley Chu <stanley.chu(a)mediatek.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Hannes Reinecke <hare(a)suse.de>
Cc: Ming Lei <ming.lei(a)redhat.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: stable <stable(a)vger.kernel.org>
Fixes: 986d413b7c15 ("blk-mq: Enable support for runtime power management")
Signed-off-by: Can Guo <cang(a)codeaurora.org>
Signed-off-by: Bart Van Assche <bvanassche(a)acm.org>
---
block/blk-pm.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/block/blk-pm.c b/block/blk-pm.c
index b85234d758f7..17bd020268d4 100644
--- a/block/blk-pm.c
+++ b/block/blk-pm.c
@@ -67,6 +67,10 @@ int blk_pre_runtime_suspend(struct request_queue *q)
WARN_ON_ONCE(q->rpm_status != RPM_ACTIVE);
+ spin_lock_irq(&q->queue_lock);
+ q->rpm_status = RPM_SUSPENDING;
+ spin_unlock_irq(&q->queue_lock);
+
/*
* Increase the pm_only counter before checking whether any
* non-PM blk_queue_enter() calls are in progress to avoid that any
@@ -89,15 +93,14 @@ int blk_pre_runtime_suspend(struct request_queue *q)
/* Switch q_usage_counter back to per-cpu mode. */
blk_mq_unfreeze_queue(q);
- spin_lock_irq(&q->queue_lock);
- if (ret < 0)
+ if (ret < 0) {
+ spin_lock_irq(&q->queue_lock);
+ q->rpm_status = RPM_ACTIVE;
pm_runtime_mark_last_busy(q->dev);
- else
- q->rpm_status = RPM_SUSPENDING;
- spin_unlock_irq(&q->queue_lock);
+ spin_unlock_irq(&q->queue_lock);
- if (ret)
blk_clear_pm_only(q);
+ }
return ret;
}
This deadlock is hitting Android users (Pixel 3/3a/4) with Magisk, due
to frequent umount/mount operations that trigger quota_sync, hitting
the race. See https://github.com/topjohnwu/Magisk/issues/3171 for
additional impact discussion.
In commit db6ec53b7e03, we added a semaphore to protect quota flags.
As part of this commit, we changed f2fs_quota_sync to call
f2fs_lock_op, in an attempt to prevent an AB/BA type deadlock with
quota_sem locking in block_operation. However, rwsem in Linux is not
recursive. Therefore, the following deadlock can occur:
f2fs_quota_sync
down_read(cp_rwsem) // f2fs_lock_op
filemap_fdatawrite
f2fs_write_data_pages
...
block_opertaion
down_write(cp_rwsem) - marks rwsem as
"writer pending"
down_read_trylock(cp_rwsem) - fails as there is
a writer pending.
Code keeps on trying,
live-locking the filesystem.
We solve this by creating a new rwsem, used specifically to
synchronize this case, instead of attempting to reuse an existing
lock.
Signed-off-by: Shachar Raindel <shacharr(a)gmail.com>
Fixes: db6ec53b7e03 f2fs: add a rw_sem to cover quota flag changes
---
fs/f2fs/f2fs.h | 3 +++
fs/f2fs/super.c | 13 +++++++++++--
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index cb700d797296..b3e55137be7f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1448,6 +1448,7 @@ struct f2fs_sb_info {
struct inode *meta_inode; /* cache meta blocks */
struct mutex cp_mutex; /* checkpoint procedure lock */
struct rw_semaphore cp_rwsem; /* blocking FS operations */
+ struct rw_semaphore cp_quota_rwsem; /* blocking quota sync operations */
struct rw_semaphore node_write; /* locking node writes */
struct rw_semaphore node_change; /* locking node change */
wait_queue_head_t cp_wait;
@@ -1961,12 +1962,14 @@ static inline void f2fs_unlock_op(struct f2fs_sb_info *sbi)
static inline void f2fs_lock_all(struct f2fs_sb_info *sbi)
{
+ down_write(&sbi->cp_quota_rwsem);
down_write(&sbi->cp_rwsem);
}
static inline void f2fs_unlock_all(struct f2fs_sb_info *sbi)
{
up_write(&sbi->cp_rwsem);
+ up_write(&sbi->cp_quota_rwsem);
}
static inline int __get_cp_reason(struct f2fs_sb_info *sbi)
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 00eff2f51807..5ce61147d7e5 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -2209,8 +2209,16 @@ int f2fs_quota_sync(struct super_block *sb, int type)
* f2fs_dquot_commit
* block_operation
* down_read(quota_sem)
+ *
+ * However, we cannot use the cp_rwsem to prevent this
+ * deadlock, as the cp_rwsem is taken for read inside the
+ * f2fs_dquot_commit code, and rwsem is not recursive.
+ *
+ * We therefore use a special lock to synchronize
+ * f2fs_quota_sync with block_operations, as this is the only
+ * place where such recursion occurs.
*/
- f2fs_lock_op(sbi);
+ down_read(&sbi->cp_quota_rwsem);
down_read(&sbi->quota_sem);
ret = dquot_writeback_dquots(sb, type);
@@ -2251,7 +2259,7 @@ int f2fs_quota_sync(struct super_block *sb, int type)
if (ret)
set_sbi_flag(F2FS_SB(sb), SBI_QUOTA_NEED_REPAIR);
up_read(&sbi->quota_sem);
- f2fs_unlock_op(sbi);
+ up_read(&sbi->cp_quota_rwsem);
return ret;
}
@@ -3599,6 +3607,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
init_rwsem(&sbi->cp_rwsem);
init_rwsem(&sbi->quota_sem);
+ init_rwsem(&sbi->cp_quota_rwsem);
init_waitqueue_head(&sbi->cp_wait);
init_sb_info(sbi);
--
2.29.2
On Sat, Nov 28, 2020 at 11:28:53PM +0800, Wen Yang wrote:
>
>
> 在 2020/11/28 下午10:05, Greg Kroah-Hartman 写道:
> > On Sat, Nov 28, 2020 at 09:59:09PM +0800, Wen Yang wrote:
> > >
> > >
> > > 在 2020/11/28 下午4:06, Greg Kroah-Hartman 写道:
> > > > On Sat, Nov 28, 2020 at 02:47:22PM +0800, Wen Yang wrote:
> > > > > [ Upstream commit 7bc3e6e55acf065500a24621f3b313e7e5998acf ]
> > > >
> > > > No, that is not this commit at all.
> > > >
> > > > What are you wanting to have happen here?
> > > >
> > > > confused,
> > > >
> > > > greg k-h
> > > >
> > >
> > > Thanks.
> > > Let's explain it briefly:
> > >
> > > The dentries such as /proc/<pid>/ns/ipc have the DCACHE_OP_DELETE flag, they
> > > should be deleted when the process exits.
> > > Suppose the following race appears:
> > >
> > > release_task dput
> > > -> proc_flush_task
> > > -> dentry->d_op->d_delete(dentry)
> > > -> __exit_signal
> > > -> dentry->d_lockref.count-- and return.
> > >
> > >
> > > In the proc_flush_task function, because another processe is using this
> > > dentry, it cannot be deleted;
> > > In the dput function, d_delete may be executed before __exit_signal (the pid
> > > has not been unhashed), so that d_delete returns false and the dentry can
> > > not be deleted.
> > >
> > > So this dentry is still caches (count is 0), and its parent dentries are
> > > also caches, and those dentries can only be deleted when drop_caches is
> > > manually triggered.
> > >
> > >
> > > In the release_task function, we should move proc_flush_task after the
> > > tasklist_lock is released(Just like the commit
> > > 7bc3e6e55acf065500a24621f3b313e7e5998acf did).
> >
> > I do not understand, is this a patch being submitted for the main kernel
> > tree, or for a stable kernel release?
> >
> > If stable, please read:
> > https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> > for how to do this properly.
> >
> > If main kernel tree, you can't have the "Upstream commit" line in the
> > changelog text as that makes no sense at all.
>
>
> Hi,
> This patch is submitted to the stable branches (from 4.9.y
> to 5.6.y).
>
> This problem can also be solved if the following patch could be ported to
> the stable branch:
> 7bc3e6e55acf ("proc: Use a list of inodes to flush from proc")
> 26dbc60f385f ("proc: Generalize proc_sys_prune_dcache into
> proc_prune_siblings_dcache")
> f90f3cafe8d5 ("proc: Use d_invalidate in proc_prune_siblings_dcache")
>
> However, the above-mentioned patches modify too much code (more than 100
> lines), and there may also be some undiscovered bugs.
>
> So the safer method may be to apply this small patch(also ported from the
> equivalent fix already exist in Linus’ tree).
>
> We will reformat the patch later.
We always prefer to take the original, upstream patches, instead of
one-off changes as almost always, those one-off changes end up being
wrong and hard to work with over time.
So if we need more than one patch to solve this reported problem, that's
fine, can you test the above series of patches and provide a backported
set of them that we can use for this?
thanks,
gre gk-h
Hi Christian,
On Sat, Nov 28, 2020 at 3:11 PM Christian Eggers <ceggers(a)arri.de> wrote:
>
> Hi Greg, hi Sudip, hi Wolfram,
>
> this patch is part of my series :
>
<snip>
>
> Although I am happy seeing my patch in the stable series, I think it should be
> applied to mainline first.
Stable trees can only take fixes which have been applied to the
mainline. This is in mainline since v5.9.
--
Regards
Sudip
From: Chris Wilson <chris(a)chris-wilson.co.uk>
commit be33805c65297611971003d72e7f9235e23ec84d upstream.
Forcing mocs:1 [used for our winsys follows-pte mode] to be cached
caused display glitches. Though it is documented as deprecated (and so
likely behaves as uncached) use the follow-pte bit and force it out of
L3 cache.
Testcase: igt/kms_frontbuffer_tracking
Testcase: igt/kms_big_fb
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Ayaz A Siddiqui <ayaz.siddiqui(a)intel.com>
Cc: Lucas De Marchi <lucas.demarchi(a)intel.com>
Cc: Matt Roper <matthew.d.roper(a)intel.com>
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-4-chris@…
(cherry picked from commit a04ac827366594c7244f60e9be79fcb404af69f0)
Fixes: 849c0fe9e831 ("drm/i915/gt: Initialize reserved and unspecified MOCS indices")
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
[Rodrigo: Updated Fixes tag]
Cc: <stable(a)vger.kernel.org> # 5.9.x
---
drivers/gpu/drm/i915/gt/intel_mocs.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_mocs.c b/drivers/gpu/drm/i915/gt/intel_mocs.c
index b8f56e62158e..313e51e7d4f7 100644
--- a/drivers/gpu/drm/i915/gt/intel_mocs.c
+++ b/drivers/gpu/drm/i915/gt/intel_mocs.c
@@ -243,8 +243,9 @@ static const struct drm_i915_mocs_entry tgl_mocs_table[] = {
* only, __init_mocs_table() take care to program unused index with
* this entry.
*/
- MOCS_ENTRY(1, LE_3_WB | LE_TC_1_LLC | LE_LRUM(3),
- L3_3_WB),
+ MOCS_ENTRY(I915_MOCS_PTE,
+ LE_0_PAGETABLE | LE_TC_0_PAGETABLE,
+ L3_1_UC),
GEN11_MOCS_ENTRIES,
/* Implicitly enable L1 - HDC:L1 + L3 + LLC */
--
2.28.0
f2fs_quota_sync is calling f2fs_lock_op, in an attempt to prevent
an AB/BA type deadlock with quota_sem locking in block_operation.
However, rwsem in Linux is not recursive. As a result, the following
deadlock may occur:
f2fs_quota_sync
down_read(cp_rwsem) // f2fs_lock_op
filemap_fdatawrite
f2fs_write_data_pages
...
block_opertaion
down_write(cp_rwsem) - marks rwsem as "writer pending"
down_read_trylock(cp_rwsem) - fails as there is
a writer pending.
Code keeps on trying.
We solve this by creating a new rwsem, used specifically to
synchronize this case, instead of attempting to reuse an existing lock
for this
---
fs/f2fs/f2fs.h | 3 +++
fs/f2fs/super.c | 13 +++++++++++--
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index cb700d797296..b3e55137be7f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1448,6 +1448,7 @@ struct f2fs_sb_info {
struct inode *meta_inode; /* cache meta blocks */
struct mutex cp_mutex; /* checkpoint procedure lock */
struct rw_semaphore cp_rwsem; /* blocking FS operations */
+ struct rw_semaphore cp_quota_rwsem; /* blocking quota sync operations */
struct rw_semaphore node_write; /* locking node writes */
struct rw_semaphore node_change; /* locking node change */
wait_queue_head_t cp_wait;
@@ -1961,12 +1962,14 @@ static inline void f2fs_unlock_op(struct f2fs_sb_info *sbi)
static inline void f2fs_lock_all(struct f2fs_sb_info *sbi)
{
+ down_write(&sbi->cp_quota_rwsem);
down_write(&sbi->cp_rwsem);
}
static inline void f2fs_unlock_all(struct f2fs_sb_info *sbi)
{
up_write(&sbi->cp_rwsem);
+ up_write(&sbi->cp_quota_rwsem);
}
static inline int __get_cp_reason(struct f2fs_sb_info *sbi)
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 00eff2f51807..5ce61147d7e5 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -2209,8 +2209,16 @@ int f2fs_quota_sync(struct super_block *sb, int type)
* f2fs_dquot_commit
* block_operation
* down_read(quota_sem)
+ *
+ * However, we cannot use the cp_rwsem to prevent this
+ * deadlock, as the cp_rwsem is taken for read inside the
+ * f2fs_dquot_commit code, and rwsem is not recursive.
+ *
+ * We therefore use a special lock to synchronize
+ * f2fs_quota_sync with block_operations, as this is the only
+ * place where such recurrsion occurs.
*/
- f2fs_lock_op(sbi);
+ down_read(&sbi->cp_quota_rwsem);
down_read(&sbi->quota_sem);
ret = dquot_writeback_dquots(sb, type);
@@ -2251,7 +2259,7 @@ int f2fs_quota_sync(struct super_block *sb, int type)
if (ret)
set_sbi_flag(F2FS_SB(sb), SBI_QUOTA_NEED_REPAIR);
up_read(&sbi->quota_sem);
- f2fs_unlock_op(sbi);
+ up_read(&sbi->cp_quota_rwsem);
return ret;
}
@@ -3599,6 +3607,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
init_rwsem(&sbi->cp_rwsem);
init_rwsem(&sbi->quota_sem);
+ init_rwsem(&sbi->cp_quota_rwsem);
init_waitqueue_head(&sbi->cp_wait);
init_sb_info(sbi);
--
2.29.2
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fa4d30556883f2eaab425b88ba9904865a4d00f3 Mon Sep 17 00:00:00 2001
From: Christian Eggers <ceggers(a)arri.de>
Date: Wed, 7 Oct 2020 10:45:22 +0200
Subject: [PATCH] i2c: imx: Fix reset of I2SR_IAL flag
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
According to the "VFxxx Controller Reference Manual" (and the comment
block starting at line 97), Vybrid requires writing a one for clearing
an interrupt flag. Syncing the method for clearing I2SR_IIF in
i2c_imx_isr().
Signed-off-by: Christian Eggers <ceggers(a)arri.de>
Fixes: 4b775022f6fd ("i2c: imx: add struct to hold more configurable quirks")
Reviewed-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Cc: stable(a)vger.kernel.org
Signed-off-by: Wolfram Sang <wsa(a)kernel.org>
diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
index 0ab5381aa012..cbdcab73a055 100644
--- a/drivers/i2c/busses/i2c-imx.c
+++ b/drivers/i2c/busses/i2c-imx.c
@@ -412,6 +412,19 @@ static void i2c_imx_dma_free(struct imx_i2c_struct *i2c_imx)
dma->chan_using = NULL;
}
+static void i2c_imx_clear_irq(struct imx_i2c_struct *i2c_imx, unsigned int bits)
+{
+ unsigned int temp;
+
+ /*
+ * i2sr_clr_opcode is the value to clear all interrupts. Here we want to
+ * clear only <bits>, so we write ~i2sr_clr_opcode with just <bits>
+ * toggled. This is required because i.MX needs W1C and Vybrid uses W0C.
+ */
+ temp = ~i2c_imx->hwdata->i2sr_clr_opcode ^ bits;
+ imx_i2c_write_reg(temp, i2c_imx, IMX_I2C_I2SR);
+}
+
static int i2c_imx_bus_busy(struct imx_i2c_struct *i2c_imx, int for_busy, bool atomic)
{
unsigned long orig_jiffies = jiffies;
@@ -424,8 +437,7 @@ static int i2c_imx_bus_busy(struct imx_i2c_struct *i2c_imx, int for_busy, bool a
/* check for arbitration lost */
if (temp & I2SR_IAL) {
- temp &= ~I2SR_IAL;
- imx_i2c_write_reg(temp, i2c_imx, IMX_I2C_I2SR);
+ i2c_imx_clear_irq(i2c_imx, I2SR_IAL);
return -EAGAIN;
}
@@ -623,9 +635,7 @@ static irqreturn_t i2c_imx_isr(int irq, void *dev_id)
if (temp & I2SR_IIF) {
/* save status register */
i2c_imx->i2csr = temp;
- temp &= ~I2SR_IIF;
- temp |= (i2c_imx->hwdata->i2sr_clr_opcode & I2SR_IIF);
- imx_i2c_write_reg(temp, i2c_imx, IMX_I2C_I2SR);
+ i2c_imx_clear_irq(i2c_imx, I2SR_IIF);
wake_up(&i2c_imx->queue);
return IRQ_HANDLED;
}
Hello,
Is it possible to apply the following commit on the branch linux-4.19.y?
de9f8eea5a44 ("drm/atomic_helper: Stop modesets on unregistered connectors harder")
This commit is applied to the other LTS kernels, but is missing on
linux-4.19.y. Without this patch my i.MX6ULL SoM doesn't initialize
the display correctly after booting.
Thanks in advance,
Christoph
____________________________________________________________________________________
DH electronics GmbH | Am Anger 8 | 83346 Bergen | Germany | Fon: +49 8662 4882 0
HRB Traunstein 9602 | Ust Id Nr.: DE174205805
Board of Management | Dipl.-Ing.(FH) Stefan Daxenberger | Dipl.-Ing.(FH) Helmut Henschke
On Sat, Nov 28, 2020 at 09:59:09PM +0800, Wen Yang wrote:
>
>
> 在 2020/11/28 下午4:06, Greg Kroah-Hartman 写道:
> > On Sat, Nov 28, 2020 at 02:47:22PM +0800, Wen Yang wrote:
> > > [ Upstream commit 7bc3e6e55acf065500a24621f3b313e7e5998acf ]
> >
> > No, that is not this commit at all.
> >
> > What are you wanting to have happen here?
> >
> > confused,
> >
> > greg k-h
> >
>
> Thanks.
> Let's explain it briefly:
>
> The dentries such as /proc/<pid>/ns/ipc have the DCACHE_OP_DELETE flag, they
> should be deleted when the process exits.
> Suppose the following race appears:
>
> release_task dput
> -> proc_flush_task
> -> dentry->d_op->d_delete(dentry)
> -> __exit_signal
> -> dentry->d_lockref.count-- and return.
>
>
> In the proc_flush_task function, because another processe is using this
> dentry, it cannot be deleted;
> In the dput function, d_delete may be executed before __exit_signal (the pid
> has not been unhashed), so that d_delete returns false and the dentry can
> not be deleted.
>
> So this dentry is still caches (count is 0), and its parent dentries are
> also caches, and those dentries can only be deleted when drop_caches is
> manually triggered.
>
>
> In the release_task function, we should move proc_flush_task after the
> tasklist_lock is released(Just like the commit
> 7bc3e6e55acf065500a24621f3b313e7e5998acf did).
I do not understand, is this a patch being submitted for the main kernel
tree, or for a stable kernel release?
If stable, please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.
If main kernel tree, you can't have the "Upstream commit" line in the
changelog text as that makes no sense at all.
thanks,
greg k-h
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ff1712f953e27f0b0718762ec17d0adb15c9fd0b Mon Sep 17 00:00:00 2001
From: Will Deacon <will(a)kernel.org>
Date: Fri, 20 Nov 2020 13:57:48 +0000
Subject: [PATCH] arm64: pgtable: Ensure dirty bit is preserved across
pte_wrprotect()
With hardware dirty bit management, calling pte_wrprotect() on a writable,
dirty PTE will lose the dirty state and return a read-only, clean entry.
Move the logic from ptep_set_wrprotect() into pte_wrprotect() to ensure that
the dirty bit is preserved for writable entries, as this is required for
soft-dirty bit management if we enable it in the future.
Cc: <stable(a)vger.kernel.org>
Fixes: 2f4b829c625e ("arm64: Add support for hardware updates of the access and dirty pte bits")
Reviewed-by: Catalin Marinas <catalin.marinas(a)arm.com>
Link: https://lore.kernel.org/r/20201120143557.6715-3-will@kernel.org
Signed-off-by: Will Deacon <will(a)kernel.org>
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 326c34677e86..5628289b9d5e 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -165,13 +165,6 @@ static inline pmd_t set_pmd_bit(pmd_t pmd, pgprot_t prot)
return pmd;
}
-static inline pte_t pte_wrprotect(pte_t pte)
-{
- pte = clear_pte_bit(pte, __pgprot(PTE_WRITE));
- pte = set_pte_bit(pte, __pgprot(PTE_RDONLY));
- return pte;
-}
-
static inline pte_t pte_mkwrite(pte_t pte)
{
pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
@@ -197,6 +190,20 @@ static inline pte_t pte_mkdirty(pte_t pte)
return pte;
}
+static inline pte_t pte_wrprotect(pte_t pte)
+{
+ /*
+ * If hardware-dirty (PTE_WRITE/DBM bit set and PTE_RDONLY
+ * clear), set the PTE_DIRTY bit.
+ */
+ if (pte_hw_dirty(pte))
+ pte = pte_mkdirty(pte);
+
+ pte = clear_pte_bit(pte, __pgprot(PTE_WRITE));
+ pte = set_pte_bit(pte, __pgprot(PTE_RDONLY));
+ return pte;
+}
+
static inline pte_t pte_mkold(pte_t pte)
{
return clear_pte_bit(pte, __pgprot(PTE_AF));
@@ -846,12 +853,6 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addres
pte = READ_ONCE(*ptep);
do {
old_pte = pte;
- /*
- * If hardware-dirty (PTE_WRITE/DBM bit set and PTE_RDONLY
- * clear), set the PTE_DIRTY bit.
- */
- if (pte_hw_dirty(pte))
- pte = pte_mkdirty(pte);
pte = pte_wrprotect(pte);
pte_val(pte) = cmpxchg_relaxed(&pte_val(*ptep),
pte_val(old_pte), pte_val(pte));
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ff1712f953e27f0b0718762ec17d0adb15c9fd0b Mon Sep 17 00:00:00 2001
From: Will Deacon <will(a)kernel.org>
Date: Fri, 20 Nov 2020 13:57:48 +0000
Subject: [PATCH] arm64: pgtable: Ensure dirty bit is preserved across
pte_wrprotect()
With hardware dirty bit management, calling pte_wrprotect() on a writable,
dirty PTE will lose the dirty state and return a read-only, clean entry.
Move the logic from ptep_set_wrprotect() into pte_wrprotect() to ensure that
the dirty bit is preserved for writable entries, as this is required for
soft-dirty bit management if we enable it in the future.
Cc: <stable(a)vger.kernel.org>
Fixes: 2f4b829c625e ("arm64: Add support for hardware updates of the access and dirty pte bits")
Reviewed-by: Catalin Marinas <catalin.marinas(a)arm.com>
Link: https://lore.kernel.org/r/20201120143557.6715-3-will@kernel.org
Signed-off-by: Will Deacon <will(a)kernel.org>
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 326c34677e86..5628289b9d5e 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -165,13 +165,6 @@ static inline pmd_t set_pmd_bit(pmd_t pmd, pgprot_t prot)
return pmd;
}
-static inline pte_t pte_wrprotect(pte_t pte)
-{
- pte = clear_pte_bit(pte, __pgprot(PTE_WRITE));
- pte = set_pte_bit(pte, __pgprot(PTE_RDONLY));
- return pte;
-}
-
static inline pte_t pte_mkwrite(pte_t pte)
{
pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
@@ -197,6 +190,20 @@ static inline pte_t pte_mkdirty(pte_t pte)
return pte;
}
+static inline pte_t pte_wrprotect(pte_t pte)
+{
+ /*
+ * If hardware-dirty (PTE_WRITE/DBM bit set and PTE_RDONLY
+ * clear), set the PTE_DIRTY bit.
+ */
+ if (pte_hw_dirty(pte))
+ pte = pte_mkdirty(pte);
+
+ pte = clear_pte_bit(pte, __pgprot(PTE_WRITE));
+ pte = set_pte_bit(pte, __pgprot(PTE_RDONLY));
+ return pte;
+}
+
static inline pte_t pte_mkold(pte_t pte)
{
return clear_pte_bit(pte, __pgprot(PTE_AF));
@@ -846,12 +853,6 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addres
pte = READ_ONCE(*ptep);
do {
old_pte = pte;
- /*
- * If hardware-dirty (PTE_WRITE/DBM bit set and PTE_RDONLY
- * clear), set the PTE_DIRTY bit.
- */
- if (pte_hw_dirty(pte))
- pte = pte_mkdirty(pte);
pte = pte_wrprotect(pte);
pte_val(pte) = cmpxchg_relaxed(&pte_val(*ptep),
pte_val(old_pte), pte_val(pte));
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c334730988ee07908ba4eb816ce78d3fe06fecaa Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Wed, 4 Nov 2020 11:07:31 +0000
Subject: [PATCH] btrfs: fix missing delalloc new bit for new delalloc ranges
When doing a buffered write, through one of the write family syscalls, we
look for ranges which currently don't have allocated extents and set the
'delalloc new' bit on them, so that we can report a correct number of used
blocks to the stat(2) syscall until delalloc is flushed and ordered extents
complete.
However there are a few other places where we can do a buffered write
against a range that is mapped to a hole (no extent allocated) and where
we do not set the 'new delalloc' bit. Those places are:
- Doing a memory mapped write against a hole;
- Cloning an inline extent into a hole starting at file offset 0;
- Calling btrfs_cont_expand() when the i_size of the file is not aligned
to the sector size and is located in a hole. For example when cloning
to a destination offset beyond EOF.
So after such cases, until the corresponding delalloc range is flushed and
the respective ordered extents complete, we can report an incorrect number
of blocks used through the stat(2) syscall.
In some cases we can end up reporting 0 used blocks to stat(2), which is a
particular bad value to report as it may mislead tools to think a file is
completely sparse when its i_size is not zero, making them skip reading
any data, an undesired consequence for tools such as archivers and other
backup tools, as reported a long time ago in the following thread (and
other past threads):
https://lists.gnu.org/archive/html/bug-tar/2016-07/msg00001.html
Example reproducer:
$ cat reproducer.sh
#!/bin/bash
MNT=/mnt/sdi
DEV=/dev/sdi
mkfs.btrfs -f $DEV > /dev/null
# mkfs.xfs -f $DEV > /dev/null
# mkfs.ext4 -F $DEV > /dev/null
# mkfs.f2fs -f $DEV > /dev/null
mount $DEV $MNT
xfs_io -f -c "truncate 64K" \
-c "mmap -w 0 64K" \
-c "mwrite -S 0xab 0 64K" \
-c "munmap" \
$MNT/foo
blocks_used=$(stat -c %b $MNT/foo)
echo "blocks used: $blocks_used"
if [ $blocks_used -eq 0 ]; then
echo "ERROR: blocks used is 0"
fi
umount $DEV
$ ./reproducer.sh
blocks used: 0
ERROR: blocks used is 0
So move the logic that decides to set the 'delalloc bit' bit into the
function btrfs_set_extent_delalloc(), since that is what we use for all
those missing cases as well as for the cases that currently work well.
This change is also preparatory work for an upcoming patch that fixes
other problems related to tracking and reporting the number of bytes used
by an inode.
CC: stable(a)vger.kernel.org # 4.19+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 87355a38a654..4373da7bcc0d 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -452,46 +452,6 @@ static void btrfs_drop_pages(struct page **pages, size_t num_pages)
}
}
-static int btrfs_find_new_delalloc_bytes(struct btrfs_inode *inode,
- const u64 start,
- const u64 len,
- struct extent_state **cached_state)
-{
- u64 search_start = start;
- const u64 end = start + len - 1;
-
- while (search_start < end) {
- const u64 search_len = end - search_start + 1;
- struct extent_map *em;
- u64 em_len;
- int ret = 0;
-
- em = btrfs_get_extent(inode, NULL, 0, search_start, search_len);
- if (IS_ERR(em))
- return PTR_ERR(em);
-
- if (em->block_start != EXTENT_MAP_HOLE)
- goto next;
-
- em_len = em->len;
- if (em->start < search_start)
- em_len -= search_start - em->start;
- if (em_len > search_len)
- em_len = search_len;
-
- ret = set_extent_bit(&inode->io_tree, search_start,
- search_start + em_len - 1,
- EXTENT_DELALLOC_NEW,
- NULL, cached_state, GFP_NOFS);
-next:
- search_start = extent_map_end(em);
- free_extent_map(em);
- if (ret)
- return ret;
- }
- return 0;
-}
-
/*
* after copy_from_user, pages need to be dirtied and we need to make
* sure holes are created between the current EOF and the start of
@@ -528,23 +488,6 @@ int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages,
EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
0, 0, cached);
- if (!btrfs_is_free_space_inode(inode)) {
- if (start_pos >= isize &&
- !(inode->flags & BTRFS_INODE_PREALLOC)) {
- /*
- * There can't be any extents following eof in this case
- * so just set the delalloc new bit for the range
- * directly.
- */
- extra_bits |= EXTENT_DELALLOC_NEW;
- } else {
- err = btrfs_find_new_delalloc_bytes(inode, start_pos,
- num_bytes, cached);
- if (err)
- return err;
- }
- }
-
err = btrfs_set_extent_delalloc(inode, start_pos, end_of_last_block,
extra_bits, cached);
if (err)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index da58c58ef9aa..7e8d8169779d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2253,11 +2253,69 @@ static int add_pending_csums(struct btrfs_trans_handle *trans,
return 0;
}
+static int btrfs_find_new_delalloc_bytes(struct btrfs_inode *inode,
+ const u64 start,
+ const u64 len,
+ struct extent_state **cached_state)
+{
+ u64 search_start = start;
+ const u64 end = start + len - 1;
+
+ while (search_start < end) {
+ const u64 search_len = end - search_start + 1;
+ struct extent_map *em;
+ u64 em_len;
+ int ret = 0;
+
+ em = btrfs_get_extent(inode, NULL, 0, search_start, search_len);
+ if (IS_ERR(em))
+ return PTR_ERR(em);
+
+ if (em->block_start != EXTENT_MAP_HOLE)
+ goto next;
+
+ em_len = em->len;
+ if (em->start < search_start)
+ em_len -= search_start - em->start;
+ if (em_len > search_len)
+ em_len = search_len;
+
+ ret = set_extent_bit(&inode->io_tree, search_start,
+ search_start + em_len - 1,
+ EXTENT_DELALLOC_NEW,
+ NULL, cached_state, GFP_NOFS);
+next:
+ search_start = extent_map_end(em);
+ free_extent_map(em);
+ if (ret)
+ return ret;
+ }
+ return 0;
+}
+
int btrfs_set_extent_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
unsigned int extra_bits,
struct extent_state **cached_state)
{
WARN_ON(PAGE_ALIGNED(end));
+
+ if (start >= i_size_read(&inode->vfs_inode) &&
+ !(inode->flags & BTRFS_INODE_PREALLOC)) {
+ /*
+ * There can't be any extents following eof in this case so just
+ * set the delalloc new bit for the range directly.
+ */
+ extra_bits |= EXTENT_DELALLOC_NEW;
+ } else {
+ int ret;
+
+ ret = btrfs_find_new_delalloc_bytes(inode, start,
+ end + 1 - start,
+ cached_state);
+ if (ret)
+ return ret;
+ }
+
return set_extent_delalloc(&inode->io_tree, start, end, extra_bits,
cached_state);
}
diff --git a/fs/btrfs/tests/inode-tests.c b/fs/btrfs/tests/inode-tests.c
index e6719f7db386..04022069761d 100644
--- a/fs/btrfs/tests/inode-tests.c
+++ b/fs/btrfs/tests/inode-tests.c
@@ -983,7 +983,8 @@ static int test_extent_accounting(u32 sectorsize, u32 nodesize)
ret = clear_extent_bit(&BTRFS_I(inode)->io_tree,
BTRFS_MAX_EXTENT_SIZE >> 1,
(BTRFS_MAX_EXTENT_SIZE >> 1) + sectorsize - 1,
- EXTENT_DELALLOC | EXTENT_UPTODATE, 0, 0, NULL);
+ EXTENT_DELALLOC | EXTENT_DELALLOC_NEW |
+ EXTENT_UPTODATE, 0, 0, NULL);
if (ret) {
test_err("clear_extent_bit returned %d", ret);
goto out;
@@ -1050,7 +1051,8 @@ static int test_extent_accounting(u32 sectorsize, u32 nodesize)
ret = clear_extent_bit(&BTRFS_I(inode)->io_tree,
BTRFS_MAX_EXTENT_SIZE + sectorsize,
BTRFS_MAX_EXTENT_SIZE + 2 * sectorsize - 1,
- EXTENT_DELALLOC | EXTENT_UPTODATE, 0, 0, NULL);
+ EXTENT_DELALLOC | EXTENT_DELALLOC_NEW |
+ EXTENT_UPTODATE, 0, 0, NULL);
if (ret) {
test_err("clear_extent_bit returned %d", ret);
goto out;
@@ -1082,7 +1084,8 @@ static int test_extent_accounting(u32 sectorsize, u32 nodesize)
/* Empty */
ret = clear_extent_bit(&BTRFS_I(inode)->io_tree, 0, (u64)-1,
- EXTENT_DELALLOC | EXTENT_UPTODATE, 0, 0, NULL);
+ EXTENT_DELALLOC | EXTENT_DELALLOC_NEW |
+ EXTENT_UPTODATE, 0, 0, NULL);
if (ret) {
test_err("clear_extent_bit returned %d", ret);
goto out;
@@ -1097,7 +1100,8 @@ static int test_extent_accounting(u32 sectorsize, u32 nodesize)
out:
if (ret)
clear_extent_bit(&BTRFS_I(inode)->io_tree, 0, (u64)-1,
- EXTENT_DELALLOC | EXTENT_UPTODATE, 0, 0, NULL);
+ EXTENT_DELALLOC | EXTENT_DELALLOC_NEW |
+ EXTENT_UPTODATE, 0, 0, NULL);
iput(inode);
btrfs_free_dummy_root(root);
btrfs_free_dummy_fs_info(fs_info);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c334730988ee07908ba4eb816ce78d3fe06fecaa Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Wed, 4 Nov 2020 11:07:31 +0000
Subject: [PATCH] btrfs: fix missing delalloc new bit for new delalloc ranges
When doing a buffered write, through one of the write family syscalls, we
look for ranges which currently don't have allocated extents and set the
'delalloc new' bit on them, so that we can report a correct number of used
blocks to the stat(2) syscall until delalloc is flushed and ordered extents
complete.
However there are a few other places where we can do a buffered write
against a range that is mapped to a hole (no extent allocated) and where
we do not set the 'new delalloc' bit. Those places are:
- Doing a memory mapped write against a hole;
- Cloning an inline extent into a hole starting at file offset 0;
- Calling btrfs_cont_expand() when the i_size of the file is not aligned
to the sector size and is located in a hole. For example when cloning
to a destination offset beyond EOF.
So after such cases, until the corresponding delalloc range is flushed and
the respective ordered extents complete, we can report an incorrect number
of blocks used through the stat(2) syscall.
In some cases we can end up reporting 0 used blocks to stat(2), which is a
particular bad value to report as it may mislead tools to think a file is
completely sparse when its i_size is not zero, making them skip reading
any data, an undesired consequence for tools such as archivers and other
backup tools, as reported a long time ago in the following thread (and
other past threads):
https://lists.gnu.org/archive/html/bug-tar/2016-07/msg00001.html
Example reproducer:
$ cat reproducer.sh
#!/bin/bash
MNT=/mnt/sdi
DEV=/dev/sdi
mkfs.btrfs -f $DEV > /dev/null
# mkfs.xfs -f $DEV > /dev/null
# mkfs.ext4 -F $DEV > /dev/null
# mkfs.f2fs -f $DEV > /dev/null
mount $DEV $MNT
xfs_io -f -c "truncate 64K" \
-c "mmap -w 0 64K" \
-c "mwrite -S 0xab 0 64K" \
-c "munmap" \
$MNT/foo
blocks_used=$(stat -c %b $MNT/foo)
echo "blocks used: $blocks_used"
if [ $blocks_used -eq 0 ]; then
echo "ERROR: blocks used is 0"
fi
umount $DEV
$ ./reproducer.sh
blocks used: 0
ERROR: blocks used is 0
So move the logic that decides to set the 'delalloc bit' bit into the
function btrfs_set_extent_delalloc(), since that is what we use for all
those missing cases as well as for the cases that currently work well.
This change is also preparatory work for an upcoming patch that fixes
other problems related to tracking and reporting the number of bytes used
by an inode.
CC: stable(a)vger.kernel.org # 4.19+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 87355a38a654..4373da7bcc0d 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -452,46 +452,6 @@ static void btrfs_drop_pages(struct page **pages, size_t num_pages)
}
}
-static int btrfs_find_new_delalloc_bytes(struct btrfs_inode *inode,
- const u64 start,
- const u64 len,
- struct extent_state **cached_state)
-{
- u64 search_start = start;
- const u64 end = start + len - 1;
-
- while (search_start < end) {
- const u64 search_len = end - search_start + 1;
- struct extent_map *em;
- u64 em_len;
- int ret = 0;
-
- em = btrfs_get_extent(inode, NULL, 0, search_start, search_len);
- if (IS_ERR(em))
- return PTR_ERR(em);
-
- if (em->block_start != EXTENT_MAP_HOLE)
- goto next;
-
- em_len = em->len;
- if (em->start < search_start)
- em_len -= search_start - em->start;
- if (em_len > search_len)
- em_len = search_len;
-
- ret = set_extent_bit(&inode->io_tree, search_start,
- search_start + em_len - 1,
- EXTENT_DELALLOC_NEW,
- NULL, cached_state, GFP_NOFS);
-next:
- search_start = extent_map_end(em);
- free_extent_map(em);
- if (ret)
- return ret;
- }
- return 0;
-}
-
/*
* after copy_from_user, pages need to be dirtied and we need to make
* sure holes are created between the current EOF and the start of
@@ -528,23 +488,6 @@ int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages,
EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
0, 0, cached);
- if (!btrfs_is_free_space_inode(inode)) {
- if (start_pos >= isize &&
- !(inode->flags & BTRFS_INODE_PREALLOC)) {
- /*
- * There can't be any extents following eof in this case
- * so just set the delalloc new bit for the range
- * directly.
- */
- extra_bits |= EXTENT_DELALLOC_NEW;
- } else {
- err = btrfs_find_new_delalloc_bytes(inode, start_pos,
- num_bytes, cached);
- if (err)
- return err;
- }
- }
-
err = btrfs_set_extent_delalloc(inode, start_pos, end_of_last_block,
extra_bits, cached);
if (err)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index da58c58ef9aa..7e8d8169779d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2253,11 +2253,69 @@ static int add_pending_csums(struct btrfs_trans_handle *trans,
return 0;
}
+static int btrfs_find_new_delalloc_bytes(struct btrfs_inode *inode,
+ const u64 start,
+ const u64 len,
+ struct extent_state **cached_state)
+{
+ u64 search_start = start;
+ const u64 end = start + len - 1;
+
+ while (search_start < end) {
+ const u64 search_len = end - search_start + 1;
+ struct extent_map *em;
+ u64 em_len;
+ int ret = 0;
+
+ em = btrfs_get_extent(inode, NULL, 0, search_start, search_len);
+ if (IS_ERR(em))
+ return PTR_ERR(em);
+
+ if (em->block_start != EXTENT_MAP_HOLE)
+ goto next;
+
+ em_len = em->len;
+ if (em->start < search_start)
+ em_len -= search_start - em->start;
+ if (em_len > search_len)
+ em_len = search_len;
+
+ ret = set_extent_bit(&inode->io_tree, search_start,
+ search_start + em_len - 1,
+ EXTENT_DELALLOC_NEW,
+ NULL, cached_state, GFP_NOFS);
+next:
+ search_start = extent_map_end(em);
+ free_extent_map(em);
+ if (ret)
+ return ret;
+ }
+ return 0;
+}
+
int btrfs_set_extent_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
unsigned int extra_bits,
struct extent_state **cached_state)
{
WARN_ON(PAGE_ALIGNED(end));
+
+ if (start >= i_size_read(&inode->vfs_inode) &&
+ !(inode->flags & BTRFS_INODE_PREALLOC)) {
+ /*
+ * There can't be any extents following eof in this case so just
+ * set the delalloc new bit for the range directly.
+ */
+ extra_bits |= EXTENT_DELALLOC_NEW;
+ } else {
+ int ret;
+
+ ret = btrfs_find_new_delalloc_bytes(inode, start,
+ end + 1 - start,
+ cached_state);
+ if (ret)
+ return ret;
+ }
+
return set_extent_delalloc(&inode->io_tree, start, end, extra_bits,
cached_state);
}
diff --git a/fs/btrfs/tests/inode-tests.c b/fs/btrfs/tests/inode-tests.c
index e6719f7db386..04022069761d 100644
--- a/fs/btrfs/tests/inode-tests.c
+++ b/fs/btrfs/tests/inode-tests.c
@@ -983,7 +983,8 @@ static int test_extent_accounting(u32 sectorsize, u32 nodesize)
ret = clear_extent_bit(&BTRFS_I(inode)->io_tree,
BTRFS_MAX_EXTENT_SIZE >> 1,
(BTRFS_MAX_EXTENT_SIZE >> 1) + sectorsize - 1,
- EXTENT_DELALLOC | EXTENT_UPTODATE, 0, 0, NULL);
+ EXTENT_DELALLOC | EXTENT_DELALLOC_NEW |
+ EXTENT_UPTODATE, 0, 0, NULL);
if (ret) {
test_err("clear_extent_bit returned %d", ret);
goto out;
@@ -1050,7 +1051,8 @@ static int test_extent_accounting(u32 sectorsize, u32 nodesize)
ret = clear_extent_bit(&BTRFS_I(inode)->io_tree,
BTRFS_MAX_EXTENT_SIZE + sectorsize,
BTRFS_MAX_EXTENT_SIZE + 2 * sectorsize - 1,
- EXTENT_DELALLOC | EXTENT_UPTODATE, 0, 0, NULL);
+ EXTENT_DELALLOC | EXTENT_DELALLOC_NEW |
+ EXTENT_UPTODATE, 0, 0, NULL);
if (ret) {
test_err("clear_extent_bit returned %d", ret);
goto out;
@@ -1082,7 +1084,8 @@ static int test_extent_accounting(u32 sectorsize, u32 nodesize)
/* Empty */
ret = clear_extent_bit(&BTRFS_I(inode)->io_tree, 0, (u64)-1,
- EXTENT_DELALLOC | EXTENT_UPTODATE, 0, 0, NULL);
+ EXTENT_DELALLOC | EXTENT_DELALLOC_NEW |
+ EXTENT_UPTODATE, 0, 0, NULL);
if (ret) {
test_err("clear_extent_bit returned %d", ret);
goto out;
@@ -1097,7 +1100,8 @@ static int test_extent_accounting(u32 sectorsize, u32 nodesize)
out:
if (ret)
clear_extent_bit(&BTRFS_I(inode)->io_tree, 0, (u64)-1,
- EXTENT_DELALLOC | EXTENT_UPTODATE, 0, 0, NULL);
+ EXTENT_DELALLOC | EXTENT_DELALLOC_NEW |
+ EXTENT_UPTODATE, 0, 0, NULL);
iput(inode);
btrfs_free_dummy_root(root);
btrfs_free_dummy_fs_info(fs_info);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3d2a9d642512c21a12d19b9250e7a835dcb41a79 Mon Sep 17 00:00:00 2001
From: Dennis Dalessandro <dennis.dalessandro(a)cornelisnetworks.com>
Date: Wed, 25 Nov 2020 16:01:12 -0500
Subject: [PATCH] IB/hfi1: Ensure correct mm is used at all times
Two earlier bug fixes have created a security problem in the hfi1
driver. One fix aimed to solve an issue where current->mm was not valid
when closing the hfi1 cdev. It attempted to do this by saving a cached
value of the current->mm pointer at file open time. This is a problem if
another process with access to the FD calls in via write() or ioctl() to
pin pages via the hfi driver. The other fix tried to solve a use after
free by taking a reference on the mm.
To fix this correctly we use the existing cached value of the mm in the
mmu notifier. Now we can check in the insert, evict, etc. routines that
current->mm matched what the notifier was registered for. If not, then
don't allow access. The register of the mmu notifier will save the mm
pointer.
Since in do_exit() the exit_mm() is called before exit_files(), which
would call our close routine a reference is needed on the mm. We rely on
the mmgrab done by the registration of the notifier, whereas before it was
explicit. The mmu notifier deregistration happens when the user context is
torn down, the creation of which triggered the registration.
Also of note is we do not do any explicit work to protect the interval
tree notifier. It doesn't seem that this is going to be needed since we
aren't actually doing anything with current->mm. The interval tree
notifier stuff still has a FIXME noted from a previous commit that will be
addressed in a follow on patch.
Cc: <stable(a)vger.kernel.org>
Fixes: e0cf75deab81 ("IB/hfi1: Fix mm_struct use after free")
Fixes: 3faa3d9a308e ("IB/hfi1: Make use of mm consistent")
Link: https://lore.kernel.org/r/20201125210112.104301.51331.stgit@awfm-01.aw.inte…
Suggested-by: Jann Horn <jannh(a)google.com>
Reported-by: Jason Gunthorpe <jgg(a)nvidia.com>
Reviewed-by: Ira Weiny <ira.weiny(a)intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn(a)cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro(a)cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c
index 8ca51e43cf53..329ee4f48d95 100644
--- a/drivers/infiniband/hw/hfi1/file_ops.c
+++ b/drivers/infiniband/hw/hfi1/file_ops.c
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2015-2020 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -206,8 +207,6 @@ static int hfi1_file_open(struct inode *inode, struct file *fp)
spin_lock_init(&fd->tid_lock);
spin_lock_init(&fd->invalid_lock);
fd->rec_cpu_num = -1; /* no cpu affinity by default */
- fd->mm = current->mm;
- mmgrab(fd->mm);
fd->dd = dd;
fp->private_data = fd;
return 0;
@@ -711,7 +710,6 @@ static int hfi1_file_close(struct inode *inode, struct file *fp)
deallocate_ctxt(uctxt);
done:
- mmdrop(fdata->mm);
if (atomic_dec_and_test(&dd->user_refcount))
complete(&dd->user_comp);
diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h
index b4c6bff60a4e..e09e8244a94c 100644
--- a/drivers/infiniband/hw/hfi1/hfi.h
+++ b/drivers/infiniband/hw/hfi1/hfi.h
@@ -1,6 +1,7 @@
#ifndef _HFI1_KERNEL_H
#define _HFI1_KERNEL_H
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2015-2020 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -1451,7 +1452,6 @@ struct hfi1_filedata {
u32 invalid_tid_idx;
/* protect invalid_tids array and invalid_tid_idx */
spinlock_t invalid_lock;
- struct mm_struct *mm;
};
extern struct xarray hfi1_dev_table;
diff --git a/drivers/infiniband/hw/hfi1/mmu_rb.c b/drivers/infiniband/hw/hfi1/mmu_rb.c
index 24ca17b77b72..f3fb28e3d5d7 100644
--- a/drivers/infiniband/hw/hfi1/mmu_rb.c
+++ b/drivers/infiniband/hw/hfi1/mmu_rb.c
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2016 - 2017 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -48,23 +49,11 @@
#include <linux/rculist.h>
#include <linux/mmu_notifier.h>
#include <linux/interval_tree_generic.h>
+#include <linux/sched/mm.h>
#include "mmu_rb.h"
#include "trace.h"
-struct mmu_rb_handler {
- struct mmu_notifier mn;
- struct rb_root_cached root;
- void *ops_arg;
- spinlock_t lock; /* protect the RB tree */
- struct mmu_rb_ops *ops;
- struct mm_struct *mm;
- struct list_head lru_list;
- struct work_struct del_work;
- struct list_head del_list;
- struct workqueue_struct *wq;
-};
-
static unsigned long mmu_node_start(struct mmu_rb_node *);
static unsigned long mmu_node_last(struct mmu_rb_node *);
static int mmu_notifier_range_start(struct mmu_notifier *,
@@ -92,37 +81,36 @@ static unsigned long mmu_node_last(struct mmu_rb_node *node)
return PAGE_ALIGN(node->addr + node->len) - 1;
}
-int hfi1_mmu_rb_register(void *ops_arg, struct mm_struct *mm,
+int hfi1_mmu_rb_register(void *ops_arg,
struct mmu_rb_ops *ops,
struct workqueue_struct *wq,
struct mmu_rb_handler **handler)
{
- struct mmu_rb_handler *handlr;
+ struct mmu_rb_handler *h;
int ret;
- handlr = kmalloc(sizeof(*handlr), GFP_KERNEL);
- if (!handlr)
+ h = kmalloc(sizeof(*h), GFP_KERNEL);
+ if (!h)
return -ENOMEM;
- handlr->root = RB_ROOT_CACHED;
- handlr->ops = ops;
- handlr->ops_arg = ops_arg;
- INIT_HLIST_NODE(&handlr->mn.hlist);
- spin_lock_init(&handlr->lock);
- handlr->mn.ops = &mn_opts;
- handlr->mm = mm;
- INIT_WORK(&handlr->del_work, handle_remove);
- INIT_LIST_HEAD(&handlr->del_list);
- INIT_LIST_HEAD(&handlr->lru_list);
- handlr->wq = wq;
-
- ret = mmu_notifier_register(&handlr->mn, handlr->mm);
+ h->root = RB_ROOT_CACHED;
+ h->ops = ops;
+ h->ops_arg = ops_arg;
+ INIT_HLIST_NODE(&h->mn.hlist);
+ spin_lock_init(&h->lock);
+ h->mn.ops = &mn_opts;
+ INIT_WORK(&h->del_work, handle_remove);
+ INIT_LIST_HEAD(&h->del_list);
+ INIT_LIST_HEAD(&h->lru_list);
+ h->wq = wq;
+
+ ret = mmu_notifier_register(&h->mn, current->mm);
if (ret) {
- kfree(handlr);
+ kfree(h);
return ret;
}
- *handler = handlr;
+ *handler = h;
return 0;
}
@@ -134,7 +122,7 @@ void hfi1_mmu_rb_unregister(struct mmu_rb_handler *handler)
struct list_head del_list;
/* Unregister first so we don't get any more notifications. */
- mmu_notifier_unregister(&handler->mn, handler->mm);
+ mmu_notifier_unregister(&handler->mn, handler->mn.mm);
/*
* Make sure the wq delete handler is finished running. It will not
@@ -166,6 +154,10 @@ int hfi1_mmu_rb_insert(struct mmu_rb_handler *handler,
int ret = 0;
trace_hfi1_mmu_rb_insert(mnode->addr, mnode->len);
+
+ if (current->mm != handler->mn.mm)
+ return -EPERM;
+
spin_lock_irqsave(&handler->lock, flags);
node = __mmu_rb_search(handler, mnode->addr, mnode->len);
if (node) {
@@ -180,6 +172,7 @@ int hfi1_mmu_rb_insert(struct mmu_rb_handler *handler,
__mmu_int_rb_remove(mnode, &handler->root);
list_del(&mnode->list); /* remove from LRU list */
}
+ mnode->handler = handler;
unlock:
spin_unlock_irqrestore(&handler->lock, flags);
return ret;
@@ -217,6 +210,9 @@ bool hfi1_mmu_rb_remove_unless_exact(struct mmu_rb_handler *handler,
unsigned long flags;
bool ret = false;
+ if (current->mm != handler->mn.mm)
+ return ret;
+
spin_lock_irqsave(&handler->lock, flags);
node = __mmu_rb_search(handler, addr, len);
if (node) {
@@ -239,6 +235,9 @@ void hfi1_mmu_rb_evict(struct mmu_rb_handler *handler, void *evict_arg)
unsigned long flags;
bool stop = false;
+ if (current->mm != handler->mn.mm)
+ return;
+
INIT_LIST_HEAD(&del_list);
spin_lock_irqsave(&handler->lock, flags);
@@ -272,6 +271,9 @@ void hfi1_mmu_rb_remove(struct mmu_rb_handler *handler,
{
unsigned long flags;
+ if (current->mm != handler->mn.mm)
+ return;
+
/* Validity of handler and node pointers has been checked by caller. */
trace_hfi1_mmu_rb_remove(node->addr, node->len);
spin_lock_irqsave(&handler->lock, flags);
diff --git a/drivers/infiniband/hw/hfi1/mmu_rb.h b/drivers/infiniband/hw/hfi1/mmu_rb.h
index f04cec1e99d1..423aacc67e94 100644
--- a/drivers/infiniband/hw/hfi1/mmu_rb.h
+++ b/drivers/infiniband/hw/hfi1/mmu_rb.h
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2016 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -54,6 +55,7 @@ struct mmu_rb_node {
unsigned long len;
unsigned long __last;
struct rb_node node;
+ struct mmu_rb_handler *handler;
struct list_head list;
};
@@ -71,7 +73,19 @@ struct mmu_rb_ops {
void *evict_arg, bool *stop);
};
-int hfi1_mmu_rb_register(void *ops_arg, struct mm_struct *mm,
+struct mmu_rb_handler {
+ struct mmu_notifier mn;
+ struct rb_root_cached root;
+ void *ops_arg;
+ spinlock_t lock; /* protect the RB tree */
+ struct mmu_rb_ops *ops;
+ struct list_head lru_list;
+ struct work_struct del_work;
+ struct list_head del_list;
+ struct workqueue_struct *wq;
+};
+
+int hfi1_mmu_rb_register(void *ops_arg,
struct mmu_rb_ops *ops,
struct workqueue_struct *wq,
struct mmu_rb_handler **handler);
diff --git a/drivers/infiniband/hw/hfi1/user_exp_rcv.c b/drivers/infiniband/hw/hfi1/user_exp_rcv.c
index f81ca20f4b69..b94fc7fd75a9 100644
--- a/drivers/infiniband/hw/hfi1/user_exp_rcv.c
+++ b/drivers/infiniband/hw/hfi1/user_exp_rcv.c
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2015-2018 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -173,15 +174,18 @@ static void unpin_rcv_pages(struct hfi1_filedata *fd,
{
struct page **pages;
struct hfi1_devdata *dd = fd->uctxt->dd;
+ struct mm_struct *mm;
if (mapped) {
pci_unmap_single(dd->pcidev, node->dma_addr,
node->npages * PAGE_SIZE, PCI_DMA_FROMDEVICE);
pages = &node->pages[idx];
+ mm = mm_from_tid_node(node);
} else {
pages = &tidbuf->pages[idx];
+ mm = current->mm;
}
- hfi1_release_user_pages(fd->mm, pages, npages, mapped);
+ hfi1_release_user_pages(mm, pages, npages, mapped);
fd->tid_n_pinned -= npages;
}
@@ -216,12 +220,12 @@ static int pin_rcv_pages(struct hfi1_filedata *fd, struct tid_user_buf *tidbuf)
* pages, accept the amount pinned so far and program only that.
* User space knows how to deal with partially programmed buffers.
*/
- if (!hfi1_can_pin_pages(dd, fd->mm, fd->tid_n_pinned, npages)) {
+ if (!hfi1_can_pin_pages(dd, current->mm, fd->tid_n_pinned, npages)) {
kfree(pages);
return -ENOMEM;
}
- pinned = hfi1_acquire_user_pages(fd->mm, vaddr, npages, true, pages);
+ pinned = hfi1_acquire_user_pages(current->mm, vaddr, npages, true, pages);
if (pinned <= 0) {
kfree(pages);
return pinned;
@@ -756,7 +760,7 @@ static int set_rcvarray_entry(struct hfi1_filedata *fd,
if (fd->use_mn) {
ret = mmu_interval_notifier_insert(
- &node->notifier, fd->mm,
+ &node->notifier, current->mm,
tbuf->vaddr + (pageidx * PAGE_SIZE), npages * PAGE_SIZE,
&tid_mn_ops);
if (ret)
diff --git a/drivers/infiniband/hw/hfi1/user_exp_rcv.h b/drivers/infiniband/hw/hfi1/user_exp_rcv.h
index 332abb446861..d45c7b6988d4 100644
--- a/drivers/infiniband/hw/hfi1/user_exp_rcv.h
+++ b/drivers/infiniband/hw/hfi1/user_exp_rcv.h
@@ -1,6 +1,7 @@
#ifndef _HFI1_USER_EXP_RCV_H
#define _HFI1_USER_EXP_RCV_H
/*
+ * Copyright(c) 2020 - Cornelis Networks, Inc.
* Copyright(c) 2015 - 2017 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -95,4 +96,9 @@ int hfi1_user_exp_rcv_clear(struct hfi1_filedata *fd,
int hfi1_user_exp_rcv_invalid(struct hfi1_filedata *fd,
struct hfi1_tid_info *tinfo);
+static inline struct mm_struct *mm_from_tid_node(struct tid_rb_node *node)
+{
+ return node->notifier.mm;
+}
+
#endif /* _HFI1_USER_EXP_RCV_H */
diff --git a/drivers/infiniband/hw/hfi1/user_sdma.c b/drivers/infiniband/hw/hfi1/user_sdma.c
index a92346e88628..4a4956f96a7e 100644
--- a/drivers/infiniband/hw/hfi1/user_sdma.c
+++ b/drivers/infiniband/hw/hfi1/user_sdma.c
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 - Cornelis Networks, Inc.
* Copyright(c) 2015 - 2018 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -188,7 +189,6 @@ int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt,
atomic_set(&pq->n_reqs, 0);
init_waitqueue_head(&pq->wait);
atomic_set(&pq->n_locked, 0);
- pq->mm = fd->mm;
iowait_init(&pq->busy, 0, NULL, NULL, defer_packet_queue,
activate_packet_queue, NULL, NULL);
@@ -230,7 +230,7 @@ int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt,
cq->nentries = hfi1_sdma_comp_ring_size;
- ret = hfi1_mmu_rb_register(pq, pq->mm, &sdma_rb_ops, dd->pport->hfi1_wq,
+ ret = hfi1_mmu_rb_register(pq, &sdma_rb_ops, dd->pport->hfi1_wq,
&pq->handler);
if (ret) {
dd_dev_err(dd, "Failed to register with MMU %d", ret);
@@ -980,13 +980,13 @@ static int pin_sdma_pages(struct user_sdma_request *req,
npages -= node->npages;
retry:
- if (!hfi1_can_pin_pages(pq->dd, pq->mm,
+ if (!hfi1_can_pin_pages(pq->dd, current->mm,
atomic_read(&pq->n_locked), npages)) {
cleared = sdma_cache_evict(pq, npages);
if (cleared >= npages)
goto retry;
}
- pinned = hfi1_acquire_user_pages(pq->mm,
+ pinned = hfi1_acquire_user_pages(current->mm,
((unsigned long)iovec->iov.iov_base +
(node->npages * PAGE_SIZE)), npages, 0,
pages + node->npages);
@@ -995,7 +995,7 @@ static int pin_sdma_pages(struct user_sdma_request *req,
return pinned;
}
if (pinned != npages) {
- unpin_vector_pages(pq->mm, pages, node->npages, pinned);
+ unpin_vector_pages(current->mm, pages, node->npages, pinned);
return -EFAULT;
}
kfree(node->pages);
@@ -1008,7 +1008,8 @@ static int pin_sdma_pages(struct user_sdma_request *req,
static void unpin_sdma_pages(struct sdma_mmu_node *node)
{
if (node->npages) {
- unpin_vector_pages(node->pq->mm, node->pages, 0, node->npages);
+ unpin_vector_pages(mm_from_sdma_node(node), node->pages, 0,
+ node->npages);
atomic_sub(node->npages, &node->pq->n_locked);
}
}
diff --git a/drivers/infiniband/hw/hfi1/user_sdma.h b/drivers/infiniband/hw/hfi1/user_sdma.h
index 9972e0e6545e..1e8c02fe8ad1 100644
--- a/drivers/infiniband/hw/hfi1/user_sdma.h
+++ b/drivers/infiniband/hw/hfi1/user_sdma.h
@@ -1,6 +1,7 @@
#ifndef _HFI1_USER_SDMA_H
#define _HFI1_USER_SDMA_H
/*
+ * Copyright(c) 2020 - Cornelis Networks, Inc.
* Copyright(c) 2015 - 2018 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -133,7 +134,6 @@ struct hfi1_user_sdma_pkt_q {
unsigned long unpinned;
struct mmu_rb_handler *handler;
atomic_t n_locked;
- struct mm_struct *mm;
};
struct hfi1_user_sdma_comp_q {
@@ -250,4 +250,9 @@ int hfi1_user_sdma_process_request(struct hfi1_filedata *fd,
struct iovec *iovec, unsigned long dim,
unsigned long *count);
+static inline struct mm_struct *mm_from_sdma_node(struct sdma_mmu_node *node)
+{
+ return node->rb.handler->mn.mm;
+}
+
#endif /* _HFI1_USER_SDMA_H */
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3d2a9d642512c21a12d19b9250e7a835dcb41a79 Mon Sep 17 00:00:00 2001
From: Dennis Dalessandro <dennis.dalessandro(a)cornelisnetworks.com>
Date: Wed, 25 Nov 2020 16:01:12 -0500
Subject: [PATCH] IB/hfi1: Ensure correct mm is used at all times
Two earlier bug fixes have created a security problem in the hfi1
driver. One fix aimed to solve an issue where current->mm was not valid
when closing the hfi1 cdev. It attempted to do this by saving a cached
value of the current->mm pointer at file open time. This is a problem if
another process with access to the FD calls in via write() or ioctl() to
pin pages via the hfi driver. The other fix tried to solve a use after
free by taking a reference on the mm.
To fix this correctly we use the existing cached value of the mm in the
mmu notifier. Now we can check in the insert, evict, etc. routines that
current->mm matched what the notifier was registered for. If not, then
don't allow access. The register of the mmu notifier will save the mm
pointer.
Since in do_exit() the exit_mm() is called before exit_files(), which
would call our close routine a reference is needed on the mm. We rely on
the mmgrab done by the registration of the notifier, whereas before it was
explicit. The mmu notifier deregistration happens when the user context is
torn down, the creation of which triggered the registration.
Also of note is we do not do any explicit work to protect the interval
tree notifier. It doesn't seem that this is going to be needed since we
aren't actually doing anything with current->mm. The interval tree
notifier stuff still has a FIXME noted from a previous commit that will be
addressed in a follow on patch.
Cc: <stable(a)vger.kernel.org>
Fixes: e0cf75deab81 ("IB/hfi1: Fix mm_struct use after free")
Fixes: 3faa3d9a308e ("IB/hfi1: Make use of mm consistent")
Link: https://lore.kernel.org/r/20201125210112.104301.51331.stgit@awfm-01.aw.inte…
Suggested-by: Jann Horn <jannh(a)google.com>
Reported-by: Jason Gunthorpe <jgg(a)nvidia.com>
Reviewed-by: Ira Weiny <ira.weiny(a)intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn(a)cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro(a)cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c
index 8ca51e43cf53..329ee4f48d95 100644
--- a/drivers/infiniband/hw/hfi1/file_ops.c
+++ b/drivers/infiniband/hw/hfi1/file_ops.c
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2015-2020 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -206,8 +207,6 @@ static int hfi1_file_open(struct inode *inode, struct file *fp)
spin_lock_init(&fd->tid_lock);
spin_lock_init(&fd->invalid_lock);
fd->rec_cpu_num = -1; /* no cpu affinity by default */
- fd->mm = current->mm;
- mmgrab(fd->mm);
fd->dd = dd;
fp->private_data = fd;
return 0;
@@ -711,7 +710,6 @@ static int hfi1_file_close(struct inode *inode, struct file *fp)
deallocate_ctxt(uctxt);
done:
- mmdrop(fdata->mm);
if (atomic_dec_and_test(&dd->user_refcount))
complete(&dd->user_comp);
diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h
index b4c6bff60a4e..e09e8244a94c 100644
--- a/drivers/infiniband/hw/hfi1/hfi.h
+++ b/drivers/infiniband/hw/hfi1/hfi.h
@@ -1,6 +1,7 @@
#ifndef _HFI1_KERNEL_H
#define _HFI1_KERNEL_H
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2015-2020 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -1451,7 +1452,6 @@ struct hfi1_filedata {
u32 invalid_tid_idx;
/* protect invalid_tids array and invalid_tid_idx */
spinlock_t invalid_lock;
- struct mm_struct *mm;
};
extern struct xarray hfi1_dev_table;
diff --git a/drivers/infiniband/hw/hfi1/mmu_rb.c b/drivers/infiniband/hw/hfi1/mmu_rb.c
index 24ca17b77b72..f3fb28e3d5d7 100644
--- a/drivers/infiniband/hw/hfi1/mmu_rb.c
+++ b/drivers/infiniband/hw/hfi1/mmu_rb.c
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2016 - 2017 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -48,23 +49,11 @@
#include <linux/rculist.h>
#include <linux/mmu_notifier.h>
#include <linux/interval_tree_generic.h>
+#include <linux/sched/mm.h>
#include "mmu_rb.h"
#include "trace.h"
-struct mmu_rb_handler {
- struct mmu_notifier mn;
- struct rb_root_cached root;
- void *ops_arg;
- spinlock_t lock; /* protect the RB tree */
- struct mmu_rb_ops *ops;
- struct mm_struct *mm;
- struct list_head lru_list;
- struct work_struct del_work;
- struct list_head del_list;
- struct workqueue_struct *wq;
-};
-
static unsigned long mmu_node_start(struct mmu_rb_node *);
static unsigned long mmu_node_last(struct mmu_rb_node *);
static int mmu_notifier_range_start(struct mmu_notifier *,
@@ -92,37 +81,36 @@ static unsigned long mmu_node_last(struct mmu_rb_node *node)
return PAGE_ALIGN(node->addr + node->len) - 1;
}
-int hfi1_mmu_rb_register(void *ops_arg, struct mm_struct *mm,
+int hfi1_mmu_rb_register(void *ops_arg,
struct mmu_rb_ops *ops,
struct workqueue_struct *wq,
struct mmu_rb_handler **handler)
{
- struct mmu_rb_handler *handlr;
+ struct mmu_rb_handler *h;
int ret;
- handlr = kmalloc(sizeof(*handlr), GFP_KERNEL);
- if (!handlr)
+ h = kmalloc(sizeof(*h), GFP_KERNEL);
+ if (!h)
return -ENOMEM;
- handlr->root = RB_ROOT_CACHED;
- handlr->ops = ops;
- handlr->ops_arg = ops_arg;
- INIT_HLIST_NODE(&handlr->mn.hlist);
- spin_lock_init(&handlr->lock);
- handlr->mn.ops = &mn_opts;
- handlr->mm = mm;
- INIT_WORK(&handlr->del_work, handle_remove);
- INIT_LIST_HEAD(&handlr->del_list);
- INIT_LIST_HEAD(&handlr->lru_list);
- handlr->wq = wq;
-
- ret = mmu_notifier_register(&handlr->mn, handlr->mm);
+ h->root = RB_ROOT_CACHED;
+ h->ops = ops;
+ h->ops_arg = ops_arg;
+ INIT_HLIST_NODE(&h->mn.hlist);
+ spin_lock_init(&h->lock);
+ h->mn.ops = &mn_opts;
+ INIT_WORK(&h->del_work, handle_remove);
+ INIT_LIST_HEAD(&h->del_list);
+ INIT_LIST_HEAD(&h->lru_list);
+ h->wq = wq;
+
+ ret = mmu_notifier_register(&h->mn, current->mm);
if (ret) {
- kfree(handlr);
+ kfree(h);
return ret;
}
- *handler = handlr;
+ *handler = h;
return 0;
}
@@ -134,7 +122,7 @@ void hfi1_mmu_rb_unregister(struct mmu_rb_handler *handler)
struct list_head del_list;
/* Unregister first so we don't get any more notifications. */
- mmu_notifier_unregister(&handler->mn, handler->mm);
+ mmu_notifier_unregister(&handler->mn, handler->mn.mm);
/*
* Make sure the wq delete handler is finished running. It will not
@@ -166,6 +154,10 @@ int hfi1_mmu_rb_insert(struct mmu_rb_handler *handler,
int ret = 0;
trace_hfi1_mmu_rb_insert(mnode->addr, mnode->len);
+
+ if (current->mm != handler->mn.mm)
+ return -EPERM;
+
spin_lock_irqsave(&handler->lock, flags);
node = __mmu_rb_search(handler, mnode->addr, mnode->len);
if (node) {
@@ -180,6 +172,7 @@ int hfi1_mmu_rb_insert(struct mmu_rb_handler *handler,
__mmu_int_rb_remove(mnode, &handler->root);
list_del(&mnode->list); /* remove from LRU list */
}
+ mnode->handler = handler;
unlock:
spin_unlock_irqrestore(&handler->lock, flags);
return ret;
@@ -217,6 +210,9 @@ bool hfi1_mmu_rb_remove_unless_exact(struct mmu_rb_handler *handler,
unsigned long flags;
bool ret = false;
+ if (current->mm != handler->mn.mm)
+ return ret;
+
spin_lock_irqsave(&handler->lock, flags);
node = __mmu_rb_search(handler, addr, len);
if (node) {
@@ -239,6 +235,9 @@ void hfi1_mmu_rb_evict(struct mmu_rb_handler *handler, void *evict_arg)
unsigned long flags;
bool stop = false;
+ if (current->mm != handler->mn.mm)
+ return;
+
INIT_LIST_HEAD(&del_list);
spin_lock_irqsave(&handler->lock, flags);
@@ -272,6 +271,9 @@ void hfi1_mmu_rb_remove(struct mmu_rb_handler *handler,
{
unsigned long flags;
+ if (current->mm != handler->mn.mm)
+ return;
+
/* Validity of handler and node pointers has been checked by caller. */
trace_hfi1_mmu_rb_remove(node->addr, node->len);
spin_lock_irqsave(&handler->lock, flags);
diff --git a/drivers/infiniband/hw/hfi1/mmu_rb.h b/drivers/infiniband/hw/hfi1/mmu_rb.h
index f04cec1e99d1..423aacc67e94 100644
--- a/drivers/infiniband/hw/hfi1/mmu_rb.h
+++ b/drivers/infiniband/hw/hfi1/mmu_rb.h
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2016 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -54,6 +55,7 @@ struct mmu_rb_node {
unsigned long len;
unsigned long __last;
struct rb_node node;
+ struct mmu_rb_handler *handler;
struct list_head list;
};
@@ -71,7 +73,19 @@ struct mmu_rb_ops {
void *evict_arg, bool *stop);
};
-int hfi1_mmu_rb_register(void *ops_arg, struct mm_struct *mm,
+struct mmu_rb_handler {
+ struct mmu_notifier mn;
+ struct rb_root_cached root;
+ void *ops_arg;
+ spinlock_t lock; /* protect the RB tree */
+ struct mmu_rb_ops *ops;
+ struct list_head lru_list;
+ struct work_struct del_work;
+ struct list_head del_list;
+ struct workqueue_struct *wq;
+};
+
+int hfi1_mmu_rb_register(void *ops_arg,
struct mmu_rb_ops *ops,
struct workqueue_struct *wq,
struct mmu_rb_handler **handler);
diff --git a/drivers/infiniband/hw/hfi1/user_exp_rcv.c b/drivers/infiniband/hw/hfi1/user_exp_rcv.c
index f81ca20f4b69..b94fc7fd75a9 100644
--- a/drivers/infiniband/hw/hfi1/user_exp_rcv.c
+++ b/drivers/infiniband/hw/hfi1/user_exp_rcv.c
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2015-2018 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -173,15 +174,18 @@ static void unpin_rcv_pages(struct hfi1_filedata *fd,
{
struct page **pages;
struct hfi1_devdata *dd = fd->uctxt->dd;
+ struct mm_struct *mm;
if (mapped) {
pci_unmap_single(dd->pcidev, node->dma_addr,
node->npages * PAGE_SIZE, PCI_DMA_FROMDEVICE);
pages = &node->pages[idx];
+ mm = mm_from_tid_node(node);
} else {
pages = &tidbuf->pages[idx];
+ mm = current->mm;
}
- hfi1_release_user_pages(fd->mm, pages, npages, mapped);
+ hfi1_release_user_pages(mm, pages, npages, mapped);
fd->tid_n_pinned -= npages;
}
@@ -216,12 +220,12 @@ static int pin_rcv_pages(struct hfi1_filedata *fd, struct tid_user_buf *tidbuf)
* pages, accept the amount pinned so far and program only that.
* User space knows how to deal with partially programmed buffers.
*/
- if (!hfi1_can_pin_pages(dd, fd->mm, fd->tid_n_pinned, npages)) {
+ if (!hfi1_can_pin_pages(dd, current->mm, fd->tid_n_pinned, npages)) {
kfree(pages);
return -ENOMEM;
}
- pinned = hfi1_acquire_user_pages(fd->mm, vaddr, npages, true, pages);
+ pinned = hfi1_acquire_user_pages(current->mm, vaddr, npages, true, pages);
if (pinned <= 0) {
kfree(pages);
return pinned;
@@ -756,7 +760,7 @@ static int set_rcvarray_entry(struct hfi1_filedata *fd,
if (fd->use_mn) {
ret = mmu_interval_notifier_insert(
- &node->notifier, fd->mm,
+ &node->notifier, current->mm,
tbuf->vaddr + (pageidx * PAGE_SIZE), npages * PAGE_SIZE,
&tid_mn_ops);
if (ret)
diff --git a/drivers/infiniband/hw/hfi1/user_exp_rcv.h b/drivers/infiniband/hw/hfi1/user_exp_rcv.h
index 332abb446861..d45c7b6988d4 100644
--- a/drivers/infiniband/hw/hfi1/user_exp_rcv.h
+++ b/drivers/infiniband/hw/hfi1/user_exp_rcv.h
@@ -1,6 +1,7 @@
#ifndef _HFI1_USER_EXP_RCV_H
#define _HFI1_USER_EXP_RCV_H
/*
+ * Copyright(c) 2020 - Cornelis Networks, Inc.
* Copyright(c) 2015 - 2017 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -95,4 +96,9 @@ int hfi1_user_exp_rcv_clear(struct hfi1_filedata *fd,
int hfi1_user_exp_rcv_invalid(struct hfi1_filedata *fd,
struct hfi1_tid_info *tinfo);
+static inline struct mm_struct *mm_from_tid_node(struct tid_rb_node *node)
+{
+ return node->notifier.mm;
+}
+
#endif /* _HFI1_USER_EXP_RCV_H */
diff --git a/drivers/infiniband/hw/hfi1/user_sdma.c b/drivers/infiniband/hw/hfi1/user_sdma.c
index a92346e88628..4a4956f96a7e 100644
--- a/drivers/infiniband/hw/hfi1/user_sdma.c
+++ b/drivers/infiniband/hw/hfi1/user_sdma.c
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 - Cornelis Networks, Inc.
* Copyright(c) 2015 - 2018 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -188,7 +189,6 @@ int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt,
atomic_set(&pq->n_reqs, 0);
init_waitqueue_head(&pq->wait);
atomic_set(&pq->n_locked, 0);
- pq->mm = fd->mm;
iowait_init(&pq->busy, 0, NULL, NULL, defer_packet_queue,
activate_packet_queue, NULL, NULL);
@@ -230,7 +230,7 @@ int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt,
cq->nentries = hfi1_sdma_comp_ring_size;
- ret = hfi1_mmu_rb_register(pq, pq->mm, &sdma_rb_ops, dd->pport->hfi1_wq,
+ ret = hfi1_mmu_rb_register(pq, &sdma_rb_ops, dd->pport->hfi1_wq,
&pq->handler);
if (ret) {
dd_dev_err(dd, "Failed to register with MMU %d", ret);
@@ -980,13 +980,13 @@ static int pin_sdma_pages(struct user_sdma_request *req,
npages -= node->npages;
retry:
- if (!hfi1_can_pin_pages(pq->dd, pq->mm,
+ if (!hfi1_can_pin_pages(pq->dd, current->mm,
atomic_read(&pq->n_locked), npages)) {
cleared = sdma_cache_evict(pq, npages);
if (cleared >= npages)
goto retry;
}
- pinned = hfi1_acquire_user_pages(pq->mm,
+ pinned = hfi1_acquire_user_pages(current->mm,
((unsigned long)iovec->iov.iov_base +
(node->npages * PAGE_SIZE)), npages, 0,
pages + node->npages);
@@ -995,7 +995,7 @@ static int pin_sdma_pages(struct user_sdma_request *req,
return pinned;
}
if (pinned != npages) {
- unpin_vector_pages(pq->mm, pages, node->npages, pinned);
+ unpin_vector_pages(current->mm, pages, node->npages, pinned);
return -EFAULT;
}
kfree(node->pages);
@@ -1008,7 +1008,8 @@ static int pin_sdma_pages(struct user_sdma_request *req,
static void unpin_sdma_pages(struct sdma_mmu_node *node)
{
if (node->npages) {
- unpin_vector_pages(node->pq->mm, node->pages, 0, node->npages);
+ unpin_vector_pages(mm_from_sdma_node(node), node->pages, 0,
+ node->npages);
atomic_sub(node->npages, &node->pq->n_locked);
}
}
diff --git a/drivers/infiniband/hw/hfi1/user_sdma.h b/drivers/infiniband/hw/hfi1/user_sdma.h
index 9972e0e6545e..1e8c02fe8ad1 100644
--- a/drivers/infiniband/hw/hfi1/user_sdma.h
+++ b/drivers/infiniband/hw/hfi1/user_sdma.h
@@ -1,6 +1,7 @@
#ifndef _HFI1_USER_SDMA_H
#define _HFI1_USER_SDMA_H
/*
+ * Copyright(c) 2020 - Cornelis Networks, Inc.
* Copyright(c) 2015 - 2018 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -133,7 +134,6 @@ struct hfi1_user_sdma_pkt_q {
unsigned long unpinned;
struct mmu_rb_handler *handler;
atomic_t n_locked;
- struct mm_struct *mm;
};
struct hfi1_user_sdma_comp_q {
@@ -250,4 +250,9 @@ int hfi1_user_sdma_process_request(struct hfi1_filedata *fd,
struct iovec *iovec, unsigned long dim,
unsigned long *count);
+static inline struct mm_struct *mm_from_sdma_node(struct sdma_mmu_node *node)
+{
+ return node->rb.handler->mn.mm;
+}
+
#endif /* _HFI1_USER_SDMA_H */
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3d2a9d642512c21a12d19b9250e7a835dcb41a79 Mon Sep 17 00:00:00 2001
From: Dennis Dalessandro <dennis.dalessandro(a)cornelisnetworks.com>
Date: Wed, 25 Nov 2020 16:01:12 -0500
Subject: [PATCH] IB/hfi1: Ensure correct mm is used at all times
Two earlier bug fixes have created a security problem in the hfi1
driver. One fix aimed to solve an issue where current->mm was not valid
when closing the hfi1 cdev. It attempted to do this by saving a cached
value of the current->mm pointer at file open time. This is a problem if
another process with access to the FD calls in via write() or ioctl() to
pin pages via the hfi driver. The other fix tried to solve a use after
free by taking a reference on the mm.
To fix this correctly we use the existing cached value of the mm in the
mmu notifier. Now we can check in the insert, evict, etc. routines that
current->mm matched what the notifier was registered for. If not, then
don't allow access. The register of the mmu notifier will save the mm
pointer.
Since in do_exit() the exit_mm() is called before exit_files(), which
would call our close routine a reference is needed on the mm. We rely on
the mmgrab done by the registration of the notifier, whereas before it was
explicit. The mmu notifier deregistration happens when the user context is
torn down, the creation of which triggered the registration.
Also of note is we do not do any explicit work to protect the interval
tree notifier. It doesn't seem that this is going to be needed since we
aren't actually doing anything with current->mm. The interval tree
notifier stuff still has a FIXME noted from a previous commit that will be
addressed in a follow on patch.
Cc: <stable(a)vger.kernel.org>
Fixes: e0cf75deab81 ("IB/hfi1: Fix mm_struct use after free")
Fixes: 3faa3d9a308e ("IB/hfi1: Make use of mm consistent")
Link: https://lore.kernel.org/r/20201125210112.104301.51331.stgit@awfm-01.aw.inte…
Suggested-by: Jann Horn <jannh(a)google.com>
Reported-by: Jason Gunthorpe <jgg(a)nvidia.com>
Reviewed-by: Ira Weiny <ira.weiny(a)intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn(a)cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro(a)cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c
index 8ca51e43cf53..329ee4f48d95 100644
--- a/drivers/infiniband/hw/hfi1/file_ops.c
+++ b/drivers/infiniband/hw/hfi1/file_ops.c
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2015-2020 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -206,8 +207,6 @@ static int hfi1_file_open(struct inode *inode, struct file *fp)
spin_lock_init(&fd->tid_lock);
spin_lock_init(&fd->invalid_lock);
fd->rec_cpu_num = -1; /* no cpu affinity by default */
- fd->mm = current->mm;
- mmgrab(fd->mm);
fd->dd = dd;
fp->private_data = fd;
return 0;
@@ -711,7 +710,6 @@ static int hfi1_file_close(struct inode *inode, struct file *fp)
deallocate_ctxt(uctxt);
done:
- mmdrop(fdata->mm);
if (atomic_dec_and_test(&dd->user_refcount))
complete(&dd->user_comp);
diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h
index b4c6bff60a4e..e09e8244a94c 100644
--- a/drivers/infiniband/hw/hfi1/hfi.h
+++ b/drivers/infiniband/hw/hfi1/hfi.h
@@ -1,6 +1,7 @@
#ifndef _HFI1_KERNEL_H
#define _HFI1_KERNEL_H
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2015-2020 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -1451,7 +1452,6 @@ struct hfi1_filedata {
u32 invalid_tid_idx;
/* protect invalid_tids array and invalid_tid_idx */
spinlock_t invalid_lock;
- struct mm_struct *mm;
};
extern struct xarray hfi1_dev_table;
diff --git a/drivers/infiniband/hw/hfi1/mmu_rb.c b/drivers/infiniband/hw/hfi1/mmu_rb.c
index 24ca17b77b72..f3fb28e3d5d7 100644
--- a/drivers/infiniband/hw/hfi1/mmu_rb.c
+++ b/drivers/infiniband/hw/hfi1/mmu_rb.c
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2016 - 2017 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -48,23 +49,11 @@
#include <linux/rculist.h>
#include <linux/mmu_notifier.h>
#include <linux/interval_tree_generic.h>
+#include <linux/sched/mm.h>
#include "mmu_rb.h"
#include "trace.h"
-struct mmu_rb_handler {
- struct mmu_notifier mn;
- struct rb_root_cached root;
- void *ops_arg;
- spinlock_t lock; /* protect the RB tree */
- struct mmu_rb_ops *ops;
- struct mm_struct *mm;
- struct list_head lru_list;
- struct work_struct del_work;
- struct list_head del_list;
- struct workqueue_struct *wq;
-};
-
static unsigned long mmu_node_start(struct mmu_rb_node *);
static unsigned long mmu_node_last(struct mmu_rb_node *);
static int mmu_notifier_range_start(struct mmu_notifier *,
@@ -92,37 +81,36 @@ static unsigned long mmu_node_last(struct mmu_rb_node *node)
return PAGE_ALIGN(node->addr + node->len) - 1;
}
-int hfi1_mmu_rb_register(void *ops_arg, struct mm_struct *mm,
+int hfi1_mmu_rb_register(void *ops_arg,
struct mmu_rb_ops *ops,
struct workqueue_struct *wq,
struct mmu_rb_handler **handler)
{
- struct mmu_rb_handler *handlr;
+ struct mmu_rb_handler *h;
int ret;
- handlr = kmalloc(sizeof(*handlr), GFP_KERNEL);
- if (!handlr)
+ h = kmalloc(sizeof(*h), GFP_KERNEL);
+ if (!h)
return -ENOMEM;
- handlr->root = RB_ROOT_CACHED;
- handlr->ops = ops;
- handlr->ops_arg = ops_arg;
- INIT_HLIST_NODE(&handlr->mn.hlist);
- spin_lock_init(&handlr->lock);
- handlr->mn.ops = &mn_opts;
- handlr->mm = mm;
- INIT_WORK(&handlr->del_work, handle_remove);
- INIT_LIST_HEAD(&handlr->del_list);
- INIT_LIST_HEAD(&handlr->lru_list);
- handlr->wq = wq;
-
- ret = mmu_notifier_register(&handlr->mn, handlr->mm);
+ h->root = RB_ROOT_CACHED;
+ h->ops = ops;
+ h->ops_arg = ops_arg;
+ INIT_HLIST_NODE(&h->mn.hlist);
+ spin_lock_init(&h->lock);
+ h->mn.ops = &mn_opts;
+ INIT_WORK(&h->del_work, handle_remove);
+ INIT_LIST_HEAD(&h->del_list);
+ INIT_LIST_HEAD(&h->lru_list);
+ h->wq = wq;
+
+ ret = mmu_notifier_register(&h->mn, current->mm);
if (ret) {
- kfree(handlr);
+ kfree(h);
return ret;
}
- *handler = handlr;
+ *handler = h;
return 0;
}
@@ -134,7 +122,7 @@ void hfi1_mmu_rb_unregister(struct mmu_rb_handler *handler)
struct list_head del_list;
/* Unregister first so we don't get any more notifications. */
- mmu_notifier_unregister(&handler->mn, handler->mm);
+ mmu_notifier_unregister(&handler->mn, handler->mn.mm);
/*
* Make sure the wq delete handler is finished running. It will not
@@ -166,6 +154,10 @@ int hfi1_mmu_rb_insert(struct mmu_rb_handler *handler,
int ret = 0;
trace_hfi1_mmu_rb_insert(mnode->addr, mnode->len);
+
+ if (current->mm != handler->mn.mm)
+ return -EPERM;
+
spin_lock_irqsave(&handler->lock, flags);
node = __mmu_rb_search(handler, mnode->addr, mnode->len);
if (node) {
@@ -180,6 +172,7 @@ int hfi1_mmu_rb_insert(struct mmu_rb_handler *handler,
__mmu_int_rb_remove(mnode, &handler->root);
list_del(&mnode->list); /* remove from LRU list */
}
+ mnode->handler = handler;
unlock:
spin_unlock_irqrestore(&handler->lock, flags);
return ret;
@@ -217,6 +210,9 @@ bool hfi1_mmu_rb_remove_unless_exact(struct mmu_rb_handler *handler,
unsigned long flags;
bool ret = false;
+ if (current->mm != handler->mn.mm)
+ return ret;
+
spin_lock_irqsave(&handler->lock, flags);
node = __mmu_rb_search(handler, addr, len);
if (node) {
@@ -239,6 +235,9 @@ void hfi1_mmu_rb_evict(struct mmu_rb_handler *handler, void *evict_arg)
unsigned long flags;
bool stop = false;
+ if (current->mm != handler->mn.mm)
+ return;
+
INIT_LIST_HEAD(&del_list);
spin_lock_irqsave(&handler->lock, flags);
@@ -272,6 +271,9 @@ void hfi1_mmu_rb_remove(struct mmu_rb_handler *handler,
{
unsigned long flags;
+ if (current->mm != handler->mn.mm)
+ return;
+
/* Validity of handler and node pointers has been checked by caller. */
trace_hfi1_mmu_rb_remove(node->addr, node->len);
spin_lock_irqsave(&handler->lock, flags);
diff --git a/drivers/infiniband/hw/hfi1/mmu_rb.h b/drivers/infiniband/hw/hfi1/mmu_rb.h
index f04cec1e99d1..423aacc67e94 100644
--- a/drivers/infiniband/hw/hfi1/mmu_rb.h
+++ b/drivers/infiniband/hw/hfi1/mmu_rb.h
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2016 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -54,6 +55,7 @@ struct mmu_rb_node {
unsigned long len;
unsigned long __last;
struct rb_node node;
+ struct mmu_rb_handler *handler;
struct list_head list;
};
@@ -71,7 +73,19 @@ struct mmu_rb_ops {
void *evict_arg, bool *stop);
};
-int hfi1_mmu_rb_register(void *ops_arg, struct mm_struct *mm,
+struct mmu_rb_handler {
+ struct mmu_notifier mn;
+ struct rb_root_cached root;
+ void *ops_arg;
+ spinlock_t lock; /* protect the RB tree */
+ struct mmu_rb_ops *ops;
+ struct list_head lru_list;
+ struct work_struct del_work;
+ struct list_head del_list;
+ struct workqueue_struct *wq;
+};
+
+int hfi1_mmu_rb_register(void *ops_arg,
struct mmu_rb_ops *ops,
struct workqueue_struct *wq,
struct mmu_rb_handler **handler);
diff --git a/drivers/infiniband/hw/hfi1/user_exp_rcv.c b/drivers/infiniband/hw/hfi1/user_exp_rcv.c
index f81ca20f4b69..b94fc7fd75a9 100644
--- a/drivers/infiniband/hw/hfi1/user_exp_rcv.c
+++ b/drivers/infiniband/hw/hfi1/user_exp_rcv.c
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2015-2018 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -173,15 +174,18 @@ static void unpin_rcv_pages(struct hfi1_filedata *fd,
{
struct page **pages;
struct hfi1_devdata *dd = fd->uctxt->dd;
+ struct mm_struct *mm;
if (mapped) {
pci_unmap_single(dd->pcidev, node->dma_addr,
node->npages * PAGE_SIZE, PCI_DMA_FROMDEVICE);
pages = &node->pages[idx];
+ mm = mm_from_tid_node(node);
} else {
pages = &tidbuf->pages[idx];
+ mm = current->mm;
}
- hfi1_release_user_pages(fd->mm, pages, npages, mapped);
+ hfi1_release_user_pages(mm, pages, npages, mapped);
fd->tid_n_pinned -= npages;
}
@@ -216,12 +220,12 @@ static int pin_rcv_pages(struct hfi1_filedata *fd, struct tid_user_buf *tidbuf)
* pages, accept the amount pinned so far and program only that.
* User space knows how to deal with partially programmed buffers.
*/
- if (!hfi1_can_pin_pages(dd, fd->mm, fd->tid_n_pinned, npages)) {
+ if (!hfi1_can_pin_pages(dd, current->mm, fd->tid_n_pinned, npages)) {
kfree(pages);
return -ENOMEM;
}
- pinned = hfi1_acquire_user_pages(fd->mm, vaddr, npages, true, pages);
+ pinned = hfi1_acquire_user_pages(current->mm, vaddr, npages, true, pages);
if (pinned <= 0) {
kfree(pages);
return pinned;
@@ -756,7 +760,7 @@ static int set_rcvarray_entry(struct hfi1_filedata *fd,
if (fd->use_mn) {
ret = mmu_interval_notifier_insert(
- &node->notifier, fd->mm,
+ &node->notifier, current->mm,
tbuf->vaddr + (pageidx * PAGE_SIZE), npages * PAGE_SIZE,
&tid_mn_ops);
if (ret)
diff --git a/drivers/infiniband/hw/hfi1/user_exp_rcv.h b/drivers/infiniband/hw/hfi1/user_exp_rcv.h
index 332abb446861..d45c7b6988d4 100644
--- a/drivers/infiniband/hw/hfi1/user_exp_rcv.h
+++ b/drivers/infiniband/hw/hfi1/user_exp_rcv.h
@@ -1,6 +1,7 @@
#ifndef _HFI1_USER_EXP_RCV_H
#define _HFI1_USER_EXP_RCV_H
/*
+ * Copyright(c) 2020 - Cornelis Networks, Inc.
* Copyright(c) 2015 - 2017 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -95,4 +96,9 @@ int hfi1_user_exp_rcv_clear(struct hfi1_filedata *fd,
int hfi1_user_exp_rcv_invalid(struct hfi1_filedata *fd,
struct hfi1_tid_info *tinfo);
+static inline struct mm_struct *mm_from_tid_node(struct tid_rb_node *node)
+{
+ return node->notifier.mm;
+}
+
#endif /* _HFI1_USER_EXP_RCV_H */
diff --git a/drivers/infiniband/hw/hfi1/user_sdma.c b/drivers/infiniband/hw/hfi1/user_sdma.c
index a92346e88628..4a4956f96a7e 100644
--- a/drivers/infiniband/hw/hfi1/user_sdma.c
+++ b/drivers/infiniband/hw/hfi1/user_sdma.c
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 - Cornelis Networks, Inc.
* Copyright(c) 2015 - 2018 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -188,7 +189,6 @@ int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt,
atomic_set(&pq->n_reqs, 0);
init_waitqueue_head(&pq->wait);
atomic_set(&pq->n_locked, 0);
- pq->mm = fd->mm;
iowait_init(&pq->busy, 0, NULL, NULL, defer_packet_queue,
activate_packet_queue, NULL, NULL);
@@ -230,7 +230,7 @@ int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt,
cq->nentries = hfi1_sdma_comp_ring_size;
- ret = hfi1_mmu_rb_register(pq, pq->mm, &sdma_rb_ops, dd->pport->hfi1_wq,
+ ret = hfi1_mmu_rb_register(pq, &sdma_rb_ops, dd->pport->hfi1_wq,
&pq->handler);
if (ret) {
dd_dev_err(dd, "Failed to register with MMU %d", ret);
@@ -980,13 +980,13 @@ static int pin_sdma_pages(struct user_sdma_request *req,
npages -= node->npages;
retry:
- if (!hfi1_can_pin_pages(pq->dd, pq->mm,
+ if (!hfi1_can_pin_pages(pq->dd, current->mm,
atomic_read(&pq->n_locked), npages)) {
cleared = sdma_cache_evict(pq, npages);
if (cleared >= npages)
goto retry;
}
- pinned = hfi1_acquire_user_pages(pq->mm,
+ pinned = hfi1_acquire_user_pages(current->mm,
((unsigned long)iovec->iov.iov_base +
(node->npages * PAGE_SIZE)), npages, 0,
pages + node->npages);
@@ -995,7 +995,7 @@ static int pin_sdma_pages(struct user_sdma_request *req,
return pinned;
}
if (pinned != npages) {
- unpin_vector_pages(pq->mm, pages, node->npages, pinned);
+ unpin_vector_pages(current->mm, pages, node->npages, pinned);
return -EFAULT;
}
kfree(node->pages);
@@ -1008,7 +1008,8 @@ static int pin_sdma_pages(struct user_sdma_request *req,
static void unpin_sdma_pages(struct sdma_mmu_node *node)
{
if (node->npages) {
- unpin_vector_pages(node->pq->mm, node->pages, 0, node->npages);
+ unpin_vector_pages(mm_from_sdma_node(node), node->pages, 0,
+ node->npages);
atomic_sub(node->npages, &node->pq->n_locked);
}
}
diff --git a/drivers/infiniband/hw/hfi1/user_sdma.h b/drivers/infiniband/hw/hfi1/user_sdma.h
index 9972e0e6545e..1e8c02fe8ad1 100644
--- a/drivers/infiniband/hw/hfi1/user_sdma.h
+++ b/drivers/infiniband/hw/hfi1/user_sdma.h
@@ -1,6 +1,7 @@
#ifndef _HFI1_USER_SDMA_H
#define _HFI1_USER_SDMA_H
/*
+ * Copyright(c) 2020 - Cornelis Networks, Inc.
* Copyright(c) 2015 - 2018 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -133,7 +134,6 @@ struct hfi1_user_sdma_pkt_q {
unsigned long unpinned;
struct mmu_rb_handler *handler;
atomic_t n_locked;
- struct mm_struct *mm;
};
struct hfi1_user_sdma_comp_q {
@@ -250,4 +250,9 @@ int hfi1_user_sdma_process_request(struct hfi1_filedata *fd,
struct iovec *iovec, unsigned long dim,
unsigned long *count);
+static inline struct mm_struct *mm_from_sdma_node(struct sdma_mmu_node *node)
+{
+ return node->rb.handler->mn.mm;
+}
+
#endif /* _HFI1_USER_SDMA_H */
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3d2a9d642512c21a12d19b9250e7a835dcb41a79 Mon Sep 17 00:00:00 2001
From: Dennis Dalessandro <dennis.dalessandro(a)cornelisnetworks.com>
Date: Wed, 25 Nov 2020 16:01:12 -0500
Subject: [PATCH] IB/hfi1: Ensure correct mm is used at all times
Two earlier bug fixes have created a security problem in the hfi1
driver. One fix aimed to solve an issue where current->mm was not valid
when closing the hfi1 cdev. It attempted to do this by saving a cached
value of the current->mm pointer at file open time. This is a problem if
another process with access to the FD calls in via write() or ioctl() to
pin pages via the hfi driver. The other fix tried to solve a use after
free by taking a reference on the mm.
To fix this correctly we use the existing cached value of the mm in the
mmu notifier. Now we can check in the insert, evict, etc. routines that
current->mm matched what the notifier was registered for. If not, then
don't allow access. The register of the mmu notifier will save the mm
pointer.
Since in do_exit() the exit_mm() is called before exit_files(), which
would call our close routine a reference is needed on the mm. We rely on
the mmgrab done by the registration of the notifier, whereas before it was
explicit. The mmu notifier deregistration happens when the user context is
torn down, the creation of which triggered the registration.
Also of note is we do not do any explicit work to protect the interval
tree notifier. It doesn't seem that this is going to be needed since we
aren't actually doing anything with current->mm. The interval tree
notifier stuff still has a FIXME noted from a previous commit that will be
addressed in a follow on patch.
Cc: <stable(a)vger.kernel.org>
Fixes: e0cf75deab81 ("IB/hfi1: Fix mm_struct use after free")
Fixes: 3faa3d9a308e ("IB/hfi1: Make use of mm consistent")
Link: https://lore.kernel.org/r/20201125210112.104301.51331.stgit@awfm-01.aw.inte…
Suggested-by: Jann Horn <jannh(a)google.com>
Reported-by: Jason Gunthorpe <jgg(a)nvidia.com>
Reviewed-by: Ira Weiny <ira.weiny(a)intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn(a)cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro(a)cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c
index 8ca51e43cf53..329ee4f48d95 100644
--- a/drivers/infiniband/hw/hfi1/file_ops.c
+++ b/drivers/infiniband/hw/hfi1/file_ops.c
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2015-2020 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -206,8 +207,6 @@ static int hfi1_file_open(struct inode *inode, struct file *fp)
spin_lock_init(&fd->tid_lock);
spin_lock_init(&fd->invalid_lock);
fd->rec_cpu_num = -1; /* no cpu affinity by default */
- fd->mm = current->mm;
- mmgrab(fd->mm);
fd->dd = dd;
fp->private_data = fd;
return 0;
@@ -711,7 +710,6 @@ static int hfi1_file_close(struct inode *inode, struct file *fp)
deallocate_ctxt(uctxt);
done:
- mmdrop(fdata->mm);
if (atomic_dec_and_test(&dd->user_refcount))
complete(&dd->user_comp);
diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h
index b4c6bff60a4e..e09e8244a94c 100644
--- a/drivers/infiniband/hw/hfi1/hfi.h
+++ b/drivers/infiniband/hw/hfi1/hfi.h
@@ -1,6 +1,7 @@
#ifndef _HFI1_KERNEL_H
#define _HFI1_KERNEL_H
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2015-2020 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -1451,7 +1452,6 @@ struct hfi1_filedata {
u32 invalid_tid_idx;
/* protect invalid_tids array and invalid_tid_idx */
spinlock_t invalid_lock;
- struct mm_struct *mm;
};
extern struct xarray hfi1_dev_table;
diff --git a/drivers/infiniband/hw/hfi1/mmu_rb.c b/drivers/infiniband/hw/hfi1/mmu_rb.c
index 24ca17b77b72..f3fb28e3d5d7 100644
--- a/drivers/infiniband/hw/hfi1/mmu_rb.c
+++ b/drivers/infiniband/hw/hfi1/mmu_rb.c
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2016 - 2017 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -48,23 +49,11 @@
#include <linux/rculist.h>
#include <linux/mmu_notifier.h>
#include <linux/interval_tree_generic.h>
+#include <linux/sched/mm.h>
#include "mmu_rb.h"
#include "trace.h"
-struct mmu_rb_handler {
- struct mmu_notifier mn;
- struct rb_root_cached root;
- void *ops_arg;
- spinlock_t lock; /* protect the RB tree */
- struct mmu_rb_ops *ops;
- struct mm_struct *mm;
- struct list_head lru_list;
- struct work_struct del_work;
- struct list_head del_list;
- struct workqueue_struct *wq;
-};
-
static unsigned long mmu_node_start(struct mmu_rb_node *);
static unsigned long mmu_node_last(struct mmu_rb_node *);
static int mmu_notifier_range_start(struct mmu_notifier *,
@@ -92,37 +81,36 @@ static unsigned long mmu_node_last(struct mmu_rb_node *node)
return PAGE_ALIGN(node->addr + node->len) - 1;
}
-int hfi1_mmu_rb_register(void *ops_arg, struct mm_struct *mm,
+int hfi1_mmu_rb_register(void *ops_arg,
struct mmu_rb_ops *ops,
struct workqueue_struct *wq,
struct mmu_rb_handler **handler)
{
- struct mmu_rb_handler *handlr;
+ struct mmu_rb_handler *h;
int ret;
- handlr = kmalloc(sizeof(*handlr), GFP_KERNEL);
- if (!handlr)
+ h = kmalloc(sizeof(*h), GFP_KERNEL);
+ if (!h)
return -ENOMEM;
- handlr->root = RB_ROOT_CACHED;
- handlr->ops = ops;
- handlr->ops_arg = ops_arg;
- INIT_HLIST_NODE(&handlr->mn.hlist);
- spin_lock_init(&handlr->lock);
- handlr->mn.ops = &mn_opts;
- handlr->mm = mm;
- INIT_WORK(&handlr->del_work, handle_remove);
- INIT_LIST_HEAD(&handlr->del_list);
- INIT_LIST_HEAD(&handlr->lru_list);
- handlr->wq = wq;
-
- ret = mmu_notifier_register(&handlr->mn, handlr->mm);
+ h->root = RB_ROOT_CACHED;
+ h->ops = ops;
+ h->ops_arg = ops_arg;
+ INIT_HLIST_NODE(&h->mn.hlist);
+ spin_lock_init(&h->lock);
+ h->mn.ops = &mn_opts;
+ INIT_WORK(&h->del_work, handle_remove);
+ INIT_LIST_HEAD(&h->del_list);
+ INIT_LIST_HEAD(&h->lru_list);
+ h->wq = wq;
+
+ ret = mmu_notifier_register(&h->mn, current->mm);
if (ret) {
- kfree(handlr);
+ kfree(h);
return ret;
}
- *handler = handlr;
+ *handler = h;
return 0;
}
@@ -134,7 +122,7 @@ void hfi1_mmu_rb_unregister(struct mmu_rb_handler *handler)
struct list_head del_list;
/* Unregister first so we don't get any more notifications. */
- mmu_notifier_unregister(&handler->mn, handler->mm);
+ mmu_notifier_unregister(&handler->mn, handler->mn.mm);
/*
* Make sure the wq delete handler is finished running. It will not
@@ -166,6 +154,10 @@ int hfi1_mmu_rb_insert(struct mmu_rb_handler *handler,
int ret = 0;
trace_hfi1_mmu_rb_insert(mnode->addr, mnode->len);
+
+ if (current->mm != handler->mn.mm)
+ return -EPERM;
+
spin_lock_irqsave(&handler->lock, flags);
node = __mmu_rb_search(handler, mnode->addr, mnode->len);
if (node) {
@@ -180,6 +172,7 @@ int hfi1_mmu_rb_insert(struct mmu_rb_handler *handler,
__mmu_int_rb_remove(mnode, &handler->root);
list_del(&mnode->list); /* remove from LRU list */
}
+ mnode->handler = handler;
unlock:
spin_unlock_irqrestore(&handler->lock, flags);
return ret;
@@ -217,6 +210,9 @@ bool hfi1_mmu_rb_remove_unless_exact(struct mmu_rb_handler *handler,
unsigned long flags;
bool ret = false;
+ if (current->mm != handler->mn.mm)
+ return ret;
+
spin_lock_irqsave(&handler->lock, flags);
node = __mmu_rb_search(handler, addr, len);
if (node) {
@@ -239,6 +235,9 @@ void hfi1_mmu_rb_evict(struct mmu_rb_handler *handler, void *evict_arg)
unsigned long flags;
bool stop = false;
+ if (current->mm != handler->mn.mm)
+ return;
+
INIT_LIST_HEAD(&del_list);
spin_lock_irqsave(&handler->lock, flags);
@@ -272,6 +271,9 @@ void hfi1_mmu_rb_remove(struct mmu_rb_handler *handler,
{
unsigned long flags;
+ if (current->mm != handler->mn.mm)
+ return;
+
/* Validity of handler and node pointers has been checked by caller. */
trace_hfi1_mmu_rb_remove(node->addr, node->len);
spin_lock_irqsave(&handler->lock, flags);
diff --git a/drivers/infiniband/hw/hfi1/mmu_rb.h b/drivers/infiniband/hw/hfi1/mmu_rb.h
index f04cec1e99d1..423aacc67e94 100644
--- a/drivers/infiniband/hw/hfi1/mmu_rb.h
+++ b/drivers/infiniband/hw/hfi1/mmu_rb.h
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2016 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -54,6 +55,7 @@ struct mmu_rb_node {
unsigned long len;
unsigned long __last;
struct rb_node node;
+ struct mmu_rb_handler *handler;
struct list_head list;
};
@@ -71,7 +73,19 @@ struct mmu_rb_ops {
void *evict_arg, bool *stop);
};
-int hfi1_mmu_rb_register(void *ops_arg, struct mm_struct *mm,
+struct mmu_rb_handler {
+ struct mmu_notifier mn;
+ struct rb_root_cached root;
+ void *ops_arg;
+ spinlock_t lock; /* protect the RB tree */
+ struct mmu_rb_ops *ops;
+ struct list_head lru_list;
+ struct work_struct del_work;
+ struct list_head del_list;
+ struct workqueue_struct *wq;
+};
+
+int hfi1_mmu_rb_register(void *ops_arg,
struct mmu_rb_ops *ops,
struct workqueue_struct *wq,
struct mmu_rb_handler **handler);
diff --git a/drivers/infiniband/hw/hfi1/user_exp_rcv.c b/drivers/infiniband/hw/hfi1/user_exp_rcv.c
index f81ca20f4b69..b94fc7fd75a9 100644
--- a/drivers/infiniband/hw/hfi1/user_exp_rcv.c
+++ b/drivers/infiniband/hw/hfi1/user_exp_rcv.c
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2015-2018 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -173,15 +174,18 @@ static void unpin_rcv_pages(struct hfi1_filedata *fd,
{
struct page **pages;
struct hfi1_devdata *dd = fd->uctxt->dd;
+ struct mm_struct *mm;
if (mapped) {
pci_unmap_single(dd->pcidev, node->dma_addr,
node->npages * PAGE_SIZE, PCI_DMA_FROMDEVICE);
pages = &node->pages[idx];
+ mm = mm_from_tid_node(node);
} else {
pages = &tidbuf->pages[idx];
+ mm = current->mm;
}
- hfi1_release_user_pages(fd->mm, pages, npages, mapped);
+ hfi1_release_user_pages(mm, pages, npages, mapped);
fd->tid_n_pinned -= npages;
}
@@ -216,12 +220,12 @@ static int pin_rcv_pages(struct hfi1_filedata *fd, struct tid_user_buf *tidbuf)
* pages, accept the amount pinned so far and program only that.
* User space knows how to deal with partially programmed buffers.
*/
- if (!hfi1_can_pin_pages(dd, fd->mm, fd->tid_n_pinned, npages)) {
+ if (!hfi1_can_pin_pages(dd, current->mm, fd->tid_n_pinned, npages)) {
kfree(pages);
return -ENOMEM;
}
- pinned = hfi1_acquire_user_pages(fd->mm, vaddr, npages, true, pages);
+ pinned = hfi1_acquire_user_pages(current->mm, vaddr, npages, true, pages);
if (pinned <= 0) {
kfree(pages);
return pinned;
@@ -756,7 +760,7 @@ static int set_rcvarray_entry(struct hfi1_filedata *fd,
if (fd->use_mn) {
ret = mmu_interval_notifier_insert(
- &node->notifier, fd->mm,
+ &node->notifier, current->mm,
tbuf->vaddr + (pageidx * PAGE_SIZE), npages * PAGE_SIZE,
&tid_mn_ops);
if (ret)
diff --git a/drivers/infiniband/hw/hfi1/user_exp_rcv.h b/drivers/infiniband/hw/hfi1/user_exp_rcv.h
index 332abb446861..d45c7b6988d4 100644
--- a/drivers/infiniband/hw/hfi1/user_exp_rcv.h
+++ b/drivers/infiniband/hw/hfi1/user_exp_rcv.h
@@ -1,6 +1,7 @@
#ifndef _HFI1_USER_EXP_RCV_H
#define _HFI1_USER_EXP_RCV_H
/*
+ * Copyright(c) 2020 - Cornelis Networks, Inc.
* Copyright(c) 2015 - 2017 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -95,4 +96,9 @@ int hfi1_user_exp_rcv_clear(struct hfi1_filedata *fd,
int hfi1_user_exp_rcv_invalid(struct hfi1_filedata *fd,
struct hfi1_tid_info *tinfo);
+static inline struct mm_struct *mm_from_tid_node(struct tid_rb_node *node)
+{
+ return node->notifier.mm;
+}
+
#endif /* _HFI1_USER_EXP_RCV_H */
diff --git a/drivers/infiniband/hw/hfi1/user_sdma.c b/drivers/infiniband/hw/hfi1/user_sdma.c
index a92346e88628..4a4956f96a7e 100644
--- a/drivers/infiniband/hw/hfi1/user_sdma.c
+++ b/drivers/infiniband/hw/hfi1/user_sdma.c
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 - Cornelis Networks, Inc.
* Copyright(c) 2015 - 2018 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -188,7 +189,6 @@ int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt,
atomic_set(&pq->n_reqs, 0);
init_waitqueue_head(&pq->wait);
atomic_set(&pq->n_locked, 0);
- pq->mm = fd->mm;
iowait_init(&pq->busy, 0, NULL, NULL, defer_packet_queue,
activate_packet_queue, NULL, NULL);
@@ -230,7 +230,7 @@ int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt,
cq->nentries = hfi1_sdma_comp_ring_size;
- ret = hfi1_mmu_rb_register(pq, pq->mm, &sdma_rb_ops, dd->pport->hfi1_wq,
+ ret = hfi1_mmu_rb_register(pq, &sdma_rb_ops, dd->pport->hfi1_wq,
&pq->handler);
if (ret) {
dd_dev_err(dd, "Failed to register with MMU %d", ret);
@@ -980,13 +980,13 @@ static int pin_sdma_pages(struct user_sdma_request *req,
npages -= node->npages;
retry:
- if (!hfi1_can_pin_pages(pq->dd, pq->mm,
+ if (!hfi1_can_pin_pages(pq->dd, current->mm,
atomic_read(&pq->n_locked), npages)) {
cleared = sdma_cache_evict(pq, npages);
if (cleared >= npages)
goto retry;
}
- pinned = hfi1_acquire_user_pages(pq->mm,
+ pinned = hfi1_acquire_user_pages(current->mm,
((unsigned long)iovec->iov.iov_base +
(node->npages * PAGE_SIZE)), npages, 0,
pages + node->npages);
@@ -995,7 +995,7 @@ static int pin_sdma_pages(struct user_sdma_request *req,
return pinned;
}
if (pinned != npages) {
- unpin_vector_pages(pq->mm, pages, node->npages, pinned);
+ unpin_vector_pages(current->mm, pages, node->npages, pinned);
return -EFAULT;
}
kfree(node->pages);
@@ -1008,7 +1008,8 @@ static int pin_sdma_pages(struct user_sdma_request *req,
static void unpin_sdma_pages(struct sdma_mmu_node *node)
{
if (node->npages) {
- unpin_vector_pages(node->pq->mm, node->pages, 0, node->npages);
+ unpin_vector_pages(mm_from_sdma_node(node), node->pages, 0,
+ node->npages);
atomic_sub(node->npages, &node->pq->n_locked);
}
}
diff --git a/drivers/infiniband/hw/hfi1/user_sdma.h b/drivers/infiniband/hw/hfi1/user_sdma.h
index 9972e0e6545e..1e8c02fe8ad1 100644
--- a/drivers/infiniband/hw/hfi1/user_sdma.h
+++ b/drivers/infiniband/hw/hfi1/user_sdma.h
@@ -1,6 +1,7 @@
#ifndef _HFI1_USER_SDMA_H
#define _HFI1_USER_SDMA_H
/*
+ * Copyright(c) 2020 - Cornelis Networks, Inc.
* Copyright(c) 2015 - 2018 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -133,7 +134,6 @@ struct hfi1_user_sdma_pkt_q {
unsigned long unpinned;
struct mmu_rb_handler *handler;
atomic_t n_locked;
- struct mm_struct *mm;
};
struct hfi1_user_sdma_comp_q {
@@ -250,4 +250,9 @@ int hfi1_user_sdma_process_request(struct hfi1_filedata *fd,
struct iovec *iovec, unsigned long dim,
unsigned long *count);
+static inline struct mm_struct *mm_from_sdma_node(struct sdma_mmu_node *node)
+{
+ return node->rb.handler->mn.mm;
+}
+
#endif /* _HFI1_USER_SDMA_H */
In the error path of rpcif_manual_xfer() the value of ret is overwritten
by value returned by reset_control_reset() function and thus returning
incorrect value to the caller.
This patch makes sure the correct value is returned to the caller of
rpcif_manual_xfer() by dropping the overwrite of ret in error path.
Also now we ignore the value returned by reset_control_reset() in the
error path and instead print a error message when it fails.
Fixes: ca7d8b980b67f ("memory: add Renesas RPC-IF driver")
Reported-by: Pavel Machek <pavel(a)denx.de>
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj(a)bp.renesas.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Sergei Shtylyov <sergei.shtylyov(a)gmail.com>
---
drivers/memory/renesas-rpc-if.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/memory/renesas-rpc-if.c b/drivers/memory/renesas-rpc-if.c
index f2a33a1af836..69f2e2b4cd50 100644
--- a/drivers/memory/renesas-rpc-if.c
+++ b/drivers/memory/renesas-rpc-if.c
@@ -508,7 +508,8 @@ int rpcif_manual_xfer(struct rpcif *rpc)
return ret;
err_out:
- ret = reset_control_reset(rpc->rstc);
+ if (reset_control_reset(rpc->rstc))
+ dev_err(rpc->dev, "Failed to reset HW\n");
rpcif_hw_init(rpc, rpc->bus_size == 2);
goto exit;
}
--
2.25.1
On Sat, Nov 28, 2020 at 05:09:18PM +0800, 刘志旭 wrote:
> I still didn't see this patch in stable queue yet. Since we've a working POC to panic the
> system (see https://bugzilla.kernel.org/show_bug.cgi?id=209823), I think it's necessary
> to merge this patch ASAP, thanks.
Odd, I don't think Sasha pushed out his patch queue. I've applied this
now, thanks.
greg k-h
On Sat, Nov 28, 2020 at 02:47:22PM +0800, Wen Yang wrote:
> [ Upstream commit 7bc3e6e55acf065500a24621f3b313e7e5998acf ]
No, that is not this commit at all.
What are you wanting to have happen here?
confused,
greg k-h
The patch titled
Subject: mm: memcg/slab: fix obj_cgroup_charge() return value handling
has been added to the -mm tree. Its filename is
mm-memcg-slab-fix-obj_cgroup_charge-return-value-handling.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-memcg-slab-fix-obj_cgroup_char…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-memcg-slab-fix-obj_cgroup_char…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Roman Gushchin <guro(a)fb.com>
Subject: mm: memcg/slab: fix obj_cgroup_charge() return value handling
Commit 10befea91b61 ("mm: memcg/slab: use a single set of kmem_caches for
all allocations") introduced a regression into the handling of the
obj_cgroup_charge() return value. If a non-zero value is returned
(indicating of exceeding one of memory.max limits), the allocation should
fail, instead of falling back to non-accounted mode.
To make the code more readable, move memcg_slab_pre_alloc_hook() and
memcg_slab_post_alloc_hook() calling conditions into bodies of these
hooks.
Link: https://lkml.kernel.org/r/20201127161828.GD840171@carbon.dhcp.thefacebook.c…
Fixes: 10befea91b61 ("mm: memcg/slab: use a single set of kmem_caches for all allocations")
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/slab.h | 40 ++++++++++++++++++++++++----------------
1 file changed, 24 insertions(+), 16 deletions(-)
--- a/mm/slab.h~mm-memcg-slab-fix-obj_cgroup_charge-return-value-handling
+++ a/mm/slab.h
@@ -274,22 +274,32 @@ static inline size_t obj_full_size(struc
return s->size + sizeof(struct obj_cgroup *);
}
-static inline struct obj_cgroup *memcg_slab_pre_alloc_hook(struct kmem_cache *s,
- size_t objects,
- gfp_t flags)
+/*
+ * Returns false if the allocation should fail.
+ */
+static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
+ struct obj_cgroup **objcgp,
+ size_t objects, gfp_t flags)
{
struct obj_cgroup *objcg;
+ if (!memcg_kmem_enabled())
+ return true;
+
+ if (!(flags & __GFP_ACCOUNT) && !(s->flags & SLAB_ACCOUNT))
+ return true;
+
objcg = get_obj_cgroup_from_current();
if (!objcg)
- return NULL;
+ return true;
if (obj_cgroup_charge(objcg, flags, objects * obj_full_size(s))) {
obj_cgroup_put(objcg);
- return NULL;
+ return false;
}
- return objcg;
+ *objcgp = objcg;
+ return true;
}
static inline void mod_objcg_state(struct obj_cgroup *objcg,
@@ -315,7 +325,7 @@ static inline void memcg_slab_post_alloc
unsigned long off;
size_t i;
- if (!objcg)
+ if (!memcg_kmem_enabled() || !objcg)
return;
flags &= ~__GFP_ACCOUNT;
@@ -400,11 +410,11 @@ static inline void memcg_free_page_obj_c
{
}
-static inline struct obj_cgroup *memcg_slab_pre_alloc_hook(struct kmem_cache *s,
- size_t objects,
- gfp_t flags)
+static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
+ struct obj_cgroup **objcgp,
+ size_t objects, gfp_t flags)
{
- return NULL;
+ return true;
}
static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
@@ -508,9 +518,8 @@ static inline struct kmem_cache *slab_pr
if (should_failslab(s, flags))
return NULL;
- if (memcg_kmem_enabled() &&
- ((flags & __GFP_ACCOUNT) || (s->flags & SLAB_ACCOUNT)))
- *objcgp = memcg_slab_pre_alloc_hook(s, size, flags);
+ if (!memcg_slab_pre_alloc_hook(s, objcgp, size, flags))
+ return NULL;
return s;
}
@@ -529,8 +538,7 @@ static inline void slab_post_alloc_hook(
s->flags, flags);
}
- if (memcg_kmem_enabled())
- memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
+ memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
}
#ifndef CONFIG_SLOB
_
Patches currently in -mm which might be from guro(a)fb.com are
mm-memcg-slab-fix-obj_cgroup_charge-return-value-handling.patch
mm-memcg-fix-obsolete-code-comments.patch
mm-memcg-deprecate-the-non-hierarchical-mode.patch
docs-cgroup-v1-reflect-the-deprecation-of-the-non-hierarchical-mode.patch
cgroup-remove-obsoleted-broken_hierarchy-and-warned_broken_hierarchy.patch
mm-vmstat-fix-proc-sys-vm-stat_refresh-generating-false-warnings.patch
mm-vmstat-fix-proc-sys-vm-stat_refresh-generating-false-warnings-fix.patch
mm-memcontrol-use-helpers-to-read-pages-memcg-data.patch
mm-memcontrol-slab-use-helpers-to-access-slab-pages-memcg_data.patch
mm-introduce-page-memcg-flags.patch
mm-convert-page-kmemcg-type-to-a-page-memcg-flag.patch
mm-slub-call-account_slab_page-after-slab-page-initialization.patch
mm-memcg-slab-pre-allocate-obj_cgroups-for-slab-caches-with-slab_account.patch
mm-memcg-slab-pre-allocate-obj_cgroups-for-slab-caches-with-slab_account-v2.patch
The patch titled
Subject: mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()
has been removed from the -mm tree. Its filename was
mm-userfaultfd-do-not-access-vma-vm_mm-after-calling-handle_userfault.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Subject: mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()
Alexander reported a syzkaller / KASAN finding on s390, see below for
complete output.
In do_huge_pmd_anonymous_page(), the pre-allocated pagetable will be freed
in some cases. In the case of userfaultfd_missing(), this will happen
after calling handle_userfault(), which might have released the mmap_lock.
Therefore, the following pte_free(vma->vm_mm, pgtable) will access an
unstable vma->vm_mm, which could have been freed or re-used already.
For all architectures other than s390 this will go w/o any negative
impact, because pte_free() simply frees the page and ignores the passed-in
mm. The implementation for SPARC32 would also access mm->page_table_lock
for pte_free(), but there is no THP support in SPARC32, so the buggy code
path will not be used there.
For s390, the mm->context.pgtable_list is being used to maintain the 2K
pagetable fragments, and operating on an already freed or even re-used mm
could result in various more or less subtle bugs due to list / pagetable
corruption.
Fix this by calling pte_free() before handle_userfault(), similar to how
it is already done in __do_huge_pmd_anonymous_page() for the WRITE /
non-huge_zero_page case.
Commit 6b251fc96cf2c ("userfaultfd: call handle_userfault() for
userfaultfd_missing() faults") actually introduced both, the
do_huge_pmd_anonymous_page() and also __do_huge_pmd_anonymous_page()
changes wrt to calling handle_userfault(), but only in the latter case it
put the pte_free() before calling handle_userfault().
==================================================================
BUG: KASAN: use-after-free in do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744
Read of size 8 at addr 00000000962d6988 by task syz-executor.0/9334
CPU: 1 PID: 9334 Comm: syz-executor.0 Not tainted 5.10.0-rc1-syzkaller-07083-g4c9720875573 #0
Hardware name: IBM 3906 M04 701 (KVM/Linux)
Call Trace:
[<00000000aa0a7a1c>] unwind_start arch/s390/include/asm/unwind.h:65 [inline]
[<00000000aa0a7a1c>] show_stack+0x174/0x220 arch/s390/kernel/dumpstack.c:135
[<00000000aa105952>] __dump_stack lib/dump_stack.c:77 [inline]
[<00000000aa105952>] dump_stack+0x262/0x2e8 lib/dump_stack.c:118
[<00000000aa0b484e>] print_address_description.constprop.0+0x5e/0x218 mm/kasan/report.c:385
[<00000000a61f13aa>] __kasan_report mm/kasan/report.c:545 [inline]
[<00000000a61f13aa>] kasan_report+0x11a/0x168 mm/kasan/report.c:562
[<00000000a620d782>] do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744
[<00000000a610632e>] create_huge_pmd mm/memory.c:4256 [inline]
[<00000000a610632e>] __handle_mm_fault+0xe6e/0x1068 mm/memory.c:4480
[<00000000a61067b0>] handle_mm_fault+0x288/0x748 mm/memory.c:4607
[<00000000a598b55c>] do_exception+0x394/0xae0 arch/s390/mm/fault.c:479
[<00000000a598d7c4>] do_dat_exception+0x34/0x80 arch/s390/mm/fault.c:567
[<00000000aa124e5e>] pgm_check_handler+0x1da/0x22c arch/s390/kernel/entry.S:706
[<00000000aa0a6902>] copy_from_user_mvcos arch/s390/lib/uaccess.c:111 [inline]
[<00000000aa0a6902>] raw_copy_from_user+0x3a/0x88 arch/s390/lib/uaccess.c:174
[<00000000a7c24668>] _copy_from_user+0x48/0xa8 lib/usercopy.c:16
[<00000000a5b0b2a8>] copy_from_user include/linux/uaccess.h:192 [inline]
[<00000000a5b0b2a8>] __do_sys_sigaltstack kernel/signal.c:4064 [inline]
[<00000000a5b0b2a8>] __s390x_sys_sigaltstack+0xc8/0x240 kernel/signal.c:4060
[<00000000aa124a9c>] system_call+0xe0/0x28c arch/s390/kernel/entry.S:415
Allocated by task 9334:
stack_trace_save+0xbe/0xf0 kernel/stacktrace.c:121
kasan_save_stack+0x30/0x60 mm/kasan/common.c:48
kasan_set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc.constprop.0+0xd0/0xe8 mm/kasan/common.c:461
slab_post_alloc_hook mm/slab.h:526 [inline]
slab_alloc_node mm/slub.c:2891 [inline]
slab_alloc mm/slub.c:2899 [inline]
kmem_cache_alloc+0x118/0x348 mm/slub.c:2904
vm_area_dup+0x9c/0x2b8 kernel/fork.c:356
__split_vma+0xba/0x560 mm/mmap.c:2742
split_vma+0xca/0x108 mm/mmap.c:2800
mlock_fixup+0x4ae/0x600 mm/mlock.c:550
apply_vma_lock_flags+0x2c6/0x398 mm/mlock.c:619
do_mlock+0x1aa/0x718 mm/mlock.c:711
__do_sys_mlock2 mm/mlock.c:738 [inline]
__s390x_sys_mlock2+0x86/0xa8 mm/mlock.c:728
system_call+0xe0/0x28c arch/s390/kernel/entry.S:415
Freed by task 9333:
stack_trace_save+0xbe/0xf0 kernel/stacktrace.c:121
kasan_save_stack+0x30/0x60 mm/kasan/common.c:48
kasan_set_track+0x32/0x48 mm/kasan/common.c:56
kasan_set_free_info+0x34/0x50 mm/kasan/generic.c:355
__kasan_slab_free+0x11e/0x190 mm/kasan/common.c:422
slab_free_hook mm/slub.c:1544 [inline]
slab_free_freelist_hook mm/slub.c:1577 [inline]
slab_free mm/slub.c:3142 [inline]
kmem_cache_free+0x7c/0x4b8 mm/slub.c:3158
__vma_adjust+0x7b2/0x2508 mm/mmap.c:960
vma_merge+0x87e/0xce0 mm/mmap.c:1209
userfaultfd_release+0x412/0x6b8 fs/userfaultfd.c:868
__fput+0x22c/0x7a8 fs/file_table.c:281
task_work_run+0x200/0x320 kernel/task_work.c:151
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
do_notify_resume+0x100/0x148 arch/s390/kernel/signal.c:538
system_call+0xe6/0x28c arch/s390/kernel/entry.S:416
The buggy address belongs to the object at 00000000962d6948
which belongs to the cache vm_area_struct of size 200
The buggy address is located 64 bytes inside of
200-byte region [00000000962d6948, 00000000962d6a10)
The buggy address belongs to the page:
page:00000000313a09fe refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x962d6
flags: 0x3ffff00000000200(slab)
raw: 3ffff00000000200 000040000257e080 0000000c0000000c 000000008020ba00
raw: 0000000000000000 000f001e00000000 ffffffff00000001 0000000096959501
page dumped because: kasan: bad access detected
page->mem_cgroup:0000000096959501
Memory state around the buggy address:
00000000962d6880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000000962d6900: 00 fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb
>00000000962d6980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
00000000962d6a00: fb fb fc fc fc fc fc fc fc fc 00 00 00 00 00 00
00000000962d6a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Link: https://lkml.kernel.org/r/20201110190329.11920-1-gerald.schaefer@linux.ibm.…
Fixes: 6b251fc96cf2c ("userfaultfd: call handle_userfault() for userfaultfd_missing() faults")
Signed-off-by: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Reported-by: Alexander Egorenkov <egorenar(a)linux.ibm.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org> [4.3+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
--- a/mm/huge_memory.c~mm-userfaultfd-do-not-access-vma-vm_mm-after-calling-handle_userfault
+++ a/mm/huge_memory.c
@@ -710,7 +710,6 @@ vm_fault_t do_huge_pmd_anonymous_page(st
transparent_hugepage_use_zero_page()) {
pgtable_t pgtable;
struct page *zero_page;
- bool set;
vm_fault_t ret;
pgtable = pte_alloc_one(vma->vm_mm);
if (unlikely(!pgtable))
@@ -723,25 +722,25 @@ vm_fault_t do_huge_pmd_anonymous_page(st
}
vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
ret = 0;
- set = false;
if (pmd_none(*vmf->pmd)) {
ret = check_stable_address_space(vma->vm_mm);
if (ret) {
spin_unlock(vmf->ptl);
+ pte_free(vma->vm_mm, pgtable);
} else if (userfaultfd_missing(vma)) {
spin_unlock(vmf->ptl);
+ pte_free(vma->vm_mm, pgtable);
ret = handle_userfault(vmf, VM_UFFD_MISSING);
VM_BUG_ON(ret & VM_FAULT_FALLBACK);
} else {
set_huge_zero_page(pgtable, vma->vm_mm, vma,
haddr, vmf->pmd, zero_page);
spin_unlock(vmf->ptl);
- set = true;
}
- } else
+ } else {
spin_unlock(vmf->ptl);
- if (!set)
pte_free(vma->vm_mm, pgtable);
+ }
return ret;
}
gfp = alloc_hugepage_direct_gfpmask(vma);
_
Patches currently in -mm which might be from gerald.schaefer(a)linux.ibm.com are
The patch titled
Subject: mm: memcg/slab: fix root memcg vmstats
has been removed from the -mm tree. Its filename was
mm-memcg-slab-fix-root-memcg-vmstats.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: mm: memcg/slab: fix root memcg vmstats
If we reparent the slab objects to the root memcg, when we free the slab
object, we need to update the per-memcg vmstats to keep it correct for the
root memcg. Now this at least affects the vmstat of NR_KERNEL_STACK_KB
for !CONFIG_VMAP_STACK when the thread stack size is smaller than the
PAGE_SIZE.
David said: "I assume that without this fix that the root memcg's
vmstat would always be inflated if we reparented."
Link: https://lkml.kernel.org/r/20201110031015.15715-1-songmuchun@bytedance.com
Fixes: ec9f02384f60 ("mm: workingset: fix vmstat counters for shadow nodes")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Acked-by: Roman Gushchin <guro(a)fb.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Acked-by: David Rientjes <rientjes(a)google.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Christopher Lameter <cl(a)linux.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Yafang Shao <laoar.shao(a)gmail.com>
Cc: Chris Down <chris(a)chrisdown.name>
Cc: <stable(a)vger.kernel.org> [5.3+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
--- a/mm/memcontrol.c~mm-memcg-slab-fix-root-memcg-vmstats
+++ a/mm/memcontrol.c
@@ -867,8 +867,13 @@ void __mod_lruvec_slab_state(void *p, en
rcu_read_lock();
memcg = mem_cgroup_from_obj(p);
- /* Untracked pages have no memcg, no lruvec. Update only the node */
- if (!memcg || memcg == root_mem_cgroup) {
+ /*
+ * Untracked pages have no memcg, no lruvec. Update only the
+ * node. If we reparent the slab objects to the root memcg,
+ * when we free the slab object, we need to update the per-memcg
+ * vmstats to keep it correct for the root memcg.
+ */
+ if (!memcg) {
__mod_node_page_state(pgdat, idx, val);
} else {
lruvec = mem_cgroup_lruvec(memcg, pgdat);
_
Patches currently in -mm which might be from songmuchun(a)bytedance.com are
mm-memcontrol-remove-unused-mod_memcg_obj_state.patch
mm-memcg-slab-fix-return-child-memcg-objcg-for-root-memcg.patch
mm-memcg-slab-fix-use-after-free-in-obj_cgroup_charge.patch
mm-memcg-slab-rename-_lruvec_slab_state-to-_lruvec_kmem_state.patch
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fa4d30556883f2eaab425b88ba9904865a4d00f3 Mon Sep 17 00:00:00 2001
From: Christian Eggers <ceggers(a)arri.de>
Date: Wed, 7 Oct 2020 10:45:22 +0200
Subject: [PATCH] i2c: imx: Fix reset of I2SR_IAL flag
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
According to the "VFxxx Controller Reference Manual" (and the comment
block starting at line 97), Vybrid requires writing a one for clearing
an interrupt flag. Syncing the method for clearing I2SR_IIF in
i2c_imx_isr().
Signed-off-by: Christian Eggers <ceggers(a)arri.de>
Fixes: 4b775022f6fd ("i2c: imx: add struct to hold more configurable quirks")
Reviewed-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Cc: stable(a)vger.kernel.org
Signed-off-by: Wolfram Sang <wsa(a)kernel.org>
diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
index 0ab5381aa012..cbdcab73a055 100644
--- a/drivers/i2c/busses/i2c-imx.c
+++ b/drivers/i2c/busses/i2c-imx.c
@@ -412,6 +412,19 @@ static void i2c_imx_dma_free(struct imx_i2c_struct *i2c_imx)
dma->chan_using = NULL;
}
+static void i2c_imx_clear_irq(struct imx_i2c_struct *i2c_imx, unsigned int bits)
+{
+ unsigned int temp;
+
+ /*
+ * i2sr_clr_opcode is the value to clear all interrupts. Here we want to
+ * clear only <bits>, so we write ~i2sr_clr_opcode with just <bits>
+ * toggled. This is required because i.MX needs W1C and Vybrid uses W0C.
+ */
+ temp = ~i2c_imx->hwdata->i2sr_clr_opcode ^ bits;
+ imx_i2c_write_reg(temp, i2c_imx, IMX_I2C_I2SR);
+}
+
static int i2c_imx_bus_busy(struct imx_i2c_struct *i2c_imx, int for_busy, bool atomic)
{
unsigned long orig_jiffies = jiffies;
@@ -424,8 +437,7 @@ static int i2c_imx_bus_busy(struct imx_i2c_struct *i2c_imx, int for_busy, bool a
/* check for arbitration lost */
if (temp & I2SR_IAL) {
- temp &= ~I2SR_IAL;
- imx_i2c_write_reg(temp, i2c_imx, IMX_I2C_I2SR);
+ i2c_imx_clear_irq(i2c_imx, I2SR_IAL);
return -EAGAIN;
}
@@ -623,9 +635,7 @@ static irqreturn_t i2c_imx_isr(int irq, void *dev_id)
if (temp & I2SR_IIF) {
/* save status register */
i2c_imx->i2csr = temp;
- temp &= ~I2SR_IIF;
- temp |= (i2c_imx->hwdata->i2sr_clr_opcode & I2SR_IIF);
- imx_i2c_write_reg(temp, i2c_imx, IMX_I2C_I2SR);
+ i2c_imx_clear_irq(i2c_imx, I2SR_IIF);
wake_up(&i2c_imx->queue);
return IRQ_HANDLED;
}
In some cases we have the handle those explicitly as the fallback
connector type detection fails and marks those as eDP connectors.
Attempting to use such a connector with mutter leads to a crash of mutter
as it ends up with two eDP displays.
Information is taken from the official DCB documentation.
Cc: stable(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: Ben Skeggs <bskeggs(a)redhat.com>
Reported-by: Mark Pearson <markpearson(a)lenovo.com>
Tested-by: Mark Pearson <markpearson(a)lenovo.com>
Signed-off-by: Karol Herbst <kherbst(a)redhat.com>
---
drivers/gpu/drm/nouveau/include/nvkm/subdev/bios/conn.h | 1 +
drivers/gpu/drm/nouveau/nouveau_connector.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/bios/conn.h b/drivers/gpu/drm/nouveau/include/nvkm/subdev/bios/conn.h
index f5f59261ea81..d1beaad0c82b 100644
--- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/bios/conn.h
+++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/bios/conn.h
@@ -14,6 +14,7 @@ enum dcb_connector_type {
DCB_CONNECTOR_LVDS_SPWG = 0x41,
DCB_CONNECTOR_DP = 0x46,
DCB_CONNECTOR_eDP = 0x47,
+ DCB_CONNECTOR_mDP = 0x48,
DCB_CONNECTOR_HDMI_0 = 0x60,
DCB_CONNECTOR_HDMI_1 = 0x61,
DCB_CONNECTOR_HDMI_C = 0x63,
diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c b/drivers/gpu/drm/nouveau/nouveau_connector.c
index 8b4b3688c7ae..4c992fd5bd68 100644
--- a/drivers/gpu/drm/nouveau/nouveau_connector.c
+++ b/drivers/gpu/drm/nouveau/nouveau_connector.c
@@ -1210,6 +1210,7 @@ drm_conntype_from_dcb(enum dcb_connector_type dcb)
case DCB_CONNECTOR_DMS59_DP0:
case DCB_CONNECTOR_DMS59_DP1:
case DCB_CONNECTOR_DP :
+ case DCB_CONNECTOR_mDP :
case DCB_CONNECTOR_USB_C : return DRM_MODE_CONNECTOR_DisplayPort;
case DCB_CONNECTOR_eDP : return DRM_MODE_CONNECTOR_eDP;
case DCB_CONNECTOR_HDMI_0 :
--
2.28.0
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 25bc65d8ddfc17cc1d7a45bd48e9bdc0e729ced3
Gitweb: https://git.kernel.org/tip/25bc65d8ddfc17cc1d7a45bd48e9bdc0e729ced3
Author: Gabriele Paoloni <gabriele.paoloni(a)intel.com>
AuthorDate: Fri, 27 Nov 2020 16:18:15
Committer: Borislav Petkov <bp(a)suse.de>
CommitterDate: Fri, 27 Nov 2020 17:38:36 +01:00
x86/mce: Do not overwrite no_way_out if mce_end() fails
Currently, if mce_end() fails, no_way_out - the variable denoting
whether the machine can recover from this MCE - is determined by whether
the worst severity that was found across the MCA banks associated with
the current CPU, is of panic severity.
However, at this point no_way_out could have been already set by
mca_start() after looking at all severities of all CPUs that entered the
MCE handler. If mce_end() fails, check first if no_way_out is already
set and, if so, stick to it, otherwise use the local worst value.
[ bp: Massage. ]
Signed-off-by: Gabriele Paoloni <gabriele.paoloni(a)intel.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Tony Luck <tony.luck(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Link: https://lkml.kernel.org/r/20201127161819.3106432-2-gabriele.paoloni@intel.c…
---
arch/x86/kernel/cpu/mce/core.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 4102b86..32b7099 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1384,8 +1384,10 @@ noinstr void do_machine_check(struct pt_regs *regs)
* When there's any problem use only local no_way_out state.
*/
if (!lmce) {
- if (mce_end(order) < 0)
- no_way_out = worst >= MCE_PANIC_SEVERITY;
+ if (mce_end(order) < 0) {
+ if (!no_way_out)
+ no_way_out = worst >= MCE_PANIC_SEVERITY;
+ }
} else {
/*
* If there was a fatal machine check we should have
Hi Greg, Sasha,
These two were missing in 4.9-stable. Please apply to your queue.
80e46cf22ba0 ("btrfs: tree-checker: Enhance chunk checker to validate chunk profile")
6bf9e4bd6a27 ("btrfs: inode: Verify inode mode to avoid NULL pointer dereference")
--
Regards
Sudip
Hi Greg, Sasha,
These were missing in 4.14-stable. Please apply to your queue.
80e46cf22ba0 ("btrfs: tree-checker: Enhance chunk checker to validate chunk profile")
005d67127fa9 ("btrfs: adjust return values of btrfs_inode_by_name")
6bf9e4bd6a27 ("btrfs: inode: Verify inode mode to avoid NULL pointer dereference")
The second one in only needed to make the backporting of third easier.
--
Regards
Sudip
On Fri, Nov 27, 2020 at 02:55:47AM +0000, Peter Chen wrote:
> On 20-11-26 19:09:36, Greg Kroah-Hartman wrote:
> > From: "taehyun.cho" <taehyun.cho(a)samsung.com>
> >
> > Setup the descriptors for SuperSpeed Plus for f_fs. This allows the
> > gadget to work properly without crashing at SuperSpeed rates.
> >
> > Cc: Felipe Balbi <balbi(a)kernel.org>
> > Cc: Peter Chen <peter.chen(a)nxp.com>
> > Cc: stable <stable(a)vger.kernel.org>
> > Signed-off-by: taehyun.cho <taehyun.cho(a)samsung.com>
> > Signed-off-by: Will McVicker <willmcvicker(a)google.com>
> > Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> > ---
> > drivers/usb/gadget/function/f_fs.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c
> > index 046f770a76da..a34a7c96a1ab 100644
> > --- a/drivers/usb/gadget/function/f_fs.c
> > +++ b/drivers/usb/gadget/function/f_fs.c
> > @@ -1327,6 +1327,7 @@ static long ffs_epfile_ioctl(struct file *file, unsigned code,
> > struct usb_endpoint_descriptor *desc;
> >
> > switch (epfile->ffs->gadget->speed) {
> > + case USB_SPEED_SUPER_PLUS:
> > case USB_SPEED_SUPER:
> > desc_idx = 2;
> > break;
> > @@ -3222,6 +3223,10 @@ static int _ffs_func_bind(struct usb_configuration *c,
> > func->function.os_desc_n =
> > c->cdev->use_os_string ? ffs->interfaces_count : 0;
> >
> > + if (likely(super)) {
>
> Why likely is used? Currently, there are still lots of HS devices on market
> or on the development.
It looks to be a cut/paste of the other tests above, all of which say
"likely" which we all know is not true at all. I'll leave this now, and
add a patch that removes them all as this is NOT a function where it
should be used at all.
thanks for the review.
greg k-h
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 2194bc7c39610be7cabe7456c5f63a570604f015 Mon Sep 17 00:00:00 2001
From: Rajat Jain <rajatja(a)google.com>
Date: Mon, 6 Jul 2020 16:32:40 -0700
Subject: [PATCH] PCI: Add device even if driver attach failed
device_attach() returning failure indicates a driver error while trying to
probe the device. In such a scenario, the PCI device should still be added
in the system and be visible to the user.
When device_attach() fails, merely warn about it and keep the PCI device in
the system.
This partially reverts ab1a187bba5c ("PCI: Check device_attach() return
value always").
Link: https://lore.kernel.org/r/20200706233240.3245512-1-rajatja@google.com
Signed-off-by: Rajat Jain <rajatja(a)google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: stable(a)vger.kernel.org # v4.6+
diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 8e40b3e6da77..3cef835b375f 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -322,12 +322,8 @@ void pci_bus_add_device(struct pci_dev *dev)
dev->match_driver = true;
retval = device_attach(&dev->dev);
- if (retval < 0 && retval != -EPROBE_DEFER) {
+ if (retval < 0 && retval != -EPROBE_DEFER)
pci_warn(dev, "device attach failed (%d)\n", retval);
- pci_proc_detach_device(dev);
- pci_remove_sysfs_dev_files(dev);
- return;
- }
pci_dev_assign_added(dev, true);
}
Hi Greg
Can you add
commit 35425bafc772ee189e3c3790d7c672b80ba65909
Author: Biwen Li <biwen.li(a)nxp.com>
Date: Tue Sep 15 15:32:09 2020 +0800
rtc: pcf2127: fix a bug when not specify interrupts property
to the 5.9 stable queue? It fixes
commit 27006416be16b7887fb94b3b445f32453defb3f1
Author: Alexandre Belloni <alexandre.belloni(a)bootlin.com>
Date: Wed Aug 12 10:51:14 2020 +0200
rtc: pcf2127: fix alarm handling
which is only in 5.9.
Thanks,
Rasmus
Hi,
Please backport "wireless: Use linux/stddef.h instead of stddef.h" to
kernel 4.14, 4.19 and 5.4.
This is upstream commit id 1b9ae0c92925ac40489be526d67d0010d0724ce0
https://git.kernel.org/linus/1b9ae0c92925ac40489be526d67d0010d0724ce0
commit 1b9ae0c92925ac40489be526d67d0010d0724ce0
Author: Hauke Mehrtens <hauke(a)hauke-m.de>
Date: Thu May 21 22:14:22 2020 +0200
wireless: Use linux/stddef.h instead of stddef.h
This patch fixes a build problem in broken build environments which was
introduced with 6989310f5d43 ("wireless: Use offsetof instead of custom
macro.") which was backported to the listed kernel versions.
When the include path is fully correct you should not hit this problem,
but I got it because of some bug in by build system and also someone
else reported a similar problem to me and requested this backport.
Hauke
Centralize handling of interrupts from the userspace APIC
in kvm_cpu_has_extint and kvm_cpu_get_extint, since
userspace APIC interrupts are handled more or less the
same as ExtINTs are with split irqchip. This removes
duplicated code from kvm_cpu_has_injectable_intr and
kvm_cpu_has_interrupt, and makes the code more similar
between kvm_cpu_has_{extint,interrupt} on one side
and kvm_cpu_get_{extint,interrupt} on the other.
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kvm/irq.c | 85 ++++++++++++++++++--------------------------
arch/x86/kvm/lapic.c | 2 +-
2 files changed, 35 insertions(+), 52 deletions(-)
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 99d118ffc67d..e2d49a506e7f 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -42,15 +42,27 @@ static int pending_userspace_extint(struct kvm_vcpu *v)
*/
static int kvm_cpu_has_extint(struct kvm_vcpu *v)
{
- u8 accept = kvm_apic_accept_pic_intr(v);
+ /*
+ * FIXME: interrupt.injected represents an interrupt that it's
+ * side-effects have already been applied (e.g. bit from IRR
+ * already moved to ISR). Therefore, it is incorrect to rely
+ * on interrupt.injected to know if there is a pending
+ * interrupt in the user-mode LAPIC.
+ * This leads to nVMX/nSVM not be able to distinguish
+ * if it should exit from L2 to L1 on EXTERNAL_INTERRUPT on
+ * pending interrupt or should re-inject an injected
+ * interrupt.
+ */
+ if (!lapic_in_kernel(v))
+ return v->arch.interrupt.injected;
- if (accept) {
- if (irqchip_split(v->kvm))
- return pending_userspace_extint(v);
- else
- return v->kvm->arch.vpic->output;
- } else
+ if (!kvm_apic_accept_pic_intr(v))
return 0;
+
+ if (irqchip_split(v->kvm))
+ return pending_userspace_extint(v);
+ else
+ return v->kvm->arch.vpic->output;
}
/*
@@ -61,20 +73,6 @@ static int kvm_cpu_has_extint(struct kvm_vcpu *v)
*/
int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v)
{
- /*
- * FIXME: interrupt.injected represents an interrupt that it's
- * side-effects have already been applied (e.g. bit from IRR
- * already moved to ISR). Therefore, it is incorrect to rely
- * on interrupt.injected to know if there is a pending
- * interrupt in the user-mode LAPIC.
- * This leads to nVMX/nSVM not be able to distinguish
- * if it should exit from L2 to L1 on EXTERNAL_INTERRUPT on
- * pending interrupt or should re-inject an injected
- * interrupt.
- */
- if (!lapic_in_kernel(v))
- return v->arch.interrupt.injected;
-
if (kvm_cpu_has_extint(v))
return 1;
@@ -91,20 +89,6 @@ EXPORT_SYMBOL_GPL(kvm_cpu_has_injectable_intr);
*/
int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
{
- /*
- * FIXME: interrupt.injected represents an interrupt that it's
- * side-effects have already been applied (e.g. bit from IRR
- * already moved to ISR). Therefore, it is incorrect to rely
- * on interrupt.injected to know if there is a pending
- * interrupt in the user-mode LAPIC.
- * This leads to nVMX/nSVM not be able to distinguish
- * if it should exit from L2 to L1 on EXTERNAL_INTERRUPT on
- * pending interrupt or should re-inject an injected
- * interrupt.
- */
- if (!lapic_in_kernel(v))
- return v->arch.interrupt.injected;
-
if (kvm_cpu_has_extint(v))
return 1;
@@ -118,16 +102,21 @@ EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt);
*/
static int kvm_cpu_get_extint(struct kvm_vcpu *v)
{
- if (kvm_cpu_has_extint(v)) {
- if (irqchip_split(v->kvm)) {
- int vector = v->arch.pending_external_vector;
-
- v->arch.pending_external_vector = -1;
- return vector;
- } else
- return kvm_pic_read_irq(v->kvm); /* PIC */
- } else
+ if (!kvm_cpu_has_extint(v)) {
+ WARN_ON(!lapic_in_kernel(v));
return -1;
+ }
+
+ if (!lapic_in_kernel(v))
+ return v->arch.interrupt.nr;
+
+ if (irqchip_split(v->kvm)) {
+ int vector = v->arch.pending_external_vector;
+
+ v->arch.pending_external_vector = -1;
+ return vector;
+ } else
+ return kvm_pic_read_irq(v->kvm); /* PIC */
}
/*
@@ -135,13 +124,7 @@ static int kvm_cpu_get_extint(struct kvm_vcpu *v)
*/
int kvm_cpu_get_interrupt(struct kvm_vcpu *v)
{
- int vector;
-
- if (!lapic_in_kernel(v))
- return v->arch.interrupt.nr;
-
- vector = kvm_cpu_get_extint(v);
-
+ int vector = kvm_cpu_get_extint(v);
if (vector != -1)
return vector; /* PIC */
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 105e7859d1f2..bb5ff761d5e2 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2465,7 +2465,7 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
struct kvm_lapic *apic = vcpu->arch.apic;
u32 ppr;
- if (!kvm_apic_hw_enabled(apic))
+ if (!kvm_apic_present(vcpu))
return -1;
__apic_update_ppr(apic, &ppr);
--
2.28.0
Hi Greg, Sasha,
These two were missing in 4.4-stable. Please apply to your queue.
80e46cf22ba0 ("btrfs: tree-checker: Enhance chunk checker to validate chunk profile")
6bf9e4bd6a27 ("btrfs: inode: Verify inode mode to avoid NULL pointer dereference")
--
Regards
Sudip
From: Zenghui Yu <yuzenghui(a)huawei.com>
It was recently reported that if GICR_TYPER is accessed before the RD base
address is set, we'll suffer from the unset @rdreg dereferencing. Oops...
gpa_t last_rdist_typer = rdreg->base + GICR_TYPER +
(rdreg->free_index - 1) * KVM_VGIC_V3_REDIST_SIZE;
It's "expected" that users will access registers in the redistributor if
the RD has been properly configured (e.g., the RD base address is set). But
it hasn't yet been covered by the existing documentation.
Per discussion on the list [1], the reporting of the GICR_TYPER.Last bit
for userspace never actually worked. And it's difficult for us to emulate
it correctly given that userspace has the flexibility to access it any
time. Let's just drop the reporting of the Last bit for userspace for now
(userspace should have full knowledge about it anyway) and it at least
prevents kernel from panic ;-)
[1] https://lore.kernel.org/kvmarm/c20865a267e44d1e2c0d52ce4e012263@kernel.org/
Fixes: ba7b3f1275fd ("KVM: arm/arm64: Revisit Redistributor TYPER last bit computation")
Reported-by: Keqian Zhu <zhukeqian1(a)huawei.com>
Signed-off-by: Zenghui Yu <yuzenghui(a)huawei.com>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Reviewed-by: Eric Auger <eric.auger(a)redhat.com>
Link: https://lore.kernel.org/r/20201117151629.1738-1-yuzenghui@huawei.com
Cc: stable(a)vger.kernel.org
---
arch/arm64/kvm/vgic/vgic-mmio-v3.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
index 52d6f24f65dc..15a6c98ee92f 100644
--- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
@@ -273,6 +273,23 @@ static unsigned long vgic_mmio_read_v3r_typer(struct kvm_vcpu *vcpu,
return extract_bytes(value, addr & 7, len);
}
+static unsigned long vgic_uaccess_read_v3r_typer(struct kvm_vcpu *vcpu,
+ gpa_t addr, unsigned int len)
+{
+ unsigned long mpidr = kvm_vcpu_get_mpidr_aff(vcpu);
+ int target_vcpu_id = vcpu->vcpu_id;
+ u64 value;
+
+ value = (u64)(mpidr & GENMASK(23, 0)) << 32;
+ value |= ((target_vcpu_id & 0xffff) << 8);
+
+ if (vgic_has_its(vcpu->kvm))
+ value |= GICR_TYPER_PLPIS;
+
+ /* reporting of the Last bit is not supported for userspace */
+ return extract_bytes(value, addr & 7, len);
+}
+
static unsigned long vgic_mmio_read_v3r_iidr(struct kvm_vcpu *vcpu,
gpa_t addr, unsigned int len)
{
@@ -593,8 +610,9 @@ static const struct vgic_register_region vgic_v3_rd_registers[] = {
REGISTER_DESC_WITH_LENGTH(GICR_IIDR,
vgic_mmio_read_v3r_iidr, vgic_mmio_write_wi, 4,
VGIC_ACCESS_32bit),
- REGISTER_DESC_WITH_LENGTH(GICR_TYPER,
- vgic_mmio_read_v3r_typer, vgic_mmio_write_wi, 8,
+ REGISTER_DESC_WITH_LENGTH_UACCESS(GICR_TYPER,
+ vgic_mmio_read_v3r_typer, vgic_mmio_write_wi,
+ vgic_uaccess_read_v3r_typer, vgic_mmio_uaccess_write_wi, 8,
VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_LENGTH(GICR_WAKER,
vgic_mmio_read_raz, vgic_mmio_write_wi, 4,
--
2.28.0
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: msi2500: assign SPI bus number dynamically
Author: Antti Palosaari <crope(a)iki.fi>
Date: Sat Aug 17 03:12:10 2019 +0200
SPI bus number must be assigned dynamically for each device, otherwise it
will crash when multiple devices are plugged to system.
Reported-and-tested-by: syzbot+c60ddb60b685777d9d59(a)syzkaller.appspotmail.com
Cc: stable(a)vger.kernel.org
Signed-off-by: Antti Palosaari <crope(a)iki.fi>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
drivers/media/usb/msi2500/msi2500.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
---
diff --git a/drivers/media/usb/msi2500/msi2500.c b/drivers/media/usb/msi2500/msi2500.c
index 269f3ef34bc9..63882a5248ae 100644
--- a/drivers/media/usb/msi2500/msi2500.c
+++ b/drivers/media/usb/msi2500/msi2500.c
@@ -1230,7 +1230,7 @@ static int msi2500_probe(struct usb_interface *intf,
}
dev->master = master;
- master->bus_num = 0;
+ master->bus_num = -1;
master->num_chipselect = 1;
master->transfer_one_message = msi2500_transfer_one_message;
spi_master_set_devdata(master, dev);
From: Krzysztof Kozlowski <krzk(a)kernel.org>
GPIOs - as returned by of_get_named_gpio() and used by the gpiolib - are
signed integers, where negative number indicates error. The return
value of of_get_named_gpio() should not be assigned to an unsigned int
because in case of !CONFIG_GPIOLIB such number would be a valid GPIO.
Fixes: c04c674fadeb ("nfc: s3fwrn5: Add driver for Samsung S3FWRN5 NFC Chip")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzk(a)kernel.org>
---
drivers/nfc/s3fwrn5/i2c.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/nfc/s3fwrn5/i2c.c b/drivers/nfc/s3fwrn5/i2c.c
index 0ffa389..ae26594 100644
--- a/drivers/nfc/s3fwrn5/i2c.c
+++ b/drivers/nfc/s3fwrn5/i2c.c
@@ -25,8 +25,8 @@ struct s3fwrn5_i2c_phy {
struct i2c_client *i2c_dev;
struct nci_dev *ndev;
- unsigned int gpio_en;
- unsigned int gpio_fw_wake;
+ int gpio_en;
+ int gpio_fw_wake;
struct mutex mutex;
--
1.9.1
With 5.9 kernel on ARM64, I found ftrace_dump output was broken but
it had no problem with normal output "cat /sys/kernel/debug/tracing/trace".
With investigation, it seems coping the data into temporal buffer seems to
break the align binary printf expects if the static buffer is not aligned
with 4-byte. IIUC, get_arg in bstr_printf expects that args has already
right align to be decoded and seq_buf_bprintf says ``the arguments are saved
in a 32bit word array that is defined by the format string constraints``.
So if we don't keep the align under copy to temporal buffer, the output
will be broken by shifting some bytes.
This patch fixes it.
Cc: <stable(a)vger.kernel.org>
Fixes: 8e99cf91b99bb ("tracing: Do not allocate buffer in trace_find_next_entry() in atomic")
Signed-off-by: Namhyung Kim <namhyung(a)kernel.org>
Signed-off-by: Minchan Kim <minchan(a)kernel.org>
---
kernel/trace/trace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 6a282bbc7e7f..01bfcc345d55 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3534,7 +3534,7 @@ __find_next_entry(struct trace_iterator *iter, int *ent_cpu,
}
#define STATIC_TEMP_BUF_SIZE 128
-static char static_temp_buf[STATIC_TEMP_BUF_SIZE];
+static char static_temp_buf[STATIC_TEMP_BUF_SIZE] __aligned(4);
/* Find the next real entry, without updating the iterator itself */
struct trace_entry *trace_find_next_entry(struct trace_iterator *iter,
--
2.29.2.454.gaff20da3a2-goog
Two earlier bug fixes have created a security problem in the hfi1
driver. One fix aimed to solve an issue where current->mm was not valid
when closing the hfi1 cdev. It attempted to do this by saving a cached
value of the current->mm pointer at file open time. This is a problem if
another process with access to the FD calls in via write() or ioctl() to
pin pages via the hfi driver. The other fix tried to solve a use after
free by taking a reference on the mm.
To fix this correctly we use the existing cached value of the mm in the
mmu notifier. Now we can check in the insert, evict, etc. routines that
current->mm matched what the notifier was registered for. If not, then
don't allow access. The register of the mmu notifier will save the mm
pointer.
Note the check in the unregister is not needed in the event that
current->mm is empty. This means the tear down is happening due to a
SigKill or OOM Killer, something along those lines. If current->mm has a
value then it must be checked and only the task that did the register
can do the unregister.
Since in do_exit() the exit_mm() is called before exit_files(), which
would call our close routine a reference is needed on the mm. We rely on
the mmgrab done by the registration of the notifier, whereas before it
was explicit. The mmu notifier deregistration happens when the user
context is torn down, the creation of which triggered the registration.
Also of note is we do not do any explicit work to protect the interval
tree notifier. It doesn't seem that this is going to be needed since we
aren't actually doing anything with current->mm. The interval tree
notifier stuff still has a FIXME noted from a previous commit that will
be addressed in a follow on patch.
Fixes: e0cf75deab81 ("IB/hfi1: Fix mm_struct use after free")
Fixes: 3faa3d9a308e ("IB/hfi1: Make use of mm consistent")
Cc: <stable(a)vger.kernel.org>
Suggested-by: Jann Horn <jannh(a)google.com>
Reported-by: Jason Gunthorpe <jgg(a)nvidia.com>
Reviewed-by: Ira Weiny <ira.weiny(a)intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn(a)cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro(a)cornelisnetworks.com>
---
Changes since v0:
----------------
Removed the checking of the pid and limitation that
whatever task opens the dev is the only one that can do write() or
ioctl(). While this limitation is OK it doesn't appear to be strictly
necessary.
Rebased on top of 5.10-rc1. Testing has been done on 5.9 due to a bug in
5.10 that is being worked (separate issue).
Changes since v1:
----------------
Remove explicit mmget/put to rely on the notifier register's mmgrab
instead.
Fixed missing check in rb_unregister to only check current->mm if its
actually valid.
Moved mm_from_tid_node to exp_rcv header and use it
Changes since v2:
----------------
Change Reported-by to Suggested-by for Jann
Commit msg updates
Remove private mm pointer and use notifier's
Changes since v3:
-----------------
Added Ira's RB and Cc stable list
Updated commit message
Added comment to mmu_rb_unregister
Renamed confusing variable in mmu_rb_register
Changes since v4:
-----------------
Remove conditional mmu notifier registration and fix up commit msg.
---
drivers/infiniband/hw/hfi1/file_ops.c | 4 --
drivers/infiniband/hw/hfi1/hfi.h | 2 -
drivers/infiniband/hw/hfi1/mmu_rb.c | 68 +++++++++++++++--------------
drivers/infiniband/hw/hfi1/mmu_rb.h | 16 ++++++-
drivers/infiniband/hw/hfi1/user_exp_rcv.c | 12 +++--
drivers/infiniband/hw/hfi1/user_exp_rcv.h | 6 +++
drivers/infiniband/hw/hfi1/user_sdma.c | 13 +++---
drivers/infiniband/hw/hfi1/user_sdma.h | 7 +++
8 files changed, 79 insertions(+), 49 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c
index 8ca51e4..329ee4f 100644
--- a/drivers/infiniband/hw/hfi1/file_ops.c
+++ b/drivers/infiniband/hw/hfi1/file_ops.c
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2015-2020 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -206,8 +207,6 @@ static int hfi1_file_open(struct inode *inode, struct file *fp)
spin_lock_init(&fd->tid_lock);
spin_lock_init(&fd->invalid_lock);
fd->rec_cpu_num = -1; /* no cpu affinity by default */
- fd->mm = current->mm;
- mmgrab(fd->mm);
fd->dd = dd;
fp->private_data = fd;
return 0;
@@ -711,7 +710,6 @@ static int hfi1_file_close(struct inode *inode, struct file *fp)
deallocate_ctxt(uctxt);
done:
- mmdrop(fdata->mm);
if (atomic_dec_and_test(&dd->user_refcount))
complete(&dd->user_comp);
diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h
index b4c6bff..e09e824 100644
--- a/drivers/infiniband/hw/hfi1/hfi.h
+++ b/drivers/infiniband/hw/hfi1/hfi.h
@@ -1,6 +1,7 @@
#ifndef _HFI1_KERNEL_H
#define _HFI1_KERNEL_H
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2015-2020 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -1451,7 +1452,6 @@ struct hfi1_filedata {
u32 invalid_tid_idx;
/* protect invalid_tids array and invalid_tid_idx */
spinlock_t invalid_lock;
- struct mm_struct *mm;
};
extern struct xarray hfi1_dev_table;
diff --git a/drivers/infiniband/hw/hfi1/mmu_rb.c b/drivers/infiniband/hw/hfi1/mmu_rb.c
index 24ca17b..f3fb28e 100644
--- a/drivers/infiniband/hw/hfi1/mmu_rb.c
+++ b/drivers/infiniband/hw/hfi1/mmu_rb.c
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2016 - 2017 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -48,23 +49,11 @@
#include <linux/rculist.h>
#include <linux/mmu_notifier.h>
#include <linux/interval_tree_generic.h>
+#include <linux/sched/mm.h>
#include "mmu_rb.h"
#include "trace.h"
-struct mmu_rb_handler {
- struct mmu_notifier mn;
- struct rb_root_cached root;
- void *ops_arg;
- spinlock_t lock; /* protect the RB tree */
- struct mmu_rb_ops *ops;
- struct mm_struct *mm;
- struct list_head lru_list;
- struct work_struct del_work;
- struct list_head del_list;
- struct workqueue_struct *wq;
-};
-
static unsigned long mmu_node_start(struct mmu_rb_node *);
static unsigned long mmu_node_last(struct mmu_rb_node *);
static int mmu_notifier_range_start(struct mmu_notifier *,
@@ -92,37 +81,36 @@ static unsigned long mmu_node_last(struct mmu_rb_node *node)
return PAGE_ALIGN(node->addr + node->len) - 1;
}
-int hfi1_mmu_rb_register(void *ops_arg, struct mm_struct *mm,
+int hfi1_mmu_rb_register(void *ops_arg,
struct mmu_rb_ops *ops,
struct workqueue_struct *wq,
struct mmu_rb_handler **handler)
{
- struct mmu_rb_handler *handlr;
+ struct mmu_rb_handler *h;
int ret;
- handlr = kmalloc(sizeof(*handlr), GFP_KERNEL);
- if (!handlr)
+ h = kmalloc(sizeof(*h), GFP_KERNEL);
+ if (!h)
return -ENOMEM;
- handlr->root = RB_ROOT_CACHED;
- handlr->ops = ops;
- handlr->ops_arg = ops_arg;
- INIT_HLIST_NODE(&handlr->mn.hlist);
- spin_lock_init(&handlr->lock);
- handlr->mn.ops = &mn_opts;
- handlr->mm = mm;
- INIT_WORK(&handlr->del_work, handle_remove);
- INIT_LIST_HEAD(&handlr->del_list);
- INIT_LIST_HEAD(&handlr->lru_list);
- handlr->wq = wq;
-
- ret = mmu_notifier_register(&handlr->mn, handlr->mm);
+ h->root = RB_ROOT_CACHED;
+ h->ops = ops;
+ h->ops_arg = ops_arg;
+ INIT_HLIST_NODE(&h->mn.hlist);
+ spin_lock_init(&h->lock);
+ h->mn.ops = &mn_opts;
+ INIT_WORK(&h->del_work, handle_remove);
+ INIT_LIST_HEAD(&h->del_list);
+ INIT_LIST_HEAD(&h->lru_list);
+ h->wq = wq;
+
+ ret = mmu_notifier_register(&h->mn, current->mm);
if (ret) {
- kfree(handlr);
+ kfree(h);
return ret;
}
- *handler = handlr;
+ *handler = h;
return 0;
}
@@ -134,7 +122,7 @@ void hfi1_mmu_rb_unregister(struct mmu_rb_handler *handler)
struct list_head del_list;
/* Unregister first so we don't get any more notifications. */
- mmu_notifier_unregister(&handler->mn, handler->mm);
+ mmu_notifier_unregister(&handler->mn, handler->mn.mm);
/*
* Make sure the wq delete handler is finished running. It will not
@@ -166,6 +154,10 @@ int hfi1_mmu_rb_insert(struct mmu_rb_handler *handler,
int ret = 0;
trace_hfi1_mmu_rb_insert(mnode->addr, mnode->len);
+
+ if (current->mm != handler->mn.mm)
+ return -EPERM;
+
spin_lock_irqsave(&handler->lock, flags);
node = __mmu_rb_search(handler, mnode->addr, mnode->len);
if (node) {
@@ -180,6 +172,7 @@ int hfi1_mmu_rb_insert(struct mmu_rb_handler *handler,
__mmu_int_rb_remove(mnode, &handler->root);
list_del(&mnode->list); /* remove from LRU list */
}
+ mnode->handler = handler;
unlock:
spin_unlock_irqrestore(&handler->lock, flags);
return ret;
@@ -217,6 +210,9 @@ bool hfi1_mmu_rb_remove_unless_exact(struct mmu_rb_handler *handler,
unsigned long flags;
bool ret = false;
+ if (current->mm != handler->mn.mm)
+ return ret;
+
spin_lock_irqsave(&handler->lock, flags);
node = __mmu_rb_search(handler, addr, len);
if (node) {
@@ -239,6 +235,9 @@ void hfi1_mmu_rb_evict(struct mmu_rb_handler *handler, void *evict_arg)
unsigned long flags;
bool stop = false;
+ if (current->mm != handler->mn.mm)
+ return;
+
INIT_LIST_HEAD(&del_list);
spin_lock_irqsave(&handler->lock, flags);
@@ -272,6 +271,9 @@ void hfi1_mmu_rb_remove(struct mmu_rb_handler *handler,
{
unsigned long flags;
+ if (current->mm != handler->mn.mm)
+ return;
+
/* Validity of handler and node pointers has been checked by caller. */
trace_hfi1_mmu_rb_remove(node->addr, node->len);
spin_lock_irqsave(&handler->lock, flags);
diff --git a/drivers/infiniband/hw/hfi1/mmu_rb.h b/drivers/infiniband/hw/hfi1/mmu_rb.h
index f04cec1..423aacc 100644
--- a/drivers/infiniband/hw/hfi1/mmu_rb.h
+++ b/drivers/infiniband/hw/hfi1/mmu_rb.h
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2016 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -54,6 +55,7 @@ struct mmu_rb_node {
unsigned long len;
unsigned long __last;
struct rb_node node;
+ struct mmu_rb_handler *handler;
struct list_head list;
};
@@ -71,7 +73,19 @@ struct mmu_rb_ops {
void *evict_arg, bool *stop);
};
-int hfi1_mmu_rb_register(void *ops_arg, struct mm_struct *mm,
+struct mmu_rb_handler {
+ struct mmu_notifier mn;
+ struct rb_root_cached root;
+ void *ops_arg;
+ spinlock_t lock; /* protect the RB tree */
+ struct mmu_rb_ops *ops;
+ struct list_head lru_list;
+ struct work_struct del_work;
+ struct list_head del_list;
+ struct workqueue_struct *wq;
+};
+
+int hfi1_mmu_rb_register(void *ops_arg,
struct mmu_rb_ops *ops,
struct workqueue_struct *wq,
struct mmu_rb_handler **handler);
diff --git a/drivers/infiniband/hw/hfi1/user_exp_rcv.c b/drivers/infiniband/hw/hfi1/user_exp_rcv.c
index f81ca20..b94fc7f 100644
--- a/drivers/infiniband/hw/hfi1/user_exp_rcv.c
+++ b/drivers/infiniband/hw/hfi1/user_exp_rcv.c
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2015-2018 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -173,15 +174,18 @@ static void unpin_rcv_pages(struct hfi1_filedata *fd,
{
struct page **pages;
struct hfi1_devdata *dd = fd->uctxt->dd;
+ struct mm_struct *mm;
if (mapped) {
pci_unmap_single(dd->pcidev, node->dma_addr,
node->npages * PAGE_SIZE, PCI_DMA_FROMDEVICE);
pages = &node->pages[idx];
+ mm = mm_from_tid_node(node);
} else {
pages = &tidbuf->pages[idx];
+ mm = current->mm;
}
- hfi1_release_user_pages(fd->mm, pages, npages, mapped);
+ hfi1_release_user_pages(mm, pages, npages, mapped);
fd->tid_n_pinned -= npages;
}
@@ -216,12 +220,12 @@ static int pin_rcv_pages(struct hfi1_filedata *fd, struct tid_user_buf *tidbuf)
* pages, accept the amount pinned so far and program only that.
* User space knows how to deal with partially programmed buffers.
*/
- if (!hfi1_can_pin_pages(dd, fd->mm, fd->tid_n_pinned, npages)) {
+ if (!hfi1_can_pin_pages(dd, current->mm, fd->tid_n_pinned, npages)) {
kfree(pages);
return -ENOMEM;
}
- pinned = hfi1_acquire_user_pages(fd->mm, vaddr, npages, true, pages);
+ pinned = hfi1_acquire_user_pages(current->mm, vaddr, npages, true, pages);
if (pinned <= 0) {
kfree(pages);
return pinned;
@@ -756,7 +760,7 @@ static int set_rcvarray_entry(struct hfi1_filedata *fd,
if (fd->use_mn) {
ret = mmu_interval_notifier_insert(
- &node->notifier, fd->mm,
+ &node->notifier, current->mm,
tbuf->vaddr + (pageidx * PAGE_SIZE), npages * PAGE_SIZE,
&tid_mn_ops);
if (ret)
diff --git a/drivers/infiniband/hw/hfi1/user_exp_rcv.h b/drivers/infiniband/hw/hfi1/user_exp_rcv.h
index 332abb4..d45c7b6 100644
--- a/drivers/infiniband/hw/hfi1/user_exp_rcv.h
+++ b/drivers/infiniband/hw/hfi1/user_exp_rcv.h
@@ -1,6 +1,7 @@
#ifndef _HFI1_USER_EXP_RCV_H
#define _HFI1_USER_EXP_RCV_H
/*
+ * Copyright(c) 2020 - Cornelis Networks, Inc.
* Copyright(c) 2015 - 2017 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -95,4 +96,9 @@ int hfi1_user_exp_rcv_clear(struct hfi1_filedata *fd,
int hfi1_user_exp_rcv_invalid(struct hfi1_filedata *fd,
struct hfi1_tid_info *tinfo);
+static inline struct mm_struct *mm_from_tid_node(struct tid_rb_node *node)
+{
+ return node->notifier.mm;
+}
+
#endif /* _HFI1_USER_EXP_RCV_H */
diff --git a/drivers/infiniband/hw/hfi1/user_sdma.c b/drivers/infiniband/hw/hfi1/user_sdma.c
index a92346e..4a4956f9 100644
--- a/drivers/infiniband/hw/hfi1/user_sdma.c
+++ b/drivers/infiniband/hw/hfi1/user_sdma.c
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 - Cornelis Networks, Inc.
* Copyright(c) 2015 - 2018 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -188,7 +189,6 @@ int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt,
atomic_set(&pq->n_reqs, 0);
init_waitqueue_head(&pq->wait);
atomic_set(&pq->n_locked, 0);
- pq->mm = fd->mm;
iowait_init(&pq->busy, 0, NULL, NULL, defer_packet_queue,
activate_packet_queue, NULL, NULL);
@@ -230,7 +230,7 @@ int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt,
cq->nentries = hfi1_sdma_comp_ring_size;
- ret = hfi1_mmu_rb_register(pq, pq->mm, &sdma_rb_ops, dd->pport->hfi1_wq,
+ ret = hfi1_mmu_rb_register(pq, &sdma_rb_ops, dd->pport->hfi1_wq,
&pq->handler);
if (ret) {
dd_dev_err(dd, "Failed to register with MMU %d", ret);
@@ -980,13 +980,13 @@ static int pin_sdma_pages(struct user_sdma_request *req,
npages -= node->npages;
retry:
- if (!hfi1_can_pin_pages(pq->dd, pq->mm,
+ if (!hfi1_can_pin_pages(pq->dd, current->mm,
atomic_read(&pq->n_locked), npages)) {
cleared = sdma_cache_evict(pq, npages);
if (cleared >= npages)
goto retry;
}
- pinned = hfi1_acquire_user_pages(pq->mm,
+ pinned = hfi1_acquire_user_pages(current->mm,
((unsigned long)iovec->iov.iov_base +
(node->npages * PAGE_SIZE)), npages, 0,
pages + node->npages);
@@ -995,7 +995,7 @@ static int pin_sdma_pages(struct user_sdma_request *req,
return pinned;
}
if (pinned != npages) {
- unpin_vector_pages(pq->mm, pages, node->npages, pinned);
+ unpin_vector_pages(current->mm, pages, node->npages, pinned);
return -EFAULT;
}
kfree(node->pages);
@@ -1008,7 +1008,8 @@ static int pin_sdma_pages(struct user_sdma_request *req,
static void unpin_sdma_pages(struct sdma_mmu_node *node)
{
if (node->npages) {
- unpin_vector_pages(node->pq->mm, node->pages, 0, node->npages);
+ unpin_vector_pages(mm_from_sdma_node(node), node->pages, 0,
+ node->npages);
atomic_sub(node->npages, &node->pq->n_locked);
}
}
diff --git a/drivers/infiniband/hw/hfi1/user_sdma.h b/drivers/infiniband/hw/hfi1/user_sdma.h
index 9972e0e..1e8c02f 100644
--- a/drivers/infiniband/hw/hfi1/user_sdma.h
+++ b/drivers/infiniband/hw/hfi1/user_sdma.h
@@ -1,6 +1,7 @@
#ifndef _HFI1_USER_SDMA_H
#define _HFI1_USER_SDMA_H
/*
+ * Copyright(c) 2020 - Cornelis Networks, Inc.
* Copyright(c) 2015 - 2018 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -133,7 +134,6 @@ struct hfi1_user_sdma_pkt_q {
unsigned long unpinned;
struct mmu_rb_handler *handler;
atomic_t n_locked;
- struct mm_struct *mm;
};
struct hfi1_user_sdma_comp_q {
@@ -250,4 +250,9 @@ int hfi1_user_sdma_process_request(struct hfi1_filedata *fd,
struct iovec *iovec, unsigned long dim,
unsigned long *count);
+static inline struct mm_struct *mm_from_sdma_node(struct sdma_mmu_node *node)
+{
+ return node->rb.handler->mn.mm;
+}
+
#endif /* _HFI1_USER_SDMA_H */
The postclose handler can run after the device has been removed (or the
driver has been unbound) since userspace clients are free to hold the
file open as long as they want. Because the device removal callback
frees the entire nouveau_drm structure, any reference to it in the
postclose handler will result in a use-after-free.
To reproduce this, one must simply open the device file, unbind the
driver (or physically remove the device), and then close the device
file. This was found and can be reproduced easily with the IGT
core_hotunplug tests.
To avoid this, all clients are cleaned up in the device finalization
rather than deferring it to the postclose handler, and the postclose
handler is protected by a critical section which ensures the
drm_dev_unplug() and the postclose handler won't race.
This is not an ideal fix, since as I understand the proposed plan for
the kernel<->userspace interface for hotplug support, destroying the
client before the file is closed will cause problems. However, I believe
to properly fix this issue, the lifetime of the nouveau_drm structure
needs to be extended to match the drm_device, and this proved to be a
rather invasive change. Thus, I've broken this out so the fix can be
easily backported.
Cc: stable(a)vger.kernel.org
Signed-off-by: Jeremy Cline <jcline(a)redhat.com>
---
drivers/gpu/drm/nouveau/nouveau_drm.c | 30 +++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 6ee1adc9bd40..afaf1774ee35 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -628,6 +628,7 @@ nouveau_drm_device_init(struct drm_device *dev)
static void
nouveau_drm_device_fini(struct drm_device *dev)
{
+ struct nouveau_cli *cli, *temp_cli;
struct nouveau_drm *drm = nouveau_drm(dev);
if (nouveau_pmops_runtime()) {
@@ -652,6 +653,24 @@ nouveau_drm_device_fini(struct drm_device *dev)
nouveau_ttm_fini(drm);
nouveau_vga_fini(drm);
+ /*
+ * There may be existing clients from as-yet unclosed files. For now,
+ * clean them up here rather than deferring until the file is closed,
+ * but this likely not correct if we want to support hot-unplugging
+ * properly.
+ */
+ mutex_lock(&drm->clients_lock);
+ list_for_each_entry_safe(cli, temp_cli, &drm->clients, head) {
+ list_del(&cli->head);
+ mutex_lock(&cli->mutex);
+ if (cli->abi16)
+ nouveau_abi16_fini(cli->abi16);
+ mutex_unlock(&cli->mutex);
+ nouveau_cli_fini(cli);
+ kfree(cli);
+ }
+ mutex_unlock(&drm->clients_lock);
+
nouveau_cli_fini(&drm->client);
nouveau_cli_fini(&drm->master);
nvif_parent_dtor(&drm->parent);
@@ -1111,6 +1130,16 @@ nouveau_drm_postclose(struct drm_device *dev, struct drm_file *fpriv)
{
struct nouveau_cli *cli = nouveau_cli(fpriv);
struct nouveau_drm *drm = nouveau_drm(dev);
+ int dev_index;
+
+ /*
+ * The device is gone, and as it currently stands all clients are
+ * cleaned up in the removal codepath. In the future this may change
+ * so that we can support hot-unplugging, but for now we immediately
+ * return to avoid a double-free situation.
+ */
+ if (!drm_dev_enter(dev, &dev_index))
+ return;
pm_runtime_get_sync(dev->dev);
@@ -1127,6 +1156,7 @@ nouveau_drm_postclose(struct drm_device *dev, struct drm_file *fpriv)
kfree(cli);
pm_runtime_mark_last_busy(dev->dev);
pm_runtime_put_autosuspend(dev->dev);
+ drm_dev_exit(dev_index);
}
static const struct drm_ioctl_desc
--
2.28.0
Rather than protecting the nouveau_drm clients list with the lock within
the "client" nouveau_cli, add a dedicated lock to serialize access to
the list. This is both clearer and necessary to avoid lockdep being
upset with us when we need to iterate through all the clients in the
list and potentially lock their mutex, which is the same class as the
lock protecting the entire list.
Cc: stable(a)vger.kernel.org
Signed-off-by: Jeremy Cline <jcline(a)redhat.com>
---
Changes since v1:
- Add a mutex_destroy() call when destroying the device struct
drivers/gpu/drm/nouveau/nouveau_drm.c | 10 ++++++----
drivers/gpu/drm/nouveau/nouveau_drv.h | 5 +++++
2 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 4fe4d664c5f2..6ee1adc9bd40 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -557,6 +557,7 @@ nouveau_drm_device_init(struct drm_device *dev)
nvkm_dbgopt(nouveau_debug, "DRM");
INIT_LIST_HEAD(&drm->clients);
+ mutex_init(&drm->clients_lock);
spin_lock_init(&drm->tile.lock);
/* workaround an odd issue on nvc1 by disabling the device's
@@ -654,6 +655,7 @@ nouveau_drm_device_fini(struct drm_device *dev)
nouveau_cli_fini(&drm->client);
nouveau_cli_fini(&drm->master);
nvif_parent_dtor(&drm->parent);
+ mutex_destroy(&drm->clients_lock);
kfree(drm);
}
@@ -1089,9 +1091,9 @@ nouveau_drm_open(struct drm_device *dev, struct drm_file *fpriv)
fpriv->driver_priv = cli;
- mutex_lock(&drm->client.mutex);
+ mutex_lock(&drm->clients_lock);
list_add(&cli->head, &drm->clients);
- mutex_unlock(&drm->client.mutex);
+ mutex_unlock(&drm->clients_lock);
done:
if (ret && cli) {
@@ -1117,9 +1119,9 @@ nouveau_drm_postclose(struct drm_device *dev, struct drm_file *fpriv)
nouveau_abi16_fini(cli->abi16);
mutex_unlock(&cli->mutex);
- mutex_lock(&drm->client.mutex);
+ mutex_lock(&drm->clients_lock);
list_del(&cli->head);
- mutex_unlock(&drm->client.mutex);
+ mutex_unlock(&drm->clients_lock);
nouveau_cli_fini(cli);
kfree(cli);
diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
index 9d04d1b36434..550e5f335146 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -141,6 +141,11 @@ struct nouveau_drm {
struct list_head clients;
+ /**
+ * @clients_lock: Protects access to the @clients list of &struct nouveau_cli.
+ */
+ struct mutex clients_lock;
+
u8 old_pm_cap;
struct {
--
2.28.0
Nouveau does not currently support hot-unplugging, but it still makes
sense to switch from drm_dev_unregister() to drm_dev_unplug().
drm_dev_unplug() calls drm_dev_unregister() after marking the device as
unplugged, but only after any device critical sections are finished.
Since nouveau isn't using drm_dev_enter() and drm_dev_exit(), there are
no critical sections so this is nearly functionally equivalent. However,
the DRM layer does check to see if the device is unplugged, and if it is
returns appropriate error codes.
In the future nouveau can add critical sections in order to truly
support hot-unplugging.
Cc: stable(a)vger.kernel.org
Signed-off-by: Jeremy Cline <jcline(a)redhat.com>
---
drivers/gpu/drm/nouveau/nouveau_drm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index d141a5f004af..4fe4d664c5f2 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -792,7 +792,7 @@ nouveau_drm_device_remove(struct drm_device *dev)
struct nvkm_client *client;
struct nvkm_device *device;
- drm_dev_unregister(dev);
+ drm_dev_unplug(dev);
dev->irq_enabled = false;
client = nvxx_client(&drm->client.base);
--
2.28.0
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 33fc379df76b4991e5ae312f07bcd6820811971e
Gitweb: https://git.kernel.org/tip/33fc379df76b4991e5ae312f07bcd6820811971e
Author: Anand K Mistry <amistry(a)google.com>
AuthorDate: Tue, 10 Nov 2020 12:33:53 +11:00
Committer: Borislav Petkov <bp(a)suse.de>
CommitterDate: Wed, 25 Nov 2020 20:17:09 +01:00
x86/speculation: Fix prctl() when spectre_v2_user={seccomp,prctl},ibpb
When spectre_v2_user={seccomp,prctl},ibpb is specified on the command
line, IBPB is force-enabled and STIPB is conditionally-enabled (or not
available).
However, since
21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.")
the spectre_v2_user_ibpb variable is set to SPECTRE_V2_USER_{PRCTL,SECCOMP}
instead of SPECTRE_V2_USER_STRICT, which is the actual behaviour.
Because the issuing of IBPB relies on the switch_mm_*_ibpb static
branches, the mitigations behave as expected.
Since
1978b3a53a74 ("x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP")
this discrepency caused the misreporting of IB speculation via prctl().
On CPUs with STIBP always-on and spectre_v2_user=seccomp,ibpb,
prctl(PR_GET_SPECULATION_CTRL) would return PR_SPEC_PRCTL |
PR_SPEC_ENABLE instead of PR_SPEC_DISABLE since both IBPB and STIPB are
always on. It also allowed prctl(PR_SET_SPECULATION_CTRL) to set the IB
speculation mode, even though the flag is ignored.
Similarly, for CPUs without SMT, prctl(PR_GET_SPECULATION_CTRL) should
also return PR_SPEC_DISABLE since IBPB is always on and STIBP is not
available.
[ bp: Massage commit message. ]
Fixes: 21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.")
Fixes: 1978b3a53a74 ("x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP")
Signed-off-by: Anand K Mistry <amistry(a)google.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Link: https://lkml.kernel.org/r/20201110123349.1.Id0cbf996d2151f4c143c90f9028651a…
---
arch/x86/kernel/cpu/bugs.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 581fb72..d41b70f 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -739,11 +739,13 @@ spectre_v2_user_select_mitigation(enum spectre_v2_mitigation_cmd v2_cmd)
if (boot_cpu_has(X86_FEATURE_IBPB)) {
setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
+ spectre_v2_user_ibpb = mode;
switch (cmd) {
case SPECTRE_V2_USER_CMD_FORCE:
case SPECTRE_V2_USER_CMD_PRCTL_IBPB:
case SPECTRE_V2_USER_CMD_SECCOMP_IBPB:
static_branch_enable(&switch_mm_always_ibpb);
+ spectre_v2_user_ibpb = SPECTRE_V2_USER_STRICT;
break;
case SPECTRE_V2_USER_CMD_PRCTL:
case SPECTRE_V2_USER_CMD_AUTO:
@@ -757,8 +759,6 @@ spectre_v2_user_select_mitigation(enum spectre_v2_mitigation_cmd v2_cmd)
pr_info("mitigation: Enabling %s Indirect Branch Prediction Barrier\n",
static_key_enabled(&switch_mm_always_ibpb) ?
"always-on" : "conditional");
-
- spectre_v2_user_ibpb = mode;
}
/*
Two earlier bug fixes have created a security problem in the hfi1
driver. One fix aimed to solve an issue where current->mm was not valid
when closing the hfi1 cdev. It attempted to do this by saving a cached
value of the current->mm pointer at file open time. This is a problem if
another process with access to the FD calls in via write() or ioctl() to
pin pages via the hfi driver. The other fix tried to solve a use after
free by taking a reference on the mm.
To fix this correctly we use the existing cached value of the mm in the
mmu notifier. Now we can check in the insert, evict, etc. routines that
current->mm matched what the notifier was registered for. If not, then
don't allow access. The register of the mmu notifier will save the mm
pointer.
Note the check in the unregister is not needed in the event that
current->mm is empty. This means the tear down is happening due to a
SigKill or OOM Killer, something along those lines. If current->mm has a
value then it must be checked and only the task that did the register
can do the unregister.
Since in do_exit() the exit_mm() is called before exit_files(), which
would call our close routine a reference is needed on the mm. We rely on
the mmgrab done by the registration of the notifier, whereas before it
was explicit.
Also of note is we do not do any explicit work to protect the interval
tree notifier. It doesn't seem that this is going to be needed since we
aren't actually doing anything with current->mm. The interval tree
notifier stuff still has a FIXME noted from a previous commit that will
be addressed in a follow on patch.
Fixes: e0cf75deab81 ("IB/hfi1: Fix mm_struct use after free")
Fixes: 3faa3d9a308e ("IB/hfi1: Make use of mm consistent")
Cc: <stable(a)vger.kernel.org>
Suggested-by: Jann Horn <jannh(a)google.com>
Reported-by: Jason Gunthorpe <jgg(a)nvidia.com>
Reviewed-by: Ira Weiny <ira.weiny(a)intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn(a)cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro(a)cornelisnetworks.com>
---
Changes since v0:
----------------
Removed the checking of the pid and limitation that
whatever task opens the dev is the only one that can do write() or
ioctl(). While this limitation is OK it doesn't appear to be strictly
necessary.
Rebased on top of 5.10-rc1. Testing has been done on 5.9 due to a bug in
5.10 that is being worked (separate issue).
Changes since v1:
----------------
Remove explicit mmget/put to rely on the notifier register's mmgrab
instead.
Fixed missing check in rb_unregister to only check current->mm if its
actually valid.
Moved mm_from_tid_node to exp_rcv header and use it
Changes since v2:
----------------
Change Reported-by to Suggested-by for Jann
Commit msg updates
Remove private mm pointer and use notifier's
Changes since v3:
-----------------
Added Ira's RB and Cc stable list
Updated commit message
Added comment to mmu_rb_unregister
Renamed confusing variable in mmu_rb_register
---
drivers/infiniband/hw/hfi1/file_ops.c | 4 --
drivers/infiniband/hw/hfi1/hfi.h | 2 -
drivers/infiniband/hw/hfi1/mmu_rb.c | 76 ++++++++++++++++-------------
drivers/infiniband/hw/hfi1/mmu_rb.h | 16 ++++++
drivers/infiniband/hw/hfi1/user_exp_rcv.c | 12 +++--
drivers/infiniband/hw/hfi1/user_exp_rcv.h | 6 ++
drivers/infiniband/hw/hfi1/user_sdma.c | 13 +++--
drivers/infiniband/hw/hfi1/user_sdma.h | 7 ++-
8 files changed, 87 insertions(+), 49 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c
index 8ca51e4..329ee4f 100644
--- a/drivers/infiniband/hw/hfi1/file_ops.c
+++ b/drivers/infiniband/hw/hfi1/file_ops.c
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2015-2020 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -206,8 +207,6 @@ static int hfi1_file_open(struct inode *inode, struct file *fp)
spin_lock_init(&fd->tid_lock);
spin_lock_init(&fd->invalid_lock);
fd->rec_cpu_num = -1; /* no cpu affinity by default */
- fd->mm = current->mm;
- mmgrab(fd->mm);
fd->dd = dd;
fp->private_data = fd;
return 0;
@@ -711,7 +710,6 @@ static int hfi1_file_close(struct inode *inode, struct file *fp)
deallocate_ctxt(uctxt);
done:
- mmdrop(fdata->mm);
if (atomic_dec_and_test(&dd->user_refcount))
complete(&dd->user_comp);
diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h
index b4c6bff..e09e824 100644
--- a/drivers/infiniband/hw/hfi1/hfi.h
+++ b/drivers/infiniband/hw/hfi1/hfi.h
@@ -1,6 +1,7 @@
#ifndef _HFI1_KERNEL_H
#define _HFI1_KERNEL_H
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2015-2020 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -1451,7 +1452,6 @@ struct hfi1_filedata {
u32 invalid_tid_idx;
/* protect invalid_tids array and invalid_tid_idx */
spinlock_t invalid_lock;
- struct mm_struct *mm;
};
extern struct xarray hfi1_dev_table;
diff --git a/drivers/infiniband/hw/hfi1/mmu_rb.c b/drivers/infiniband/hw/hfi1/mmu_rb.c
index 24ca17b..06a422c 100644
--- a/drivers/infiniband/hw/hfi1/mmu_rb.c
+++ b/drivers/infiniband/hw/hfi1/mmu_rb.c
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2016 - 2017 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -48,23 +49,11 @@
#include <linux/rculist.h>
#include <linux/mmu_notifier.h>
#include <linux/interval_tree_generic.h>
+#include <linux/sched/mm.h>
#include "mmu_rb.h"
#include "trace.h"
-struct mmu_rb_handler {
- struct mmu_notifier mn;
- struct rb_root_cached root;
- void *ops_arg;
- spinlock_t lock; /* protect the RB tree */
- struct mmu_rb_ops *ops;
- struct mm_struct *mm;
- struct list_head lru_list;
- struct work_struct del_work;
- struct list_head del_list;
- struct workqueue_struct *wq;
-};
-
static unsigned long mmu_node_start(struct mmu_rb_node *);
static unsigned long mmu_node_last(struct mmu_rb_node *);
static int mmu_notifier_range_start(struct mmu_notifier *,
@@ -92,37 +81,36 @@ static unsigned long mmu_node_last(struct mmu_rb_node *node)
return PAGE_ALIGN(node->addr + node->len) - 1;
}
-int hfi1_mmu_rb_register(void *ops_arg, struct mm_struct *mm,
+int hfi1_mmu_rb_register(void *ops_arg,
struct mmu_rb_ops *ops,
struct workqueue_struct *wq,
struct mmu_rb_handler **handler)
{
- struct mmu_rb_handler *handlr;
+ struct mmu_rb_handler *h;
int ret;
- handlr = kmalloc(sizeof(*handlr), GFP_KERNEL);
- if (!handlr)
+ h = kmalloc(sizeof(*h), GFP_KERNEL);
+ if (!h)
return -ENOMEM;
- handlr->root = RB_ROOT_CACHED;
- handlr->ops = ops;
- handlr->ops_arg = ops_arg;
- INIT_HLIST_NODE(&handlr->mn.hlist);
- spin_lock_init(&handlr->lock);
- handlr->mn.ops = &mn_opts;
- handlr->mm = mm;
- INIT_WORK(&handlr->del_work, handle_remove);
- INIT_LIST_HEAD(&handlr->del_list);
- INIT_LIST_HEAD(&handlr->lru_list);
- handlr->wq = wq;
-
- ret = mmu_notifier_register(&handlr->mn, handlr->mm);
+ h->root = RB_ROOT_CACHED;
+ h->ops = ops;
+ h->ops_arg = ops_arg;
+ INIT_HLIST_NODE(&h->mn.hlist);
+ spin_lock_init(&h->lock);
+ h->mn.ops = &mn_opts;
+ INIT_WORK(&h->del_work, handle_remove);
+ INIT_LIST_HEAD(&h->del_list);
+ INIT_LIST_HEAD(&h->lru_list);
+ h->wq = wq;
+
+ ret = mmu_notifier_register(&h->mn, current->mm);
if (ret) {
- kfree(handlr);
+ kfree(h);
return ret;
}
- *handler = handlr;
+ *handler = h;
return 0;
}
@@ -133,8 +121,16 @@ void hfi1_mmu_rb_unregister(struct mmu_rb_handler *handler)
unsigned long flags;
struct list_head del_list;
+ /*
+ * do_exit() calls exit_mm() before exit_files() which would call close
+ * and end up in here. If there is no mm, then its a kernel thread and
+ * we need to let it continue the removal.
+ */
+ if (current->mm && (handler->mn.mm != current->mm))
+ return;
+
/* Unregister first so we don't get any more notifications. */
- mmu_notifier_unregister(&handler->mn, handler->mm);
+ mmu_notifier_unregister(&handler->mn, handler->mn.mm);
/*
* Make sure the wq delete handler is finished running. It will not
@@ -166,6 +162,10 @@ int hfi1_mmu_rb_insert(struct mmu_rb_handler *handler,
int ret = 0;
trace_hfi1_mmu_rb_insert(mnode->addr, mnode->len);
+
+ if (current->mm != handler->mn.mm)
+ return -EPERM;
+
spin_lock_irqsave(&handler->lock, flags);
node = __mmu_rb_search(handler, mnode->addr, mnode->len);
if (node) {
@@ -180,6 +180,7 @@ int hfi1_mmu_rb_insert(struct mmu_rb_handler *handler,
__mmu_int_rb_remove(mnode, &handler->root);
list_del(&mnode->list); /* remove from LRU list */
}
+ mnode->handler = handler;
unlock:
spin_unlock_irqrestore(&handler->lock, flags);
return ret;
@@ -217,6 +218,9 @@ bool hfi1_mmu_rb_remove_unless_exact(struct mmu_rb_handler *handler,
unsigned long flags;
bool ret = false;
+ if (current->mm != handler->mn.mm)
+ return ret;
+
spin_lock_irqsave(&handler->lock, flags);
node = __mmu_rb_search(handler, addr, len);
if (node) {
@@ -239,6 +243,9 @@ void hfi1_mmu_rb_evict(struct mmu_rb_handler *handler, void *evict_arg)
unsigned long flags;
bool stop = false;
+ if (current->mm != handler->mn.mm)
+ return;
+
INIT_LIST_HEAD(&del_list);
spin_lock_irqsave(&handler->lock, flags);
@@ -272,6 +279,9 @@ void hfi1_mmu_rb_remove(struct mmu_rb_handler *handler,
{
unsigned long flags;
+ if (current->mm != handler->mn.mm)
+ return;
+
/* Validity of handler and node pointers has been checked by caller. */
trace_hfi1_mmu_rb_remove(node->addr, node->len);
spin_lock_irqsave(&handler->lock, flags);
diff --git a/drivers/infiniband/hw/hfi1/mmu_rb.h b/drivers/infiniband/hw/hfi1/mmu_rb.h
index f04cec1..423aacc 100644
--- a/drivers/infiniband/hw/hfi1/mmu_rb.h
+++ b/drivers/infiniband/hw/hfi1/mmu_rb.h
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2016 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -54,6 +55,7 @@ struct mmu_rb_node {
unsigned long len;
unsigned long __last;
struct rb_node node;
+ struct mmu_rb_handler *handler;
struct list_head list;
};
@@ -71,7 +73,19 @@ struct mmu_rb_ops {
void *evict_arg, bool *stop);
};
-int hfi1_mmu_rb_register(void *ops_arg, struct mm_struct *mm,
+struct mmu_rb_handler {
+ struct mmu_notifier mn;
+ struct rb_root_cached root;
+ void *ops_arg;
+ spinlock_t lock; /* protect the RB tree */
+ struct mmu_rb_ops *ops;
+ struct list_head lru_list;
+ struct work_struct del_work;
+ struct list_head del_list;
+ struct workqueue_struct *wq;
+};
+
+int hfi1_mmu_rb_register(void *ops_arg,
struct mmu_rb_ops *ops,
struct workqueue_struct *wq,
struct mmu_rb_handler **handler);
diff --git a/drivers/infiniband/hw/hfi1/user_exp_rcv.c b/drivers/infiniband/hw/hfi1/user_exp_rcv.c
index f81ca20..b94fc7f 100644
--- a/drivers/infiniband/hw/hfi1/user_exp_rcv.c
+++ b/drivers/infiniband/hw/hfi1/user_exp_rcv.c
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 Cornelis Networks, Inc.
* Copyright(c) 2015-2018 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -173,15 +174,18 @@ static void unpin_rcv_pages(struct hfi1_filedata *fd,
{
struct page **pages;
struct hfi1_devdata *dd = fd->uctxt->dd;
+ struct mm_struct *mm;
if (mapped) {
pci_unmap_single(dd->pcidev, node->dma_addr,
node->npages * PAGE_SIZE, PCI_DMA_FROMDEVICE);
pages = &node->pages[idx];
+ mm = mm_from_tid_node(node);
} else {
pages = &tidbuf->pages[idx];
+ mm = current->mm;
}
- hfi1_release_user_pages(fd->mm, pages, npages, mapped);
+ hfi1_release_user_pages(mm, pages, npages, mapped);
fd->tid_n_pinned -= npages;
}
@@ -216,12 +220,12 @@ static int pin_rcv_pages(struct hfi1_filedata *fd, struct tid_user_buf *tidbuf)
* pages, accept the amount pinned so far and program only that.
* User space knows how to deal with partially programmed buffers.
*/
- if (!hfi1_can_pin_pages(dd, fd->mm, fd->tid_n_pinned, npages)) {
+ if (!hfi1_can_pin_pages(dd, current->mm, fd->tid_n_pinned, npages)) {
kfree(pages);
return -ENOMEM;
}
- pinned = hfi1_acquire_user_pages(fd->mm, vaddr, npages, true, pages);
+ pinned = hfi1_acquire_user_pages(current->mm, vaddr, npages, true, pages);
if (pinned <= 0) {
kfree(pages);
return pinned;
@@ -756,7 +760,7 @@ static int set_rcvarray_entry(struct hfi1_filedata *fd,
if (fd->use_mn) {
ret = mmu_interval_notifier_insert(
- &node->notifier, fd->mm,
+ &node->notifier, current->mm,
tbuf->vaddr + (pageidx * PAGE_SIZE), npages * PAGE_SIZE,
&tid_mn_ops);
if (ret)
diff --git a/drivers/infiniband/hw/hfi1/user_exp_rcv.h b/drivers/infiniband/hw/hfi1/user_exp_rcv.h
index 332abb4..d45c7b6 100644
--- a/drivers/infiniband/hw/hfi1/user_exp_rcv.h
+++ b/drivers/infiniband/hw/hfi1/user_exp_rcv.h
@@ -1,6 +1,7 @@
#ifndef _HFI1_USER_EXP_RCV_H
#define _HFI1_USER_EXP_RCV_H
/*
+ * Copyright(c) 2020 - Cornelis Networks, Inc.
* Copyright(c) 2015 - 2017 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -95,4 +96,9 @@ int hfi1_user_exp_rcv_clear(struct hfi1_filedata *fd,
int hfi1_user_exp_rcv_invalid(struct hfi1_filedata *fd,
struct hfi1_tid_info *tinfo);
+static inline struct mm_struct *mm_from_tid_node(struct tid_rb_node *node)
+{
+ return node->notifier.mm;
+}
+
#endif /* _HFI1_USER_EXP_RCV_H */
diff --git a/drivers/infiniband/hw/hfi1/user_sdma.c b/drivers/infiniband/hw/hfi1/user_sdma.c
index a92346e..4a4956f9 100644
--- a/drivers/infiniband/hw/hfi1/user_sdma.c
+++ b/drivers/infiniband/hw/hfi1/user_sdma.c
@@ -1,4 +1,5 @@
/*
+ * Copyright(c) 2020 - Cornelis Networks, Inc.
* Copyright(c) 2015 - 2018 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -188,7 +189,6 @@ int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt,
atomic_set(&pq->n_reqs, 0);
init_waitqueue_head(&pq->wait);
atomic_set(&pq->n_locked, 0);
- pq->mm = fd->mm;
iowait_init(&pq->busy, 0, NULL, NULL, defer_packet_queue,
activate_packet_queue, NULL, NULL);
@@ -230,7 +230,7 @@ int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt,
cq->nentries = hfi1_sdma_comp_ring_size;
- ret = hfi1_mmu_rb_register(pq, pq->mm, &sdma_rb_ops, dd->pport->hfi1_wq,
+ ret = hfi1_mmu_rb_register(pq, &sdma_rb_ops, dd->pport->hfi1_wq,
&pq->handler);
if (ret) {
dd_dev_err(dd, "Failed to register with MMU %d", ret);
@@ -980,13 +980,13 @@ static int pin_sdma_pages(struct user_sdma_request *req,
npages -= node->npages;
retry:
- if (!hfi1_can_pin_pages(pq->dd, pq->mm,
+ if (!hfi1_can_pin_pages(pq->dd, current->mm,
atomic_read(&pq->n_locked), npages)) {
cleared = sdma_cache_evict(pq, npages);
if (cleared >= npages)
goto retry;
}
- pinned = hfi1_acquire_user_pages(pq->mm,
+ pinned = hfi1_acquire_user_pages(current->mm,
((unsigned long)iovec->iov.iov_base +
(node->npages * PAGE_SIZE)), npages, 0,
pages + node->npages);
@@ -995,7 +995,7 @@ static int pin_sdma_pages(struct user_sdma_request *req,
return pinned;
}
if (pinned != npages) {
- unpin_vector_pages(pq->mm, pages, node->npages, pinned);
+ unpin_vector_pages(current->mm, pages, node->npages, pinned);
return -EFAULT;
}
kfree(node->pages);
@@ -1008,7 +1008,8 @@ static int pin_sdma_pages(struct user_sdma_request *req,
static void unpin_sdma_pages(struct sdma_mmu_node *node)
{
if (node->npages) {
- unpin_vector_pages(node->pq->mm, node->pages, 0, node->npages);
+ unpin_vector_pages(mm_from_sdma_node(node), node->pages, 0,
+ node->npages);
atomic_sub(node->npages, &node->pq->n_locked);
}
}
diff --git a/drivers/infiniband/hw/hfi1/user_sdma.h b/drivers/infiniband/hw/hfi1/user_sdma.h
index 9972e0e..1e8c02f 100644
--- a/drivers/infiniband/hw/hfi1/user_sdma.h
+++ b/drivers/infiniband/hw/hfi1/user_sdma.h
@@ -1,6 +1,7 @@
#ifndef _HFI1_USER_SDMA_H
#define _HFI1_USER_SDMA_H
/*
+ * Copyright(c) 2020 - Cornelis Networks, Inc.
* Copyright(c) 2015 - 2018 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
@@ -133,7 +134,6 @@ struct hfi1_user_sdma_pkt_q {
unsigned long unpinned;
struct mmu_rb_handler *handler;
atomic_t n_locked;
- struct mm_struct *mm;
};
struct hfi1_user_sdma_comp_q {
@@ -250,4 +250,9 @@ int hfi1_user_sdma_process_request(struct hfi1_filedata *fd,
struct iovec *iovec, unsigned long dim,
unsigned long *count);
+static inline struct mm_struct *mm_from_sdma_node(struct sdma_mmu_node *node)
+{
+ return node->rb.handler->mn.mm;
+}
+
#endif /* _HFI1_USER_SDMA_H */
From: Frank Yang <puilp0502(a)gmail.com>
[ Upstream commit 652f3d00de523a17b0cebe7b90debccf13aa8c31 ]
The Varmilo VA104M Keyboard (04b4:07b1, reported as Varmilo Z104M)
exposes media control hotkeys as a USB HID consumer control device, but
these keys do not work in the current (5.8-rc1) kernel due to the
incorrect HID report descriptor. Fix the problem by modifying the
internal HID report descriptor.
More specifically, the keyboard report descriptor specifies the
logical boundary as 572~10754 (0x023c ~ 0x2a02) while the usage
boundary is specified as 0~10754 (0x00 ~ 0x2a02). This results in an
incorrect interpretation of input reports, causing inputs to be ignored.
By setting the Logical Minimum to zero, we align the logical boundary
with the Usage ID boundary.
Some notes:
* There seem to be multiple variants of the VA104M keyboard. This
patch specifically targets 04b4:07b1 variant.
* The device works out-of-the-box on Windows platform with the generic
consumer control device driver (hidserv.inf). This suggests that
Windows either ignores the Logical Minimum/Logical Maximum or
interprets the Usage ID assignment differently from the linux
implementation; Maybe there are other devices out there that only
works on Windows due to this problem?
Signed-off-by: Frank Yang <puilp0502(a)gmail.com>
Signed-off-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/hid/hid-cypress.c | 44 ++++++++++++++++++++++++++++++++++-----
drivers/hid/hid-ids.h | 2 ++
2 files changed, 41 insertions(+), 5 deletions(-)
diff --git a/drivers/hid/hid-cypress.c b/drivers/hid/hid-cypress.c
index 1689568b597d4..12c5d7c96527a 100644
--- a/drivers/hid/hid-cypress.c
+++ b/drivers/hid/hid-cypress.c
@@ -26,19 +26,17 @@
#define CP_2WHEEL_MOUSE_HACK 0x02
#define CP_2WHEEL_MOUSE_HACK_ON 0x04
+#define VA_INVAL_LOGICAL_BOUNDARY 0x08
+
/*
* Some USB barcode readers from cypress have usage min and usage max in
* the wrong order
*/
-static __u8 *cp_report_fixup(struct hid_device *hdev, __u8 *rdesc,
+static __u8 *cp_rdesc_fixup(struct hid_device *hdev, __u8 *rdesc,
unsigned int *rsize)
{
- unsigned long quirks = (unsigned long)hid_get_drvdata(hdev);
unsigned int i;
- if (!(quirks & CP_RDESC_SWAPPED_MIN_MAX))
- return rdesc;
-
if (*rsize < 4)
return rdesc;
@@ -51,6 +49,40 @@ static __u8 *cp_report_fixup(struct hid_device *hdev, __u8 *rdesc,
return rdesc;
}
+static __u8 *va_logical_boundary_fixup(struct hid_device *hdev, __u8 *rdesc,
+ unsigned int *rsize)
+{
+ /*
+ * Varmilo VA104M (with VID Cypress and device ID 07B1) incorrectly
+ * reports Logical Minimum of its Consumer Control device as 572
+ * (0x02 0x3c). Fix this by setting its Logical Minimum to zero.
+ */
+ if (*rsize == 25 &&
+ rdesc[0] == 0x05 && rdesc[1] == 0x0c &&
+ rdesc[2] == 0x09 && rdesc[3] == 0x01 &&
+ rdesc[6] == 0x19 && rdesc[7] == 0x00 &&
+ rdesc[11] == 0x16 && rdesc[12] == 0x3c && rdesc[13] == 0x02) {
+ hid_info(hdev,
+ "fixing up varmilo VA104M consumer control report descriptor\n");
+ rdesc[12] = 0x00;
+ rdesc[13] = 0x00;
+ }
+ return rdesc;
+}
+
+static __u8 *cp_report_fixup(struct hid_device *hdev, __u8 *rdesc,
+ unsigned int *rsize)
+{
+ unsigned long quirks = (unsigned long)hid_get_drvdata(hdev);
+
+ if (quirks & CP_RDESC_SWAPPED_MIN_MAX)
+ rdesc = cp_rdesc_fixup(hdev, rdesc, rsize);
+ if (quirks & VA_INVAL_LOGICAL_BOUNDARY)
+ rdesc = va_logical_boundary_fixup(hdev, rdesc, rsize);
+
+ return rdesc;
+}
+
static int cp_input_mapped(struct hid_device *hdev, struct hid_input *hi,
struct hid_field *field, struct hid_usage *usage,
unsigned long **bit, int *max)
@@ -131,6 +163,8 @@ static const struct hid_device_id cp_devices[] = {
.driver_data = CP_RDESC_SWAPPED_MIN_MAX },
{ HID_USB_DEVICE(USB_VENDOR_ID_CYPRESS, USB_DEVICE_ID_CYPRESS_MOUSE),
.driver_data = CP_2WHEEL_MOUSE_HACK },
+ { HID_USB_DEVICE(USB_VENDOR_ID_CYPRESS, USB_DEVICE_ID_CYPRESS_VARMILO_VA104M_07B1),
+ .driver_data = VA_INVAL_LOGICAL_BOUNDARY },
{ }
};
MODULE_DEVICE_TABLE(hid, cp_devices);
diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h
index 33d2b5948d7fc..773452c6edfab 100644
--- a/drivers/hid/hid-ids.h
+++ b/drivers/hid/hid-ids.h
@@ -279,6 +279,8 @@
#define USB_DEVICE_ID_CYPRESS_BARCODE_4 0xed81
#define USB_DEVICE_ID_CYPRESS_TRUETOUCH 0xc001
+#define USB_DEVICE_ID_CYPRESS_VARMILO_VA104M_07B1 0X07b1
+
#define USB_VENDOR_ID_DATA_MODUL 0x7374
#define USB_VENDOR_ID_DATA_MODUL_EASYMAXTOUCH 0x1201
--
2.27.0
From: Frank Yang <puilp0502(a)gmail.com>
[ Upstream commit 652f3d00de523a17b0cebe7b90debccf13aa8c31 ]
The Varmilo VA104M Keyboard (04b4:07b1, reported as Varmilo Z104M)
exposes media control hotkeys as a USB HID consumer control device, but
these keys do not work in the current (5.8-rc1) kernel due to the
incorrect HID report descriptor. Fix the problem by modifying the
internal HID report descriptor.
More specifically, the keyboard report descriptor specifies the
logical boundary as 572~10754 (0x023c ~ 0x2a02) while the usage
boundary is specified as 0~10754 (0x00 ~ 0x2a02). This results in an
incorrect interpretation of input reports, causing inputs to be ignored.
By setting the Logical Minimum to zero, we align the logical boundary
with the Usage ID boundary.
Some notes:
* There seem to be multiple variants of the VA104M keyboard. This
patch specifically targets 04b4:07b1 variant.
* The device works out-of-the-box on Windows platform with the generic
consumer control device driver (hidserv.inf). This suggests that
Windows either ignores the Logical Minimum/Logical Maximum or
interprets the Usage ID assignment differently from the linux
implementation; Maybe there are other devices out there that only
works on Windows due to this problem?
Signed-off-by: Frank Yang <puilp0502(a)gmail.com>
Signed-off-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/hid/hid-cypress.c | 44 ++++++++++++++++++++++++++++++++++-----
drivers/hid/hid-ids.h | 2 ++
2 files changed, 41 insertions(+), 5 deletions(-)
diff --git a/drivers/hid/hid-cypress.c b/drivers/hid/hid-cypress.c
index 1689568b597d4..12c5d7c96527a 100644
--- a/drivers/hid/hid-cypress.c
+++ b/drivers/hid/hid-cypress.c
@@ -26,19 +26,17 @@
#define CP_2WHEEL_MOUSE_HACK 0x02
#define CP_2WHEEL_MOUSE_HACK_ON 0x04
+#define VA_INVAL_LOGICAL_BOUNDARY 0x08
+
/*
* Some USB barcode readers from cypress have usage min and usage max in
* the wrong order
*/
-static __u8 *cp_report_fixup(struct hid_device *hdev, __u8 *rdesc,
+static __u8 *cp_rdesc_fixup(struct hid_device *hdev, __u8 *rdesc,
unsigned int *rsize)
{
- unsigned long quirks = (unsigned long)hid_get_drvdata(hdev);
unsigned int i;
- if (!(quirks & CP_RDESC_SWAPPED_MIN_MAX))
- return rdesc;
-
if (*rsize < 4)
return rdesc;
@@ -51,6 +49,40 @@ static __u8 *cp_report_fixup(struct hid_device *hdev, __u8 *rdesc,
return rdesc;
}
+static __u8 *va_logical_boundary_fixup(struct hid_device *hdev, __u8 *rdesc,
+ unsigned int *rsize)
+{
+ /*
+ * Varmilo VA104M (with VID Cypress and device ID 07B1) incorrectly
+ * reports Logical Minimum of its Consumer Control device as 572
+ * (0x02 0x3c). Fix this by setting its Logical Minimum to zero.
+ */
+ if (*rsize == 25 &&
+ rdesc[0] == 0x05 && rdesc[1] == 0x0c &&
+ rdesc[2] == 0x09 && rdesc[3] == 0x01 &&
+ rdesc[6] == 0x19 && rdesc[7] == 0x00 &&
+ rdesc[11] == 0x16 && rdesc[12] == 0x3c && rdesc[13] == 0x02) {
+ hid_info(hdev,
+ "fixing up varmilo VA104M consumer control report descriptor\n");
+ rdesc[12] = 0x00;
+ rdesc[13] = 0x00;
+ }
+ return rdesc;
+}
+
+static __u8 *cp_report_fixup(struct hid_device *hdev, __u8 *rdesc,
+ unsigned int *rsize)
+{
+ unsigned long quirks = (unsigned long)hid_get_drvdata(hdev);
+
+ if (quirks & CP_RDESC_SWAPPED_MIN_MAX)
+ rdesc = cp_rdesc_fixup(hdev, rdesc, rsize);
+ if (quirks & VA_INVAL_LOGICAL_BOUNDARY)
+ rdesc = va_logical_boundary_fixup(hdev, rdesc, rsize);
+
+ return rdesc;
+}
+
static int cp_input_mapped(struct hid_device *hdev, struct hid_input *hi,
struct hid_field *field, struct hid_usage *usage,
unsigned long **bit, int *max)
@@ -131,6 +163,8 @@ static const struct hid_device_id cp_devices[] = {
.driver_data = CP_RDESC_SWAPPED_MIN_MAX },
{ HID_USB_DEVICE(USB_VENDOR_ID_CYPRESS, USB_DEVICE_ID_CYPRESS_MOUSE),
.driver_data = CP_2WHEEL_MOUSE_HACK },
+ { HID_USB_DEVICE(USB_VENDOR_ID_CYPRESS, USB_DEVICE_ID_CYPRESS_VARMILO_VA104M_07B1),
+ .driver_data = VA_INVAL_LOGICAL_BOUNDARY },
{ }
};
MODULE_DEVICE_TABLE(hid, cp_devices);
diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h
index 4630b58634d87..c4a53fc648e95 100644
--- a/drivers/hid/hid-ids.h
+++ b/drivers/hid/hid-ids.h
@@ -307,6 +307,8 @@
#define USB_DEVICE_ID_CYPRESS_BARCODE_4 0xed81
#define USB_DEVICE_ID_CYPRESS_TRUETOUCH 0xc001
+#define USB_DEVICE_ID_CYPRESS_VARMILO_VA104M_07B1 0X07b1
+
#define USB_VENDOR_ID_DATA_MODUL 0x7374
#define USB_VENDOR_ID_DATA_MODUL_EASYMAXTOUCH 0x1201
--
2.27.0
From: Frank Yang <puilp0502(a)gmail.com>
[ Upstream commit 652f3d00de523a17b0cebe7b90debccf13aa8c31 ]
The Varmilo VA104M Keyboard (04b4:07b1, reported as Varmilo Z104M)
exposes media control hotkeys as a USB HID consumer control device, but
these keys do not work in the current (5.8-rc1) kernel due to the
incorrect HID report descriptor. Fix the problem by modifying the
internal HID report descriptor.
More specifically, the keyboard report descriptor specifies the
logical boundary as 572~10754 (0x023c ~ 0x2a02) while the usage
boundary is specified as 0~10754 (0x00 ~ 0x2a02). This results in an
incorrect interpretation of input reports, causing inputs to be ignored.
By setting the Logical Minimum to zero, we align the logical boundary
with the Usage ID boundary.
Some notes:
* There seem to be multiple variants of the VA104M keyboard. This
patch specifically targets 04b4:07b1 variant.
* The device works out-of-the-box on Windows platform with the generic
consumer control device driver (hidserv.inf). This suggests that
Windows either ignores the Logical Minimum/Logical Maximum or
interprets the Usage ID assignment differently from the linux
implementation; Maybe there are other devices out there that only
works on Windows due to this problem?
Signed-off-by: Frank Yang <puilp0502(a)gmail.com>
Signed-off-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/hid/hid-cypress.c | 44 ++++++++++++++++++++++++++++++++++-----
drivers/hid/hid-ids.h | 2 ++
2 files changed, 41 insertions(+), 5 deletions(-)
diff --git a/drivers/hid/hid-cypress.c b/drivers/hid/hid-cypress.c
index 1689568b597d4..12c5d7c96527a 100644
--- a/drivers/hid/hid-cypress.c
+++ b/drivers/hid/hid-cypress.c
@@ -26,19 +26,17 @@
#define CP_2WHEEL_MOUSE_HACK 0x02
#define CP_2WHEEL_MOUSE_HACK_ON 0x04
+#define VA_INVAL_LOGICAL_BOUNDARY 0x08
+
/*
* Some USB barcode readers from cypress have usage min and usage max in
* the wrong order
*/
-static __u8 *cp_report_fixup(struct hid_device *hdev, __u8 *rdesc,
+static __u8 *cp_rdesc_fixup(struct hid_device *hdev, __u8 *rdesc,
unsigned int *rsize)
{
- unsigned long quirks = (unsigned long)hid_get_drvdata(hdev);
unsigned int i;
- if (!(quirks & CP_RDESC_SWAPPED_MIN_MAX))
- return rdesc;
-
if (*rsize < 4)
return rdesc;
@@ -51,6 +49,40 @@ static __u8 *cp_report_fixup(struct hid_device *hdev, __u8 *rdesc,
return rdesc;
}
+static __u8 *va_logical_boundary_fixup(struct hid_device *hdev, __u8 *rdesc,
+ unsigned int *rsize)
+{
+ /*
+ * Varmilo VA104M (with VID Cypress and device ID 07B1) incorrectly
+ * reports Logical Minimum of its Consumer Control device as 572
+ * (0x02 0x3c). Fix this by setting its Logical Minimum to zero.
+ */
+ if (*rsize == 25 &&
+ rdesc[0] == 0x05 && rdesc[1] == 0x0c &&
+ rdesc[2] == 0x09 && rdesc[3] == 0x01 &&
+ rdesc[6] == 0x19 && rdesc[7] == 0x00 &&
+ rdesc[11] == 0x16 && rdesc[12] == 0x3c && rdesc[13] == 0x02) {
+ hid_info(hdev,
+ "fixing up varmilo VA104M consumer control report descriptor\n");
+ rdesc[12] = 0x00;
+ rdesc[13] = 0x00;
+ }
+ return rdesc;
+}
+
+static __u8 *cp_report_fixup(struct hid_device *hdev, __u8 *rdesc,
+ unsigned int *rsize)
+{
+ unsigned long quirks = (unsigned long)hid_get_drvdata(hdev);
+
+ if (quirks & CP_RDESC_SWAPPED_MIN_MAX)
+ rdesc = cp_rdesc_fixup(hdev, rdesc, rsize);
+ if (quirks & VA_INVAL_LOGICAL_BOUNDARY)
+ rdesc = va_logical_boundary_fixup(hdev, rdesc, rsize);
+
+ return rdesc;
+}
+
static int cp_input_mapped(struct hid_device *hdev, struct hid_input *hi,
struct hid_field *field, struct hid_usage *usage,
unsigned long **bit, int *max)
@@ -131,6 +163,8 @@ static const struct hid_device_id cp_devices[] = {
.driver_data = CP_RDESC_SWAPPED_MIN_MAX },
{ HID_USB_DEVICE(USB_VENDOR_ID_CYPRESS, USB_DEVICE_ID_CYPRESS_MOUSE),
.driver_data = CP_2WHEEL_MOUSE_HACK },
+ { HID_USB_DEVICE(USB_VENDOR_ID_CYPRESS, USB_DEVICE_ID_CYPRESS_VARMILO_VA104M_07B1),
+ .driver_data = VA_INVAL_LOGICAL_BOUNDARY },
{ }
};
MODULE_DEVICE_TABLE(hid, cp_devices);
diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h
index a1e5e0529545b..222204c2da22a 100644
--- a/drivers/hid/hid-ids.h
+++ b/drivers/hid/hid-ids.h
@@ -322,6 +322,8 @@
#define USB_DEVICE_ID_CYPRESS_BARCODE_4 0xed81
#define USB_DEVICE_ID_CYPRESS_TRUETOUCH 0xc001
+#define USB_DEVICE_ID_CYPRESS_VARMILO_VA104M_07B1 0X07b1
+
#define USB_VENDOR_ID_DATA_MODUL 0x7374
#define USB_VENDOR_ID_DATA_MODUL_EASYMAXTOUCH 0x1201
--
2.27.0
From: Frank Yang <puilp0502(a)gmail.com>
[ Upstream commit 652f3d00de523a17b0cebe7b90debccf13aa8c31 ]
The Varmilo VA104M Keyboard (04b4:07b1, reported as Varmilo Z104M)
exposes media control hotkeys as a USB HID consumer control device, but
these keys do not work in the current (5.8-rc1) kernel due to the
incorrect HID report descriptor. Fix the problem by modifying the
internal HID report descriptor.
More specifically, the keyboard report descriptor specifies the
logical boundary as 572~10754 (0x023c ~ 0x2a02) while the usage
boundary is specified as 0~10754 (0x00 ~ 0x2a02). This results in an
incorrect interpretation of input reports, causing inputs to be ignored.
By setting the Logical Minimum to zero, we align the logical boundary
with the Usage ID boundary.
Some notes:
* There seem to be multiple variants of the VA104M keyboard. This
patch specifically targets 04b4:07b1 variant.
* The device works out-of-the-box on Windows platform with the generic
consumer control device driver (hidserv.inf). This suggests that
Windows either ignores the Logical Minimum/Logical Maximum or
interprets the Usage ID assignment differently from the linux
implementation; Maybe there are other devices out there that only
works on Windows due to this problem?
Signed-off-by: Frank Yang <puilp0502(a)gmail.com>
Signed-off-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/hid/hid-cypress.c | 44 ++++++++++++++++++++++++++++++++++-----
drivers/hid/hid-ids.h | 2 ++
2 files changed, 41 insertions(+), 5 deletions(-)
diff --git a/drivers/hid/hid-cypress.c b/drivers/hid/hid-cypress.c
index 1689568b597d4..12c5d7c96527a 100644
--- a/drivers/hid/hid-cypress.c
+++ b/drivers/hid/hid-cypress.c
@@ -26,19 +26,17 @@
#define CP_2WHEEL_MOUSE_HACK 0x02
#define CP_2WHEEL_MOUSE_HACK_ON 0x04
+#define VA_INVAL_LOGICAL_BOUNDARY 0x08
+
/*
* Some USB barcode readers from cypress have usage min and usage max in
* the wrong order
*/
-static __u8 *cp_report_fixup(struct hid_device *hdev, __u8 *rdesc,
+static __u8 *cp_rdesc_fixup(struct hid_device *hdev, __u8 *rdesc,
unsigned int *rsize)
{
- unsigned long quirks = (unsigned long)hid_get_drvdata(hdev);
unsigned int i;
- if (!(quirks & CP_RDESC_SWAPPED_MIN_MAX))
- return rdesc;
-
if (*rsize < 4)
return rdesc;
@@ -51,6 +49,40 @@ static __u8 *cp_report_fixup(struct hid_device *hdev, __u8 *rdesc,
return rdesc;
}
+static __u8 *va_logical_boundary_fixup(struct hid_device *hdev, __u8 *rdesc,
+ unsigned int *rsize)
+{
+ /*
+ * Varmilo VA104M (with VID Cypress and device ID 07B1) incorrectly
+ * reports Logical Minimum of its Consumer Control device as 572
+ * (0x02 0x3c). Fix this by setting its Logical Minimum to zero.
+ */
+ if (*rsize == 25 &&
+ rdesc[0] == 0x05 && rdesc[1] == 0x0c &&
+ rdesc[2] == 0x09 && rdesc[3] == 0x01 &&
+ rdesc[6] == 0x19 && rdesc[7] == 0x00 &&
+ rdesc[11] == 0x16 && rdesc[12] == 0x3c && rdesc[13] == 0x02) {
+ hid_info(hdev,
+ "fixing up varmilo VA104M consumer control report descriptor\n");
+ rdesc[12] = 0x00;
+ rdesc[13] = 0x00;
+ }
+ return rdesc;
+}
+
+static __u8 *cp_report_fixup(struct hid_device *hdev, __u8 *rdesc,
+ unsigned int *rsize)
+{
+ unsigned long quirks = (unsigned long)hid_get_drvdata(hdev);
+
+ if (quirks & CP_RDESC_SWAPPED_MIN_MAX)
+ rdesc = cp_rdesc_fixup(hdev, rdesc, rsize);
+ if (quirks & VA_INVAL_LOGICAL_BOUNDARY)
+ rdesc = va_logical_boundary_fixup(hdev, rdesc, rsize);
+
+ return rdesc;
+}
+
static int cp_input_mapped(struct hid_device *hdev, struct hid_input *hi,
struct hid_field *field, struct hid_usage *usage,
unsigned long **bit, int *max)
@@ -131,6 +163,8 @@ static const struct hid_device_id cp_devices[] = {
.driver_data = CP_RDESC_SWAPPED_MIN_MAX },
{ HID_USB_DEVICE(USB_VENDOR_ID_CYPRESS, USB_DEVICE_ID_CYPRESS_MOUSE),
.driver_data = CP_2WHEEL_MOUSE_HACK },
+ { HID_USB_DEVICE(USB_VENDOR_ID_CYPRESS, USB_DEVICE_ID_CYPRESS_VARMILO_VA104M_07B1),
+ .driver_data = VA_INVAL_LOGICAL_BOUNDARY },
{ }
};
MODULE_DEVICE_TABLE(hid, cp_devices);
diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h
index e18d796d985f8..e9155f7c8c51c 100644
--- a/drivers/hid/hid-ids.h
+++ b/drivers/hid/hid-ids.h
@@ -330,6 +330,8 @@
#define USB_DEVICE_ID_CYPRESS_BARCODE_4 0xed81
#define USB_DEVICE_ID_CYPRESS_TRUETOUCH 0xc001
+#define USB_DEVICE_ID_CYPRESS_VARMILO_VA104M_07B1 0X07b1
+
#define USB_VENDOR_ID_DATA_MODUL 0x7374
#define USB_VENDOR_ID_DATA_MODUL_EASYMAXTOUCH 0x1201
--
2.27.0
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: sunxi-cir: ensure IR is handled when it is continuous
Author: Sean Young <sean(a)mess.org>
Date: Mon Nov 9 23:16:52 2020 +0100
If a user holds a button down on a remote, then no ir idle interrupt will
be generated until the user releases the button, depending on how quickly
the remote repeats. No IR is processed until that point, which means that
holding down a button may not do anything.
This also resolves an issue on a Cubieboard 1 where the IR receiver is
picking up ambient infrared as IR and spews out endless
"rc rc0: IR event FIFO is full!" messages unless you choose to live in
the dark.
Cc: stable(a)vger.kernel.org
Tested-by: Hans Verkuil <hverkuil(a)xs4all.nl>
Acked-by: Maxime Ripard <mripard(a)kernel.org>
Reported-by: Hans Verkuil <hverkuil(a)xs4all.nl>
Signed-off-by: Sean Young <sean(a)mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
drivers/media/rc/sunxi-cir.c | 2 ++
1 file changed, 2 insertions(+)
---
diff --git a/drivers/media/rc/sunxi-cir.c b/drivers/media/rc/sunxi-cir.c
index ddee6ee37bab..4afc5895bee7 100644
--- a/drivers/media/rc/sunxi-cir.c
+++ b/drivers/media/rc/sunxi-cir.c
@@ -137,6 +137,8 @@ static irqreturn_t sunxi_ir_irq(int irqno, void *dev_id)
} else if (status & REG_RXSTA_RPE) {
ir_raw_event_set_idle(ir->rc, true);
ir_raw_event_handle(ir->rc);
+ } else {
+ ir_raw_event_handle(ir->rc);
}
spin_unlock(&ir->ir_lock);
On Wed, Nov 25, 2020 at 12:39:02PM +0000, Barnabás Pőcze wrote:
>2020. november 25., szerda 11:57 keltezéssel, Coiby Xu írta:
>
>> On Mon, Nov 23, 2020 at 04:32:40PM +0000, Barnabás Pőcze wrote:
>> >> [...]
>> >> >> >> +static int get_gpio_pin_state(struct irq_desc *irq_desc)
>> >> >> >> +{
>> >> >> >> + struct gpio_chip *gc = irq_data_get_irq_chip_data(&irq_desc->irq_data);
>> >> >> >> +
>> >> >> >> + return gc->get(gc, irq_desc->irq_data.hwirq);
>> >> >> >> +}
>> >> >> [...]
>> >> >> >> + ssize_t status = get_gpio_pin_state(irq_desc);
>> >> >> >
>> >> >> >`get_gpio_pin_state()` returns an `int`, so I am not sure why `ssize_t` is used here.
>> >> >> >
>> >> >>
>> >> >> I used `ssize_t` because I found gpiolib-sysfs.c uses `ssize_t`
>> >> >>
>> >> >> // drivers/gpio/gpiolib-sysfs.c
>> >> >> static ssize_t value_show(struct device *dev,
>> >> >> struct device_attribute *attr, char *buf)
>> >> >> {
>> >> >> struct gpiod_data *data = dev_get_drvdata(dev);
>> >> >> struct gpio_desc *desc = data->desc;
>> >> >> ssize_t status;
>> >> >>
>> >> >> mutex_lock(&data->mutex);
>> >> >>
>> >> >> status = gpiod_get_value_cansleep(desc);
>> >> >> ...
>> >> >> return status;
>> >> >> }
>> >> >>
>> >> >> According to the book Advanced Programming in the UNIX Environment by
>> >> >> W. Richard Stevens,
>> >> >> With the 1990 POSIX.1 standard, the primitive system data type
>> >> >> ssize_t was introduced to provide the signed return value...
>> >> >>
>> >> >> So ssize_t is fairly common, for example, the read and write syscall
>> >> >> return a value of type ssize_t. But I haven't found out why ssize_t is
>> >> >> better int.
>> >> >> >
>> >> >
>> >> >Sorry if I wasn't clear, what prompted me to ask that question is the following:
>> >> >`gc->get()` returns `int`, `get_gpio_pin_state()` returns `int`, yet you still
>> >> >save the return value of `get_gpio_pin_state()` into a variable with type
>> >> >`ssize_t` for no apparent reason. In the example you cited, `ssize_t` is used
>> >> >because the show() callback of a sysfs attribute must return `ssize_t`, but here,
>> >> >`interrupt_line_active()` returns `bool`, so I don't see any advantage over a
>> >> >plain `int`. Anyways, I believe either one is fine, I just found it odd.
>> >> >
>> >> I don't understand why "the show() callback of a sysfs attribute
>> >> must return `ssize_t`" instead of int. Do you think the rationale
>> >> behind it is the same for this case? If yes, using "ssize_t" for
>> >> status could be justified.
>> >> [...]
>> >
>> >Because it was decided that way, `ssize_t` is a better choice for that purpose
>> >than plain `int`. You can see it in include/linux/device.h, that both the
>> >show() and store() methods must return `ssize_t`.
>> >
>>
>> Could you explain why `ssize_t` is a better choice? AFAIU, ssize_t
>> is used because we can return negative value to indicate an error.
>
>ssize_t: "Signed integer type used for a count of bytes or an error indication."[1]
>
>And POSIX mandates that the return type of read() and write() be `ssize_t`,
>so it makes sense to keep a similar interface in the kernel since show() and store()
>are called as a direct result of the user using the read() and write() system
>calls, respectively.
>
>
>> If
>> we use ssize_t here, it's a reminder that reading a GPIO pin's status
>> could fail. And ssize_t reminds us it's a operation similar to read
>> or write. So ssize_t is better than int here. And maybe it's the same
>> reason why "it was decided that way".
>> [...]
>
>I believe it's more appropriate to use ssize_t when it's about a "count of elements",
>but the GPIO pin state is a single boolean value (or an error indication), which
>is returned as an `int`. Since it's returned as an `int` - I'm arguing that -
>there is no reason to use `ssize_t` here. Anyways, both `ssize_t` and `int` work fine
>in this case.
>
So value_show in gpiolib-sysfs.c is a kind of being forced to use
ssize_t. I'll use int instead to avoid confusion in v4. Thank you for
the explanation!
>
>[1]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#t…
>
>
>Regards,
>Barnabás Pőcze
--
Best regards,
Coiby
The patch below does not apply to the 5.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 98d278ca00bd8f62c8bc98bd9e65372d16eb6956 Mon Sep 17 00:00:00 2001
From: Gabriel David <ultracoolguy(a)tutanota.com>
Date: Fri, 2 Oct 2020 18:27:00 -0400
Subject: [PATCH] leds: lm3697: Fix out-of-bound access
If both LED banks aren't used in device tree, an out-of-bounds
condition in lm3697_init occurs because of the for loop assuming that
all the banks are used. Fix it by adding a variable that contains the
number of used banks.
Signed-off-by: Gabriel David <ultracoolguy(a)tutanota.com>
[removed extra rename, minor tweaks]
Signed-off-by: Pavel Machek <pavel(a)ucw.cz>
Cc: stable(a)kernel.org
diff --git a/drivers/leds/leds-lm3697.c b/drivers/leds/leds-lm3697.c
index 64c0794801e6..7d216cdb91a8 100644
--- a/drivers/leds/leds-lm3697.c
+++ b/drivers/leds/leds-lm3697.c
@@ -78,6 +78,7 @@ struct lm3697 {
struct mutex lock;
int bank_cfg;
+ int num_banks;
struct lm3697_led leds[];
};
@@ -180,7 +181,7 @@ static int lm3697_init(struct lm3697 *priv)
if (ret)
dev_err(dev, "Cannot write OUTPUT config\n");
- for (i = 0; i < LM3697_MAX_CONTROL_BANKS; i++) {
+ for (i = 0; i < priv->num_banks; i++) {
led = &priv->leds[i];
ret = ti_lmu_common_set_ramp(&led->lmu_data);
if (ret)
@@ -301,8 +302,8 @@ static int lm3697_probe(struct i2c_client *client,
int ret;
count = device_get_child_node_count(dev);
- if (!count) {
- dev_err(dev, "LEDs are not defined in device tree!");
+ if (!count || count > LM3697_MAX_CONTROL_BANKS) {
+ dev_err(dev, "Strange device tree!");
return -ENODEV;
}
@@ -315,6 +316,7 @@ static int lm3697_probe(struct i2c_client *client,
led->client = client;
led->dev = dev;
+ led->num_banks = count;
led->regmap = devm_regmap_init_i2c(client, &lm3697_regmap_config);
if (IS_ERR(led->regmap)) {
ret = PTR_ERR(led->regmap);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From bfe8cc1db02ab243c62780f17fc57f65bde0afe1 Mon Sep 17 00:00:00 2001
From: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Date: Sat, 21 Nov 2020 22:17:15 -0800
Subject: [PATCH] mm/userfaultfd: do not access vma->vm_mm after calling
handle_userfault()
Alexander reported a syzkaller / KASAN finding on s390, see below for
complete output.
In do_huge_pmd_anonymous_page(), the pre-allocated pagetable will be
freed in some cases. In the case of userfaultfd_missing(), this will
happen after calling handle_userfault(), which might have released the
mmap_lock. Therefore, the following pte_free(vma->vm_mm, pgtable) will
access an unstable vma->vm_mm, which could have been freed or re-used
already.
For all architectures other than s390 this will go w/o any negative
impact, because pte_free() simply frees the page and ignores the
passed-in mm. The implementation for SPARC32 would also access
mm->page_table_lock for pte_free(), but there is no THP support in
SPARC32, so the buggy code path will not be used there.
For s390, the mm->context.pgtable_list is being used to maintain the 2K
pagetable fragments, and operating on an already freed or even re-used
mm could result in various more or less subtle bugs due to list /
pagetable corruption.
Fix this by calling pte_free() before handle_userfault(), similar to how
it is already done in __do_huge_pmd_anonymous_page() for the WRITE /
non-huge_zero_page case.
Commit 6b251fc96cf2c ("userfaultfd: call handle_userfault() for
userfaultfd_missing() faults") actually introduced both, the
do_huge_pmd_anonymous_page() and also __do_huge_pmd_anonymous_page()
changes wrt to calling handle_userfault(), but only in the latter case
it put the pte_free() before calling handle_userfault().
BUG: KASAN: use-after-free in do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744
Read of size 8 at addr 00000000962d6988 by task syz-executor.0/9334
CPU: 1 PID: 9334 Comm: syz-executor.0 Not tainted 5.10.0-rc1-syzkaller-07083-g4c9720875573 #0
Hardware name: IBM 3906 M04 701 (KVM/Linux)
Call Trace:
do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744
create_huge_pmd mm/memory.c:4256 [inline]
__handle_mm_fault+0xe6e/0x1068 mm/memory.c:4480
handle_mm_fault+0x288/0x748 mm/memory.c:4607
do_exception+0x394/0xae0 arch/s390/mm/fault.c:479
do_dat_exception+0x34/0x80 arch/s390/mm/fault.c:567
pgm_check_handler+0x1da/0x22c arch/s390/kernel/entry.S:706
copy_from_user_mvcos arch/s390/lib/uaccess.c:111 [inline]
raw_copy_from_user+0x3a/0x88 arch/s390/lib/uaccess.c:174
_copy_from_user+0x48/0xa8 lib/usercopy.c:16
copy_from_user include/linux/uaccess.h:192 [inline]
__do_sys_sigaltstack kernel/signal.c:4064 [inline]
__s390x_sys_sigaltstack+0xc8/0x240 kernel/signal.c:4060
system_call+0xe0/0x28c arch/s390/kernel/entry.S:415
Allocated by task 9334:
slab_alloc_node mm/slub.c:2891 [inline]
slab_alloc mm/slub.c:2899 [inline]
kmem_cache_alloc+0x118/0x348 mm/slub.c:2904
vm_area_dup+0x9c/0x2b8 kernel/fork.c:356
__split_vma+0xba/0x560 mm/mmap.c:2742
split_vma+0xca/0x108 mm/mmap.c:2800
mlock_fixup+0x4ae/0x600 mm/mlock.c:550
apply_vma_lock_flags+0x2c6/0x398 mm/mlock.c:619
do_mlock+0x1aa/0x718 mm/mlock.c:711
__do_sys_mlock2 mm/mlock.c:738 [inline]
__s390x_sys_mlock2+0x86/0xa8 mm/mlock.c:728
system_call+0xe0/0x28c arch/s390/kernel/entry.S:415
Freed by task 9333:
slab_free mm/slub.c:3142 [inline]
kmem_cache_free+0x7c/0x4b8 mm/slub.c:3158
__vma_adjust+0x7b2/0x2508 mm/mmap.c:960
vma_merge+0x87e/0xce0 mm/mmap.c:1209
userfaultfd_release+0x412/0x6b8 fs/userfaultfd.c:868
__fput+0x22c/0x7a8 fs/file_table.c:281
task_work_run+0x200/0x320 kernel/task_work.c:151
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
do_notify_resume+0x100/0x148 arch/s390/kernel/signal.c:538
system_call+0xe6/0x28c arch/s390/kernel/entry.S:416
The buggy address belongs to the object at 00000000962d6948 which belongs to the cache vm_area_struct of size 200
The buggy address is located 64 bytes inside of 200-byte region [00000000962d6948, 00000000962d6a10)
The buggy address belongs to the page: page:00000000313a09fe refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x962d6 flags: 0x3ffff00000000200(slab)
raw: 3ffff00000000200 000040000257e080 0000000c0000000c 000000008020ba00
raw: 0000000000000000 000f001e00000000 ffffffff00000001 0000000096959501
page dumped because: kasan: bad access detected
page->mem_cgroup:0000000096959501
Memory state around the buggy address:
00000000962d6880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000000962d6900: 00 fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb
>00000000962d6980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
00000000962d6a00: fb fb fc fc fc fc fc fc fc fc 00 00 00 00 00 00
00000000962d6a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Fixes: 6b251fc96cf2c ("userfaultfd: call handle_userfault() for userfaultfd_missing() faults")
Reported-by: Alexander Egorenkov <egorenar(a)linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org> [4.3+]
Link: https://lkml.kernel.org/r/20201110190329.11920-1-gerald.schaefer@linux.ibm.…
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9474dbc150ed..ec2bb93f7431 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -710,7 +710,6 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
transparent_hugepage_use_zero_page()) {
pgtable_t pgtable;
struct page *zero_page;
- bool set;
vm_fault_t ret;
pgtable = pte_alloc_one(vma->vm_mm);
if (unlikely(!pgtable))
@@ -723,25 +722,25 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
}
vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
ret = 0;
- set = false;
if (pmd_none(*vmf->pmd)) {
ret = check_stable_address_space(vma->vm_mm);
if (ret) {
spin_unlock(vmf->ptl);
+ pte_free(vma->vm_mm, pgtable);
} else if (userfaultfd_missing(vma)) {
spin_unlock(vmf->ptl);
+ pte_free(vma->vm_mm, pgtable);
ret = handle_userfault(vmf, VM_UFFD_MISSING);
VM_BUG_ON(ret & VM_FAULT_FALLBACK);
} else {
set_huge_zero_page(pgtable, vma->vm_mm, vma,
haddr, vmf->pmd, zero_page);
spin_unlock(vmf->ptl);
- set = true;
}
- } else
+ } else {
spin_unlock(vmf->ptl);
- if (!set)
pte_free(vma->vm_mm, pgtable);
+ }
return ret;
}
gfp = alloc_hugepage_direct_gfpmask(vma);
Hei hei,
please apply e9a6882f267a8105461066e3ea6b4b6b9be1b807 ("perf event: Check
ref_reloc_sym before using it") to the v4.9 stable tree and maybe to later LTS
trees (v4.14, v4.19) as well.
That change is in mainline since v5.4-rc1 and it's a fix for a possible null
pointer dereference in the tool 'perf', so it is a fix for a user space tool
coming with the kernel tree actually.
I am directly affeted by this, calling 'perf record' on an at91 based device
(armv5te, sam9g20 soc) running PREEMPT RT kernel 4.9.220-rt143 results in a
segfault here. I debugged this with gdb and tracked it down to the exact
pointer being NULL, which is checked now in that above mentioned changeset.
I could cleanly apply that changeset to my local tree and the segfault is not
triggered anymore, I can use perf on that platform now.
I only tested this on the above mentioned v4.9 based kernel version.
Thanks & Greets
Alex
On Mon, Nov 23, 2020 at 04:32:40PM +0000, Barnabás Pőcze wrote:
>> [...]
>> >> >> +static int get_gpio_pin_state(struct irq_desc *irq_desc)
>> >> >> +{
>> >> >> + struct gpio_chip *gc = irq_data_get_irq_chip_data(&irq_desc->irq_data);
>> >> >> +
>> >> >> + return gc->get(gc, irq_desc->irq_data.hwirq);
>> >> >> +}
>> >> [...]
>> >> >> + ssize_t status = get_gpio_pin_state(irq_desc);
>> >> >
>> >> >`get_gpio_pin_state()` returns an `int`, so I am not sure why `ssize_t` is used here.
>> >> >
>> >>
>> >> I used `ssize_t` because I found gpiolib-sysfs.c uses `ssize_t`
>> >>
>> >> // drivers/gpio/gpiolib-sysfs.c
>> >> static ssize_t value_show(struct device *dev,
>> >> struct device_attribute *attr, char *buf)
>> >> {
>> >> struct gpiod_data *data = dev_get_drvdata(dev);
>> >> struct gpio_desc *desc = data->desc;
>> >> ssize_t status;
>> >>
>> >> mutex_lock(&data->mutex);
>> >>
>> >> status = gpiod_get_value_cansleep(desc);
>> >> ...
>> >> return status;
>> >> }
>> >>
>> >> According to the book Advanced Programming in the UNIX Environment by
>> >> W. Richard Stevens,
>> >> With the 1990 POSIX.1 standard, the primitive system data type
>> >> ssize_t was introduced to provide the signed return value...
>> >>
>> >> So ssize_t is fairly common, for example, the read and write syscall
>> >> return a value of type ssize_t. But I haven't found out why ssize_t is
>> >> better int.
>> >> >
>> >
>> >Sorry if I wasn't clear, what prompted me to ask that question is the following:
>> >`gc->get()` returns `int`, `get_gpio_pin_state()` returns `int`, yet you still
>> >save the return value of `get_gpio_pin_state()` into a variable with type
>> >`ssize_t` for no apparent reason. In the example you cited, `ssize_t` is used
>> >because the show() callback of a sysfs attribute must return `ssize_t`, but here,
>> >`interrupt_line_active()` returns `bool`, so I don't see any advantage over a
>> >plain `int`. Anyways, I believe either one is fine, I just found it odd.
>> >
>> I don't understand why "the show() callback of a sysfs attribute
>> must return `ssize_t`" instead of int. Do you think the rationale
>> behind it is the same for this case? If yes, using "ssize_t" for
>> status could be justified.
>> [...]
>
>Because it was decided that way, `ssize_t` is a better choice for that purpose
>than plain `int`. You can see it in include/linux/device.h, that both the
>show() and store() methods must return `ssize_t`.
>
Could you explain why `ssize_t` is a better choice? AFAIU, ssize_t
is used because we can return negative value to indicate an error. If
we use ssize_t here, it's a reminder that reading a GPIO pin's status
could fail. And ssize_t reminds us it's a operation similar to read
or write. So ssize_t is better than int here. And maybe it's the same
reason why "it was decided that way".
>What I'm arguing here, is that there is no reason to use `ssize_t` in this case.
>Because `get_gpio_pin_state()` returns `int`. So when you do
>```
>ssize_t status = get_gpio_pin_state(...);
>```
>then the return value of `get_gpio_pin_state()` (which is an `int`), will be
>converted to an `ssize_t`, and saved into `status`. I'm arguing that that is
>unnecessary and a plain `int` would work perfectly well in this case. Anyways,
>both work fine, I just found the unnecessary use of `ssize_t` here odd.
>
>
>Regards,
>Barnabás Pőcze
--
Best regards,
Coiby
In the error path of rpcif_manual_xfer() the value of ret is overwritten
by value returned by reset_control_reset() function and thus returning
incorrect value to the caller.
This patch makes sure the correct value is returned to the caller of
rpcif_manual_xfer() by dropping the overwrite of ret in error path.
Also now we ignore the value returned by reset_control_reset() in the
error path and instead print a error message when it fails.
Fixes: ca7d8b980b67f ("memory: add Renesas RPC-IF driver")
Reported-by: Pavel Machek <pavel(a)denx.de>
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj(a)bp.renesas.com>
Cc: stable(a)vger.kernel.org
---
drivers/memory/renesas-rpc-if.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/memory/renesas-rpc-if.c b/drivers/memory/renesas-rpc-if.c
index f2a33a1af836..69f2e2b4cd50 100644
--- a/drivers/memory/renesas-rpc-if.c
+++ b/drivers/memory/renesas-rpc-if.c
@@ -508,7 +508,8 @@ int rpcif_manual_xfer(struct rpcif *rpc)
return ret;
err_out:
- ret = reset_control_reset(rpc->rstc);
+ if (reset_control_reset(rpc->rstc))
+ dev_err(rpc->dev, "Failed to reset HW\n");
rpcif_hw_init(rpc, rpc->bus_size == 2);
goto exit;
}
--
2.17.1