I'm announcing the release of the 4.4.273 kernel.
All users of the 4.4 kernel series must upgrade.
The updated 4.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.4.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/mips/lib/mips-atomic.c | 12 +-
arch/powerpc/boot/dts/fsl/p1010si-post.dtsi | 8 +
arch/powerpc/boot/dts/fsl/p2041si-post.dtsi | 16 +++
drivers/i2c/busses/i2c-mpc.c | 95 +++++++++++++++++++++-
drivers/isdn/hardware/mISDN/netjet.c | 1
drivers/net/appletalk/cops.c | 4
drivers/net/bonding/bond_main.c | 2
drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c | 4
drivers/net/ethernet/cadence/macb.c | 3
drivers/net/ethernet/qlogic/qla3xxx.c | 2
drivers/net/phy/mdio_bus.c | 3
drivers/scsi/hosts.c | 2
drivers/scsi/qla2xxx/qla_target.c | 2
drivers/scsi/vmw_pvscsi.c | 8 +
drivers/usb/dwc3/ep0.c | 3
drivers/usb/gadget/function/f_eem.c | 4
drivers/usb/gadget/function/f_ncm.c | 2
drivers/usb/serial/ftdi_sio.c | 1
drivers/usb/serial/ftdi_sio_ids.h | 1
drivers/usb/serial/omninet.c | 2
drivers/usb/serial/quatech2.c | 6 -
fs/btrfs/file.c | 4
fs/nfs/client.c | 2
fs/nfs/nfs4proc.c | 8 +
fs/proc/base.c | 11 ++
include/linux/kvm_host.h | 11 ++
kernel/cgroup.c | 4
kernel/events/core.c | 2
kernel/trace/ftrace.c | 8 +
net/netlink/af_netlink.c | 6 -
net/nfc/rawsock.c | 2
sound/soc/codecs/sti-sas.c | 1
tools/perf/util/session.c | 1
34 files changed, 210 insertions(+), 33 deletions(-)
Alexander Kuznetsov (1):
cgroup1: don't allow '\n' in renaming
Alexandre GRIVEAUX (1):
USB: serial: omninet: add device id for Zyxel Omni 56K Plus
Chris Packham (4):
powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P2041 i2c controllers
powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P1010 i2c controllers
i2c: mpc: Make use of i2c_recover_bus()
i2c: mpc: implement erratum A-004447 workaround
Dai Ngo (1):
NFSv4: nfs4_proc_set_acl needs to restore NFS_CAP_UIDGID_NOMAP on error.
Dan Carpenter (2):
net: mdiobus: get rid of a BUG_ON()
NFS: Fix a potential NULL dereference in nfs_get_client()
Dmitry Bogdanov (1):
scsi: target: qla2xxx: Wait for stop_phase1 at WWN removal
George McCollister (1):
USB: serial: ftdi_sio: add NovaTech OrionMX product ID
Greg Kroah-Hartman (1):
Linux 4.4.273
Jeimon (1):
net/nfc/rawsock.c: fix a permission check bug
Jiapeng Chong (1):
bnx2x: Fix missing error code in bnx2x_iov_init_one()
Johan Hovold (1):
USB: serial: quatech2: fix control-request directions
Johannes Berg (2):
bonding: init notify_work earlier to avoid uninitialized use
netlink: disable IRQs for netlink_lock_table()
Kees Cook (1):
proc: Track /proc/$pid/attr/ opener mm_struct
Leo Yan (1):
perf session: Correct buffer copying when peeking events
Linus Torvalds (1):
proc: only require mm_struct for writing
Linyu Yuan (1):
usb: gadget: eem: fix wrong eem header operation
Maciej Żenczykowski (1):
USB: f_ncm: ncm_bitrate (speed) is unsigned
Marco Elver (1):
perf: Fix data race between pin_count increment/decrement
Marian-Cristian Rotariu (1):
usb: dwc3: ep0: fix NULL pointer exception
Matt Wang (1):
scsi: vmw_pvscsi: Set correct residual data length
Ming Lei (1):
scsi: core: Only put parent device if host state differs from SHOST_CREATED
Paolo Bonzini (2):
kvm: avoid speculation-based attacks from out-of-range memslot accesses
kvm: fix previous commit for 32-bit builds
Ritesh Harjani (1):
btrfs: return value from btrfs_mark_extent_written() in case of error
Saubhik Mukherjee (1):
net: appletalk: cops: Fix data race in cops_probe1
Steven Rostedt (VMware) (1):
ftrace: Do not blindly read the ip address in ftrace_bug()
Tiezhu Yang (1):
MIPS: Fix kernel hang under FUNCTION_GRAPH_TRACER and PREEMPT_TRACER
Zheyu Ma (2):
isdn: mISDN: netjet: Fix crash in nj_probe:
net/qla3xxx: fix schedule while atomic in ql_sem_spinlock
Zong Li (1):
net: macb: ensure the device is available before accessing GEMGXL control registers
Zou Wei (1):
ASoC: sti-sas: add missing MODULE_DEVICE_TABLE
On Wed, Jun 16, 2021 at 02:47:15PM +0800, Liu Shixin wrote:
> Hi, Suren,
>
> I read the previous discussion about fixing CVE-2020-29374 in stable 4.14 and 4.19 at
> https://lore.kernel.org/linux-mm/20210401181741.168763-1-surenb@google.com/
>
> The result of that discussion was that you backported 17839856fd58 to the 4.14 and
> 4.19 kernels.
>
> But the bug about dax and strace in that discussion has not been solved, right? I can't
> find a conclusion on this issue; am I missing something? Does this problem still exist in
> the stable 4.14 and 4.19 kernels?
As the code is all there for you, can you just test them and see for
yourself?
thanks,
greg k-h
This is a note to let you know that I've just added the patch titled
usb: chipidea: imx: Fix Battery Charger 1.2 CDP detection
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From c6d580d96f140596d69220f60ce0cfbea4ee5c0f Mon Sep 17 00:00:00 2001
From: Breno Lima <breno.lima(a)nxp.com>
Date: Mon, 14 Jun 2021 13:50:13 -0400
Subject: usb: chipidea: imx: Fix Battery Charger 1.2 CDP detection
i.MX8MM cannot detect certain CDP USB hubs. The usbmisc_imx.c driver does not
follow the CDP timing requirements defined by the USB BC 1.2 specification,
section 3.2.4 (Detection Timing, CDP).
During primary detection the i.MX device should turn on VDP_SRC and
IDM_SINK for a minimum of 40ms (TVDPSRC_ON). After a time of TVDPSRC_ON,
the i.MX is allowed to check the status of the D- line. The current
implementation waits only between 1ms and 2ms, so certain BC 1.2
compliant USB hubs cannot be detected. Increase the delay to 40ms to allow
enough time for primary detection.
During secondary detection the i.MX is required to disable VDP_SRC and
IDM_SNK, and enable VDM_SRC and IDP_SINK for at least 40ms (TVDMSRC_ON).
The current implementation does not disable VDP_SRC and IDM_SNK; introduce
the disable sequence in imx7d_charger_secondary_detection().
VDM_SRC and IDP_SINK should be enabled for at least 40ms (TVDMSRC_ON).
Increase the delay to allow enough time for detection.
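As a rough illustration of the timing described above (a simplified sketch
only, not the driver code; the helper names below are hypothetical), the
BC 1.2 CDP detection flow amounts to:
/* Illustrative sketch of BC 1.2 CDP detection timing; helpers are hypothetical. */
static bool bc12_primary_detection(void)
{
	enable_vdp_src_and_idm_sink();   /* hypothetical: drive VDP_SRC, sink IDM_SINK */
	msleep(40);                      /* TVDPSRC_ON: at least 40 ms per BC 1.2      */
	return dm_above_vdat_ref();      /* D- high => charging port (CDP or DCP)      */
}
static bool bc12_secondary_detection(void)
{
	disable_vdp_src_and_idm_sink();  /* must be turned off before secondary detection */
	msleep(20);                      /* TVDMSRC_DIS                                    */
	enable_vdm_src_and_idp_sink();
	msleep(40);                      /* TVDMSRC_ON: at least 40 ms per BC 1.2          */
	return dp_above_vdat_ref();      /* D+ high => DCP, D+ low => CDP                  */
}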
Cc: <stable(a)vger.kernel.org>
Fixes: 746f316b753a ("usb: chipidea: introduce imx7d USB charger detection")
Signed-off-by: Breno Lima <breno.lima(a)nxp.com>
Signed-off-by: Jun Li <jun.li(a)nxp.com>
Link: https://lore.kernel.org/r/20210614175013.495808-1-breno.lima@nxp.com
Signed-off-by: Peter Chen <peter.chen(a)kernel.org>
---
drivers/usb/chipidea/usbmisc_imx.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/chipidea/usbmisc_imx.c b/drivers/usb/chipidea/usbmisc_imx.c
index 4545b23bda3f..bac0f5458cab 100644
--- a/drivers/usb/chipidea/usbmisc_imx.c
+++ b/drivers/usb/chipidea/usbmisc_imx.c
@@ -686,6 +686,16 @@ static int imx7d_charger_secondary_detection(struct imx_usbmisc_data *data)
int val;
unsigned long flags;
+ /* Clear VDATSRCENB0 to disable VDP_SRC and IDM_SNK required by BC 1.2 spec */
+ spin_lock_irqsave(&usbmisc->lock, flags);
+ val = readl(usbmisc->base + MX7D_USB_OTG_PHY_CFG2);
+ val &= ~MX7D_USB_OTG_PHY_CFG2_CHRG_VDATSRCENB0;
+ writel(val, usbmisc->base + MX7D_USB_OTG_PHY_CFG2);
+ spin_unlock_irqrestore(&usbmisc->lock, flags);
+
+ /* TVDMSRC_DIS */
+ msleep(20);
+
/* VDM_SRC is connected to D- and IDP_SINK is connected to D+ */
spin_lock_irqsave(&usbmisc->lock, flags);
val = readl(usbmisc->base + MX7D_USB_OTG_PHY_CFG2);
@@ -695,7 +705,8 @@ static int imx7d_charger_secondary_detection(struct imx_usbmisc_data *data)
usbmisc->base + MX7D_USB_OTG_PHY_CFG2);
spin_unlock_irqrestore(&usbmisc->lock, flags);
- usleep_range(1000, 2000);
+ /* TVDMSRC_ON */
+ msleep(40);
/*
* Per BC 1.2, check voltage of D+:
@@ -798,7 +809,8 @@ static int imx7d_charger_primary_detection(struct imx_usbmisc_data *data)
usbmisc->base + MX7D_USB_OTG_PHY_CFG2);
spin_unlock_irqrestore(&usbmisc->lock, flags);
- usleep_range(1000, 2000);
+ /* TVDPSRC_ON */
+ msleep(40);
/* Check if D- is less than VDAT_REF to determine an SDP per BC 1.2 */
val = readl(usbmisc->base + MX7D_USB_OTG_PHY_STATUS);
--
2.32.0
This is a note to let you know that I've just added the patch titled
serial_cs: Add Option International GSM-Ready 56K/ISDN modem
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From d495dd743d5ecd47288156e25c4d9163294a0992 Mon Sep 17 00:00:00 2001
From: Ondrej Zary <linux(a)zary.sk>
Date: Fri, 11 Jun 2021 22:19:40 +0200
Subject: serial_cs: Add Option International GSM-Ready 56K/ISDN modem
Add support for Option International GSM-Ready 56K/ISDN PCMCIA modem
card.
Signed-off-by: Ondrej Zary <linux(a)zary.sk>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20210611201940.23898-2-linux@zary.sk
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/8250/serial_cs.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/tty/serial/8250/serial_cs.c b/drivers/tty/serial/8250/serial_cs.c
index 2f1d33ea26e1..dc2ef05a10eb 100644
--- a/drivers/tty/serial/8250/serial_cs.c
+++ b/drivers/tty/serial/8250/serial_cs.c
@@ -786,6 +786,7 @@ static const struct pcmcia_device_id serial_ids[] = {
PCMCIA_DEVICE_PROD_ID12("Multi-Tech", "MT2834LT", 0x5f73be51, 0x4cd7c09e),
PCMCIA_DEVICE_PROD_ID12("OEM ", "C288MX ", 0xb572d360, 0xd2385b7a),
PCMCIA_DEVICE_PROD_ID12("Option International", "V34bis GSM/PSTN Data/Fax Modem", 0x9d7cd6f5, 0x5cb8bf41),
+ PCMCIA_DEVICE_PROD_ID12("Option International", "GSM-Ready 56K/ISDN", 0x9d7cd6f5, 0xb23844aa),
PCMCIA_DEVICE_PROD_ID12("PCMCIA ", "C336MX ", 0x99bcafe9, 0xaa25bcab),
PCMCIA_DEVICE_PROD_ID12("Quatech Inc", "PCMCIA Dual RS-232 Serial Port Card", 0xc4420b35, 0x92abc92f),
PCMCIA_DEVICE_PROD_ID12("Quatech Inc", "Dual RS-232 Serial Port PC Card", 0xc4420b35, 0x031a380d),
--
2.32.0
This is a note to let you know that I've just added the patch titled
serial_cs: remove wrong GLOBETROTTER.cis entry
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From 11b1d881a90fc184cc7d06e9804eb288c24a2a0d Mon Sep 17 00:00:00 2001
From: Ondrej Zary <linux(a)zary.sk>
Date: Fri, 11 Jun 2021 22:19:39 +0200
Subject: serial_cs: remove wrong GLOBETROTTER.cis entry
The GLOBETROTTER.cis entry in serial_cs matches more devices than
intended and breaks them. Remove it.
Example: # pccardctl info
PRODID_1="Option International
"
PRODID_2="GSM-Ready 56K/ISDN
"
PRODID_3="021
"
PRODID_4="A
"
MANFID=0013,0000
FUNCID=0
result:
pcmcia 0.0: Direct firmware load for cis/GLOBETROTTER.cis failed with error -2
The GLOBETROTTER.cis is nowhere to be found. There's GLOBETROTTER.cis.ihex at
https://netdev.vger.kernel.narkive.com/h4inqdxM/patch-axnet-cs-fix-phy-id-d…
It's from a completely different card:
vers_1 4.1, "Option International", "GSM/GPRS GlobeTrotter", "001", "A"
Signed-off-by: Ondrej Zary <linux(a)zary.sk>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20210611201940.23898-1-linux@zary.sk
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/8250/serial_cs.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/tty/serial/8250/serial_cs.c b/drivers/tty/serial/8250/serial_cs.c
index 3708114343b0..2f1d33ea26e1 100644
--- a/drivers/tty/serial/8250/serial_cs.c
+++ b/drivers/tty/serial/8250/serial_cs.c
@@ -813,7 +813,6 @@ static const struct pcmcia_device_id serial_ids[] = {
PCMCIA_DEVICE_CIS_PROD_ID12("ADVANTECH", "COMpad-32/85B-4", 0x96913a85, 0xcec8f102, "cis/COMpad4.cis"),
PCMCIA_DEVICE_CIS_PROD_ID123("ADVANTECH", "COMpad-32/85", "1.0", 0x96913a85, 0x8fbe92ae, 0x0877b627, "cis/COMpad2.cis"),
PCMCIA_DEVICE_CIS_PROD_ID2("RS-COM 2P", 0xad20b156, "cis/RS-COM-2P.cis"),
- PCMCIA_DEVICE_CIS_MANF_CARD(0x0013, 0x0000, "cis/GLOBETROTTER.cis"),
PCMCIA_DEVICE_PROD_ID12("ELAN DIGITAL SYSTEMS LTD, c1997.", "SERIAL CARD: SL100 1.00.", 0x19ca78af, 0xf964f42b),
PCMCIA_DEVICE_PROD_ID12("ELAN DIGITAL SYSTEMS LTD, c1997.", "SERIAL CARD: SL100", 0x19ca78af, 0x71d98e83),
PCMCIA_DEVICE_PROD_ID12("ELAN DIGITAL SYSTEMS LTD, c1997.", "SERIAL CARD: SL232 1.00.", 0x19ca78af, 0x69fb7490),
--
2.32.0
This is a note to let you know that I've just added the patch titled
serial: sh-sci: Stop dmaengine transfer in sci_stop_tx()
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From 08a84410a04f05c7c1b8e833f552416d8eb9f6fe Mon Sep 17 00:00:00 2001
From: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Date: Thu, 10 Jun 2021 20:08:06 +0900
Subject: serial: sh-sci: Stop dmaengine transfer in sci_stop_tx()
Stop the dmaengine transfer in sci_stop_tx(). Otherwise, the following
message may be output when the system enters suspend while data is being
transferred, because clearing the TIE bit in SCSCR is not able to
stop a dmaengine transfer.
sh-sci e6550000.serial: ttySC1: Unable to drain transmitter
Note that this driver already uses some #ifdef in the .c file,
so this patch also uses #ifdef to fix the issue. Otherwise,
build errors happen if CONFIG_SERIAL_SH_SCI_DMA is disabled.
Fixes: 73a19e4c0301 ("serial: sh-sci: Add DMA support.")
Cc: <stable(a)vger.kernel.org> # v4.9+
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Link: https://lore.kernel.org/r/20210610110806.277932-1-yoshihiro.shimoda.uh@rene…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/sh-sci.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c
index df4f70716ba2..aabe66c99c1a 100644
--- a/drivers/tty/serial/sh-sci.c
+++ b/drivers/tty/serial/sh-sci.c
@@ -611,6 +611,14 @@ static void sci_stop_tx(struct uart_port *port)
ctrl &= ~SCSCR_TIE;
serial_port_out(port, SCSCR, ctrl);
+
+#ifdef CONFIG_SERIAL_SH_SCI_DMA
+ if (to_sci_port(port)->chan_tx &&
+ !dma_submit_error(to_sci_port(port)->cookie_tx)) {
+ dmaengine_terminate_async(to_sci_port(port)->chan_tx);
+ to_sci_port(port)->cookie_tx = -EINVAL;
+ }
+#endif
}
static void sci_start_rx(struct uart_port *port)
--
2.32.0
When a failure occurs in rproc_add(), it returns an error but does
not clean up after itself. This change adds the missing failure path for
such cases.
Signed-off-by: Siddharth Gupta <sidgup(a)codeaurora.org>
Cc: stable(a)vger.kernel.org
---
drivers/remoteproc/remoteproc_core.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index b874280..d823f70 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -2343,8 +2343,10 @@ int rproc_add(struct rproc *rproc)
return ret;
ret = device_add(dev);
- if (ret < 0)
- return ret;
+ if (ret < 0) {
+ put_device(dev);
+ goto rproc_remove_cdev;
+ }
dev_info(dev, "%s is available\n", rproc->name);
@@ -2355,7 +2357,7 @@ int rproc_add(struct rproc *rproc)
if (rproc->auto_boot) {
ret = rproc_trigger_auto_boot(rproc);
if (ret < 0)
- return ret;
+ goto rproc_remove_dev;
}
/* expose to rproc_get_by_phandle users */
@@ -2364,6 +2366,13 @@ int rproc_add(struct rproc *rproc)
mutex_unlock(&rproc_list_mutex);
return 0;
+
+rproc_remove_dev:
+ rproc_delete_debug_dir(rproc);
+ device_del(dev);
+rproc_remove_cdev:
+ rproc_char_device_remove(rproc);
+ return ret;
}
EXPORT_SYMBOL(rproc_add);
--
Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
Commit bfb819ea20ce ("proc: Check /proc/$pid/attr/ writes against file opener")
tried to make sure that there could not be a confusion between the opener of
a /proc/$pid/attr/ file and the writer. It used struct cred to make sure
the privileges didn't change. However, there were existing cases where a more
privileged thread was passing the opened fd to a differently privileged thread
(during container setup). Instead, use mm_struct to track whether the opener
and writer are still the same process. (This is what several other proc files
already do, though for different reasons.)
Reported-by: Christian Brauner <christian.brauner(a)ubuntu.com>
Reported-by: Andrea Righi <andrea.righi(a)canonical.com>
Tested-by: Andrea Righi <andrea.righi(a)canonical.com>
Fixes: bfb819ea20ce ("proc: Check /proc/$pid/attr/ writes against file opener")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kees Cook <keescook(a)chromium.org>
---
fs/proc/base.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 58bbf334265b..7118ebe38fa6 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2674,6 +2674,11 @@ static int proc_pident_readdir(struct file *file, struct dir_context *ctx,
}
#ifdef CONFIG_SECURITY
+static int proc_pid_attr_open(struct inode *inode, struct file *file)
+{
+ return __mem_open(inode, file, PTRACE_MODE_READ_FSCREDS);
+}
+
static ssize_t proc_pid_attr_read(struct file * file, char __user * buf,
size_t count, loff_t *ppos)
{
@@ -2704,7 +2709,7 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf,
int rv;
/* A task may only write when it was the opener. */
- if (file->f_cred != current_real_cred())
+ if (file->private_data != current->mm)
return -EPERM;
rcu_read_lock();
@@ -2754,9 +2759,11 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf,
}
static const struct file_operations proc_pid_attr_operations = {
+ .open = proc_pid_attr_open,
.read = proc_pid_attr_read,
.write = proc_pid_attr_write,
.llseek = generic_file_llseek,
+ .release = mem_release,
};
#define LSM_DIR_OPS(LSM) \
--
2.25.1
This series has two fixes for FDMI.
Attributes length corrected for RHBA.
Fixed the wrong condition check in fc_ct_ms_fill_attr().
Kindly apply this series to scsi-queue at your earliest convenience.
Javed Hasan (2):
scsi: fc: Corrected RHBA attributes length
libfc: Corrected the condition check and invalid argument passed
drivers/scsi/libfc/fc_encode.h | 8 +++++---
include/scsi/fc/fc_ms.h | 4 ++--
2 files changed, 7 insertions(+), 5 deletions(-)
--
2.26.2
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()
There is a race between THP unmapping and truncation, when truncate sees
pmd_none() and skips the entry, after munmap's zap_huge_pmd() cleared it,
but before its page_remove_rmap() gets to decrement compound_mapcount:
generating false "BUG: Bad page cache" reports that the page is still
mapped when deleted. This commit fixes that, but not in the way I hoped.
The first attempt used try_to_unmap(page, TTU_SYNC|TTU_IGNORE_MLOCK)
instead of unmap_mapping_range() in truncate_cleanup_page(): it has often
been an annoyance that we usually call unmap_mapping_range() with no pages
locked, but there apply it to a single locked page. try_to_unmap() looks
more suitable for a single locked page.
However, try_to_unmap_one() contains a VM_BUG_ON_PAGE(!pvmw.pte,page): it
is used to insert THP migration entries, but not used to unmap THPs. Copy
zap_huge_pmd() and add THP handling now? Perhaps, but their TLB needs are
different, I'm too ignorant of the DAX cases, and couldn't decide how far
to go for anon+swap. Set that aside.
The second attempt took a different tack: make no change in truncate.c,
but modify zap_huge_pmd() to insert an invalidated huge pmd instead of
clearing it initially, then pmd_clear() between page_remove_rmap() and
unlocking at the end. Nice. But powerpc blows that approach out of the
water, with its serialize_against_pte_lookup(), and interesting pgtable
usage. It would need serious help to get working on powerpc (with a minor
optimization issue on s390 too). Set that aside.
Just add an "if (page_mapped(page)) synchronize_rcu();" or other such
delay, after unmapping in truncate_cleanup_page()? Perhaps, but though
that's likely to reduce or eliminate the number of incidents, it would
give less assurance of whether we had identified the problem correctly.
This successful iteration introduces "unmap_mapping_page(page)" instead of
try_to_unmap(), and goes the usual unmap_mapping_range_tree() route, with
an addition to details. Then zap_pmd_range() watches for this case, and
does spin_unlock(pmd_lock) if so - just like page_vma_mapped_walk() now
does in the PVMW_SYNC case. Not pretty, but safe.
Note that unmap_mapping_page() is doing a VM_BUG_ON(!PageLocked) to assert
its interface; but currently that's only used to make sure that
page->mapping is stable, and zap_pmd_range() doesn't care if the page is
locked or not. Along these lines, in invalidate_inode_pages2_range() move
the initial unmap_mapping_range() out from under page lock, before then
calling unmap_mapping_page() under page lock if still mapped.
Link: https://lkml.kernel.org/r/a2a4a148-cdd8-942c-4ef8-51b77f643dbe@google.com
Fixes: fc127da085c2 ("truncate: handle file thp")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Jue Wang <juew(a)google.com>
Cc: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Wang Yugui <wangyugui(a)e16-tech.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 3 +++
mm/memory.c | 41 +++++++++++++++++++++++++++++++++++++++++
mm/truncate.c | 43 +++++++++++++++++++------------------------
3 files changed, 63 insertions(+), 24 deletions(-)
--- a/include/linux/mm.h~mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page
+++ a/include/linux/mm.h
@@ -1719,6 +1719,7 @@ struct zap_details {
struct address_space *check_mapping; /* Check page->mapping if set */
pgoff_t first_index; /* Lowest page->index to unmap */
pgoff_t last_index; /* Highest page->index to unmap */
+ struct page *single_page; /* Locked page to be unmapped */
};
struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
@@ -1766,6 +1767,7 @@ extern vm_fault_t handle_mm_fault(struct
extern int fixup_user_fault(struct mm_struct *mm,
unsigned long address, unsigned int fault_flags,
bool *unlocked);
+void unmap_mapping_page(struct page *page);
void unmap_mapping_pages(struct address_space *mapping,
pgoff_t start, pgoff_t nr, bool even_cows);
void unmap_mapping_range(struct address_space *mapping,
@@ -1786,6 +1788,7 @@ static inline int fixup_user_fault(struc
BUG();
return -EFAULT;
}
+static inline void unmap_mapping_page(struct page *page) { }
static inline void unmap_mapping_pages(struct address_space *mapping,
pgoff_t start, pgoff_t nr, bool even_cows) { }
static inline void unmap_mapping_range(struct address_space *mapping,
--- a/mm/memory.c~mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page
+++ a/mm/memory.c
@@ -1361,7 +1361,18 @@ static inline unsigned long zap_pmd_rang
else if (zap_huge_pmd(tlb, vma, pmd, addr))
goto next;
/* fall through */
+ } else if (details && details->single_page &&
+ PageTransCompound(details->single_page) &&
+ next - addr == HPAGE_PMD_SIZE && pmd_none(*pmd)) {
+ spinlock_t *ptl = pmd_lock(tlb->mm, pmd);
+ /*
+ * Take and drop THP pmd lock so that we cannot return
+ * prematurely, while zap_huge_pmd() has cleared *pmd,
+ * but not yet decremented compound_mapcount().
+ */
+ spin_unlock(ptl);
}
+
/*
* Here there can be other concurrent MADV_DONTNEED or
* trans huge page faults running, and if the pmd is
@@ -3237,6 +3248,36 @@ static inline void unmap_mapping_range_t
}
/**
+ * unmap_mapping_page() - Unmap single page from processes.
+ * @page: The locked page to be unmapped.
+ *
+ * Unmap this page from any userspace process which still has it mmaped.
+ * Typically, for efficiency, the range of nearby pages has already been
+ * unmapped by unmap_mapping_pages() or unmap_mapping_range(). But once
+ * truncation or invalidation holds the lock on a page, it may find that
+ * the page has been remapped again: and then uses unmap_mapping_page()
+ * to unmap it finally.
+ */
+void unmap_mapping_page(struct page *page)
+{
+ struct address_space *mapping = page->mapping;
+ struct zap_details details = { };
+
+ VM_BUG_ON(!PageLocked(page));
+ VM_BUG_ON(PageTail(page));
+
+ details.check_mapping = mapping;
+ details.first_index = page->index;
+ details.last_index = page->index + thp_nr_pages(page) - 1;
+ details.single_page = page;
+
+ i_mmap_lock_write(mapping);
+ if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
+ unmap_mapping_range_tree(&mapping->i_mmap, &details);
+ i_mmap_unlock_write(mapping);
+}
+
+/**
* unmap_mapping_pages() - Unmap pages from processes.
* @mapping: The address space containing pages to be unmapped.
* @start: Index of first page to be unmapped.
--- a/mm/truncate.c~mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page
+++ a/mm/truncate.c
@@ -167,13 +167,10 @@ void do_invalidatepage(struct page *page
* its lock, b) when a concurrent invalidate_mapping_pages got there first and
* c) when tmpfs swizzles a page between a tmpfs inode and swapper_space.
*/
-static void
-truncate_cleanup_page(struct address_space *mapping, struct page *page)
+static void truncate_cleanup_page(struct page *page)
{
- if (page_mapped(page)) {
- unsigned int nr = thp_nr_pages(page);
- unmap_mapping_pages(mapping, page->index, nr, false);
- }
+ if (page_mapped(page))
+ unmap_mapping_page(page);
if (page_has_private(page))
do_invalidatepage(page, 0, thp_size(page));
@@ -218,7 +215,7 @@ int truncate_inode_page(struct address_s
if (page->mapping != mapping)
return -EIO;
- truncate_cleanup_page(mapping, page);
+ truncate_cleanup_page(page);
delete_from_page_cache(page);
return 0;
}
@@ -325,7 +322,7 @@ void truncate_inode_pages_range(struct a
index = indices[pagevec_count(&pvec) - 1] + 1;
truncate_exceptional_pvec_entries(mapping, &pvec, indices);
for (i = 0; i < pagevec_count(&pvec); i++)
- truncate_cleanup_page(mapping, pvec.pages[i]);
+ truncate_cleanup_page(pvec.pages[i]);
delete_from_page_cache_batch(mapping, &pvec);
for (i = 0; i < pagevec_count(&pvec); i++)
unlock_page(pvec.pages[i]);
@@ -639,6 +636,16 @@ int invalidate_inode_pages2_range(struct
continue;
}
+ if (!did_range_unmap && page_mapped(page)) {
+ /*
+ * If page is mapped, before taking its lock,
+ * zap the rest of the file in one hit.
+ */
+ unmap_mapping_pages(mapping, index,
+ (1 + end - index), false);
+ did_range_unmap = 1;
+ }
+
lock_page(page);
WARN_ON(page_to_index(page) != index);
if (page->mapping != mapping) {
@@ -646,23 +653,11 @@ int invalidate_inode_pages2_range(struct
continue;
}
wait_on_page_writeback(page);
- if (page_mapped(page)) {
- if (!did_range_unmap) {
- /*
- * Zap the rest of the file in one hit.
- */
- unmap_mapping_pages(mapping, index,
- (1 + end - index), false);
- did_range_unmap = 1;
- } else {
- /*
- * Just zap this page
- */
- unmap_mapping_pages(mapping, index,
- 1, false);
- }
- }
+
+ if (page_mapped(page))
+ unmap_mapping_page(page);
BUG_ON(page_mapped(page));
+
ret2 = do_launder_page(mapping, page);
if (ret2 == 0) {
if (!invalidate_complete_page2(mapping, page))
_
From: Jue Wang <juew(a)google.com>
Subject: mm/thp: fix page_address_in_vma() on file THP tails
Anon THP tails were already supported, but memory-failure may need to use
page_address_in_vma() on file THP tails, which its page->mapping check did
not permit: fix it.
hughd adds: no current usage is known to hit the issue, but this does fix
a subtle trap in a general helper: best fixed in stable sooner than later.
Link: https://lkml.kernel.org/r/a0d9b53-bf5d-8bab-ac5-759dc61819c1@google.com
Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
Signed-off-by: Jue Wang <juew(a)google.com>
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Wang Yugui <wangyugui(a)e16-tech.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/rmap.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/mm/rmap.c~mm-thp-fix-page_address_in_vma-on-file-thp-tails
+++ a/mm/rmap.c
@@ -716,11 +716,11 @@ unsigned long page_address_in_vma(struct
if (!vma->anon_vma || !page__anon_vma ||
vma->anon_vma->root != page__anon_vma->root)
return -EFAULT;
- } else if (page->mapping) {
- if (!vma->vm_file || vma->vm_file->f_mapping != page->mapping)
- return -EFAULT;
- } else
+ } else if (!vma->vm_file) {
+ return -EFAULT;
+ } else if (vma->vm_file->f_mapping != compound_head(page)->mapping) {
return -EFAULT;
+ }
return vma_address(page, vma);
}
_
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/thp: fix vma_address() if virtual address below file offset
Running certain tests with a DEBUG_VM kernel would crash within hours, on
the total_mapcount BUG() in split_huge_page_to_list(), while trying to
free up some memory by punching a hole in a shmem huge page: split's
try_to_unmap() was unable to find all the mappings of the page (which, on
a !DEBUG_VM kernel, would then keep the huge page pinned in memory).
When that BUG() was changed to a WARN(), it would later crash on the
VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma) in
mm/internal.h:vma_address(), used by rmap_walk_file() for try_to_unmap().
vma_address() is usually correct, but there's a wraparound case when the
vm_start address is unusually low, but vm_pgoff not so low: vma_address()
chooses max(start, vma->vm_start), but that decides on the wrong address,
because start has become almost ULONG_MAX.
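A made-up numeric example of that wraparound (illustrative values only, not
from any real report; 64-bit, PAGE_SHIFT = 12, a 2MB THP of 512 pages):
#include <stdio.h>
int main(void)
{
	/* Illustrative values only. */
	unsigned long vm_start = 0x2000;   /* unusually low mapping start        */
	unsigned long vm_pgoff = 0x200;
	unsigned long pgoff    = 0x1f0;    /* head of a THP straddling vm_start  */
	int page_shift = 12;
	unsigned long start = vm_start + ((pgoff - vm_pgoff) << page_shift);
	/* Prints start = 0xffffffffffff2000: "start" has wrapped to almost
	 * ULONG_MAX, so the old max(start, vma->vm_start) picked the wrong
	 * address and the VM_BUG_ON_VMA() fired under DEBUG_VM.  The rewritten
	 * vma_address() instead returns vma->vm_start for a compound head page
	 * that overlaps the vma. */
	printf("start = 0x%lx\n", start);
	return 0;
}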
Rewrite vma_address() to be more careful about vm_pgoff; move the
VM_BUG_ON_VMA() out of it, returning -EFAULT for errors, so that it can be
safely used from page_mapped_in_vma() and page_address_in_vma() too.
Add vma_address_end() to apply similar care to end address calculation, in
page_vma_mapped_walk() and page_mkclean_one() and try_to_unmap_one();
though it raises a question of whether callers would do better to supply
pvmw->end to page_vma_mapped_walk() - I chose not, for a smaller patch.
An irritation is that their apparent generality breaks down on KSM pages,
which cannot be located by the page->index that page_to_pgoff() uses: as
4b0ece6fa016 ("mm: migrate: fix remove_migration_pte() for ksm pages")
once discovered. I dithered over the best thing to do about that, and
have ended up with a VM_BUG_ON_PAGE(PageKsm) in both vma_address() and
vma_address_end(); though the only place in danger of using it on them was
try_to_unmap_one().
Sidenote: vma_address() and vma_address_end() now use compound_nr() on a
head page, instead of thp_size(): to make the right calculation on a
hugetlbfs page, whether or not THPs are configured. try_to_unmap() is
used on hugetlbfs pages, but perhaps the wrong calculation never mattered.
Link: https://lkml.kernel.org/r/caf1c1a3-7cfb-7f8f-1beb-ba816e932825@google.com
Fixes: a8fa41ad2f6f ("mm, rmap: check all VMAs that PTE-mapped THP can be part of")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Jue Wang <juew(a)google.com>
Cc: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Wang Yugui <wangyugui(a)e16-tech.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/internal.h | 51 ++++++++++++++++++++++++++++++-----------
mm/page_vma_mapped.c | 16 ++++--------
mm/rmap.c | 16 ++++++------
3 files changed, 52 insertions(+), 31 deletions(-)
--- a/mm/internal.h~mm-thp-fix-vma_address-if-virtual-address-below-file-offset
+++ a/mm/internal.h
@@ -384,27 +384,52 @@ static inline void mlock_migrate_page(st
extern pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma);
/*
- * At what user virtual address is page expected in @vma?
+ * At what user virtual address is page expected in vma?
+ * Returns -EFAULT if all of the page is outside the range of vma.
+ * If page is a compound head, the entire compound page is considered.
*/
static inline unsigned long
-__vma_address(struct page *page, struct vm_area_struct *vma)
+vma_address(struct page *page, struct vm_area_struct *vma)
{
- pgoff_t pgoff = page_to_pgoff(page);
- return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+ pgoff_t pgoff;
+ unsigned long address;
+
+ VM_BUG_ON_PAGE(PageKsm(page), page); /* KSM page->index unusable */
+ pgoff = page_to_pgoff(page);
+ if (pgoff >= vma->vm_pgoff) {
+ address = vma->vm_start +
+ ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+ /* Check for address beyond vma (or wrapped through 0?) */
+ if (address < vma->vm_start || address >= vma->vm_end)
+ address = -EFAULT;
+ } else if (PageHead(page) &&
+ pgoff + compound_nr(page) - 1 >= vma->vm_pgoff) {
+ /* Test above avoids possibility of wrap to 0 on 32-bit */
+ address = vma->vm_start;
+ } else {
+ address = -EFAULT;
+ }
+ return address;
}
+/*
+ * Then at what user virtual address will none of the page be found in vma?
+ * Assumes that vma_address() already returned a good starting address.
+ * If page is a compound head, the entire compound page is considered.
+ */
static inline unsigned long
-vma_address(struct page *page, struct vm_area_struct *vma)
+vma_address_end(struct page *page, struct vm_area_struct *vma)
{
- unsigned long start, end;
-
- start = __vma_address(page, vma);
- end = start + thp_size(page) - PAGE_SIZE;
-
- /* page should be within @vma mapping range */
- VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma);
+ pgoff_t pgoff;
+ unsigned long address;
- return max(start, vma->vm_start);
+ VM_BUG_ON_PAGE(PageKsm(page), page); /* KSM page->index unusable */
+ pgoff = page_to_pgoff(page) + compound_nr(page);
+ address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+ /* Check for address beyond vma (or wrapped through 0?) */
+ if (address < vma->vm_start || address > vma->vm_end)
+ address = vma->vm_end;
+ return address;
}
static inline struct file *maybe_unlock_mmap_for_io(struct vm_fault *vmf,
--- a/mm/page_vma_mapped.c~mm-thp-fix-vma_address-if-virtual-address-below-file-offset
+++ a/mm/page_vma_mapped.c
@@ -228,18 +228,18 @@ restart:
if (!map_pte(pvmw))
goto next_pte;
while (1) {
+ unsigned long end;
+
if (check_pte(pvmw))
return true;
next_pte:
/* Seek to next pte only makes sense for THP */
if (!PageTransHuge(pvmw->page) || PageHuge(pvmw->page))
return not_found(pvmw);
+ end = vma_address_end(pvmw->page, pvmw->vma);
do {
pvmw->address += PAGE_SIZE;
- if (pvmw->address >= pvmw->vma->vm_end ||
- pvmw->address >=
- __vma_address(pvmw->page, pvmw->vma) +
- thp_size(pvmw->page))
+ if (pvmw->address >= end)
return not_found(pvmw);
/* Did we cross page table boundary? */
if (pvmw->address % PMD_SIZE == 0) {
@@ -277,14 +277,10 @@ int page_mapped_in_vma(struct page *page
.vma = vma,
.flags = PVMW_SYNC,
};
- unsigned long start, end;
-
- start = __vma_address(page, vma);
- end = start + thp_size(page) - PAGE_SIZE;
- if (unlikely(end < vma->vm_start || start >= vma->vm_end))
+ pvmw.address = vma_address(page, vma);
+ if (pvmw.address == -EFAULT)
return 0;
- pvmw.address = max(start, vma->vm_start);
if (!page_vma_mapped_walk(&pvmw))
return 0;
page_vma_mapped_walk_done(&pvmw);
--- a/mm/rmap.c~mm-thp-fix-vma_address-if-virtual-address-below-file-offset
+++ a/mm/rmap.c
@@ -707,7 +707,6 @@ static bool should_defer_flush(struct mm
*/
unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
{
- unsigned long address;
if (PageAnon(page)) {
struct anon_vma *page__anon_vma = page_anon_vma(page);
/*
@@ -722,10 +721,8 @@ unsigned long page_address_in_vma(struct
return -EFAULT;
} else
return -EFAULT;
- address = __vma_address(page, vma);
- if (unlikely(address < vma->vm_start || address >= vma->vm_end))
- return -EFAULT;
- return address;
+
+ return vma_address(page, vma);
}
pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
@@ -919,7 +916,7 @@ static bool page_mkclean_one(struct page
*/
mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_PAGE,
0, vma, vma->vm_mm, address,
- min(vma->vm_end, address + page_size(page)));
+ vma_address_end(page, vma));
mmu_notifier_invalidate_range_start(&range);
while (page_vma_mapped_walk(&pvmw)) {
@@ -1435,9 +1432,10 @@ static bool try_to_unmap_one(struct page
* Note that the page can not be free in this function as call of
* try_to_unmap() must hold a reference on the page.
*/
+ range.end = PageKsm(page) ?
+ address + PAGE_SIZE : vma_address_end(page, vma);
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
- address,
- min(vma->vm_end, address + page_size(page)));
+ address, range.end);
if (PageHuge(page)) {
/*
* If sharing is possible, start and end will be adjusted
@@ -1889,6 +1887,7 @@ static void rmap_walk_anon(struct page *
struct vm_area_struct *vma = avc->vma;
unsigned long address = vma_address(page, vma);
+ VM_BUG_ON_VMA(address == -EFAULT, vma);
cond_resched();
if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg))
@@ -1943,6 +1942,7 @@ static void rmap_walk_file(struct page *
pgoff_start, pgoff_end) {
unsigned long address = vma_address(page, vma);
+ VM_BUG_ON_VMA(address == -EFAULT, vma);
cond_resched();
if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg))
_
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/thp: try_to_unmap() use TTU_SYNC for safe splitting
Stressing huge tmpfs often crashed on unmap_page()'s VM_BUG_ON_PAGE
(!unmap_success): with dump_page() showing mapcount:1, but then its raw
struct page output showing _mapcount ffffffff i.e. mapcount 0.
And even if that particular VM_BUG_ON_PAGE(!unmap_success) is removed, it
is immediately followed by a VM_BUG_ON_PAGE(compound_mapcount(head)), and
further down an IS_ENABLED(CONFIG_DEBUG_VM) total_mapcount BUG(): all
indicative of some mapcount difficulty in development here perhaps. But
the !CONFIG_DEBUG_VM path handles the failures correctly and silently.
I believe the problem is that once a racing unmap has cleared pte or pmd,
try_to_unmap_one() may skip taking the page table lock, and emerge from
try_to_unmap() before the racing task has reached decrementing mapcount.
Instead of abandoning the unsafe VM_BUG_ON_PAGE(), and the ones that
follow, use PVMW_SYNC in try_to_unmap_one() in this case: adding TTU_SYNC
to the options, and passing that from unmap_page().
When CONFIG_DEBUG_VM, or for non-debug too? Consensus is to do the same
for both: the slight overhead added should rarely matter, except perhaps
if splitting sparsely-populated multiply-mapped shmem. Once confident
that bugs are fixed, TTU_SYNC here can be removed, and the race tolerated.
Link: https://lkml.kernel.org/r/c1e95853-8bcd-d8fd-55fa-e7f2488e78f@google.com
Fixes: fec89c109f3a ("thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Jue Wang <juew(a)google.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Wang Yugui <wangyugui(a)e16-tech.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/rmap.h | 1 +
mm/huge_memory.c | 2 +-
mm/page_vma_mapped.c | 11 +++++++++++
mm/rmap.c | 17 ++++++++++++++++-
4 files changed, 29 insertions(+), 2 deletions(-)
--- a/include/linux/rmap.h~mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting
+++ a/include/linux/rmap.h
@@ -91,6 +91,7 @@ enum ttu_flags {
TTU_SPLIT_HUGE_PMD = 0x4, /* split huge PMD if any */
TTU_IGNORE_MLOCK = 0x8, /* ignore mlock */
+ TTU_SYNC = 0x10, /* avoid racy checks with PVMW_SYNC */
TTU_IGNORE_HWPOISON = 0x20, /* corrupted page is recoverable */
TTU_BATCH_FLUSH = 0x40, /* Batch TLB flushes where possible
* and caller guarantees they will
--- a/mm/huge_memory.c~mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting
+++ a/mm/huge_memory.c
@@ -2350,7 +2350,7 @@ void vma_adjust_trans_huge(struct vm_are
static void unmap_page(struct page *page)
{
- enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK |
+ enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_SYNC |
TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
bool unmap_success;
--- a/mm/page_vma_mapped.c~mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting
+++ a/mm/page_vma_mapped.c
@@ -212,6 +212,17 @@ restart:
pvmw->ptl = NULL;
}
} else if (!pmd_present(pmde)) {
+ /*
+ * If PVMW_SYNC, take and drop THP pmd lock so that we
+ * cannot return prematurely, while zap_huge_pmd() has
+ * cleared *pmd but not decremented compound_mapcount().
+ */
+ if ((pvmw->flags & PVMW_SYNC) &&
+ PageTransCompound(pvmw->page)) {
+ spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
+
+ spin_unlock(ptl);
+ }
return false;
}
if (!map_pte(pvmw))
--- a/mm/rmap.c~mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting
+++ a/mm/rmap.c
@@ -1405,6 +1405,15 @@ static bool try_to_unmap_one(struct page
struct mmu_notifier_range range;
enum ttu_flags flags = (enum ttu_flags)(long)arg;
+ /*
+ * When racing against e.g. zap_pte_range() on another cpu,
+ * in between its ptep_get_and_clear_full() and page_remove_rmap(),
+ * try_to_unmap() may return false when it is about to become true,
+ * if page table locking is skipped: use TTU_SYNC to wait for that.
+ */
+ if (flags & TTU_SYNC)
+ pvmw.flags = PVMW_SYNC;
+
/* munlock has nothing to gain from examining un-locked vmas */
if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
return true;
@@ -1777,7 +1786,13 @@ bool try_to_unmap(struct page *page, enu
else
rmap_walk(page, &rwc);
- return !page_mapcount(page) ? true : false;
+ /*
+ * When racing against e.g. zap_pte_range() on another cpu,
+ * in between its ptep_get_and_clear_full() and page_remove_rmap(),
+ * try_to_unmap() may return false when it is about to become true,
+ * if page table locking is skipped: use TTU_SYNC to wait for that.
+ */
+ return !page_mapcount(page);
}
/**
_
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/thp: make is_huge_zero_pmd() safe and quicker
Most callers of is_huge_zero_pmd() supply a pmd already verified present;
but a few (notably zap_huge_pmd()) do not - it might be a pmd migration
entry, in which the pfn is encoded differently from a present pmd: which
might pass the is_huge_zero_pmd() test (though not on x86, since L1TF
forced us to protect against that); or perhaps even crash in pmd_page()
applied to a swap-like entry.
Make it safe by adding pmd_present() check into is_huge_zero_pmd() itself;
and make it quicker by saving huge_zero_pfn, so that is_huge_zero_pmd()
will not need to do that pmd_page() lookup each time.
__split_huge_pmd_locked() checked pmd_trans_huge() before: that worked,
but is unnecessary now that is_huge_zero_pmd() checks present.
Link: https://lkml.kernel.org/r/21ea9ca-a1f5-8b90-5e88-95fb1c49bbfa@google.com
Fixes: e71769ae5260 ("mm: enable thp migration for shmem thp")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Jue Wang <juew(a)google.com>
Cc: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Wang Yugui <wangyugui(a)e16-tech.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/huge_mm.h | 8 +++++++-
mm/huge_memory.c | 5 ++++-
2 files changed, 11 insertions(+), 2 deletions(-)
--- a/include/linux/huge_mm.h~mm-thp-make-is_huge_zero_pmd-safe-and-quicker
+++ a/include/linux/huge_mm.h
@@ -286,6 +286,7 @@ struct page *follow_devmap_pud(struct vm
vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t orig_pmd);
extern struct page *huge_zero_page;
+extern unsigned long huge_zero_pfn;
static inline bool is_huge_zero_page(struct page *page)
{
@@ -294,7 +295,7 @@ static inline bool is_huge_zero_page(str
static inline bool is_huge_zero_pmd(pmd_t pmd)
{
- return is_huge_zero_page(pmd_page(pmd));
+ return READ_ONCE(huge_zero_pfn) == pmd_pfn(pmd) && pmd_present(pmd);
}
static inline bool is_huge_zero_pud(pud_t pud)
@@ -439,6 +440,11 @@ static inline bool is_huge_zero_page(str
{
return false;
}
+
+static inline bool is_huge_zero_pmd(pmd_t pmd)
+{
+ return false;
+}
static inline bool is_huge_zero_pud(pud_t pud)
{
--- a/mm/huge_memory.c~mm-thp-make-is_huge_zero_pmd-safe-and-quicker
+++ a/mm/huge_memory.c
@@ -62,6 +62,7 @@ static struct shrinker deferred_split_sh
static atomic_t huge_zero_refcount;
struct page *huge_zero_page __read_mostly;
+unsigned long huge_zero_pfn __read_mostly = ~0UL;
bool transparent_hugepage_enabled(struct vm_area_struct *vma)
{
@@ -98,6 +99,7 @@ retry:
__free_pages(zero_page, compound_order(zero_page));
goto retry;
}
+ WRITE_ONCE(huge_zero_pfn, page_to_pfn(zero_page));
/* We take additional reference here. It will be put back by shrinker */
atomic_set(&huge_zero_refcount, 2);
@@ -147,6 +149,7 @@ static unsigned long shrink_huge_zero_pa
if (atomic_cmpxchg(&huge_zero_refcount, 1, 0) == 1) {
struct page *zero_page = xchg(&huge_zero_page, NULL);
BUG_ON(zero_page == NULL);
+ WRITE_ONCE(huge_zero_pfn, ~0UL);
__free_pages(zero_page, compound_order(zero_page));
return HPAGE_PMD_NR;
}
@@ -2071,7 +2074,7 @@ static void __split_huge_pmd_locked(stru
return;
}
- if (pmd_trans_huge(*pmd) && is_huge_zero_pmd(*pmd)) {
+ if (is_huge_zero_pmd(*pmd)) {
/*
* FIXME: Do we want to invalidate secondary mmu by calling
* mmu_notifier_invalidate_range() see comments below inside
_
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/thp: fix __split_huge_pmd_locked() on shmem migration entry
Patch series "mm/thp: fix THP splitting unmap BUGs and related", v10.
Here is v2 batch of long-standing THP bug fixes that I had not got around
to sending before, but prompted now by Wang Yugui's report
https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/
Wang Yugui has tested a rollup of these fixes applied to 5.10.39, and they
have done no harm, but have *not* fixed that issue: something more is
needed and I have no idea of what.
This patch (of 7):
Stressing huge tmpfs page migration racing hole punch often crashed on the
VM_BUG_ON(!pmd_present) in pmdp_huge_clear_flush(), with DEBUG_VM=y
kernel; or shortly afterwards, on a bad dereference in
__split_huge_pmd_locked() when DEBUG_VM=n. They forgot to allow for pmd
migration entries in the non-anonymous case.
Full disclosure: those particular experiments were on a kernel with more
relaxed mmap_lock and i_mmap_rwsem locking, and were not repeated on the
vanilla kernel: it is conceivable that stricter locking happens to avoid
those cases, or makes them less likely; but __split_huge_pmd_locked()
already allowed for pmd migration entries when handling anonymous THPs, so
this commit brings the shmem and file THP handling into line.
And while there: use old_pmd rather than _pmd, as in the following blocks;
and make it clearer to the eye that the !vma_is_anonymous() block is
self-contained, making an early return after accounting for unmapping.
Link: https://lkml.kernel.org/r/af88612-1473-2eaa-903-8d1a448b26@google.com
Link: https://lkml.kernel.org/r/dd221a99-efb3-cd1d-6256-7e646af29314@google.com
Fixes: e71769ae5260 ("mm: enable thp migration for shmem thp")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Wang Yugui <wangyugui(a)e16-tech.com>
Cc: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Jue Wang <juew(a)google.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 27 ++++++++++++++++++---------
mm/pgtable-generic.c | 5 ++---
2 files changed, 20 insertions(+), 12 deletions(-)
--- a/mm/huge_memory.c~mm-thp-fix-__split_huge_pmd_locked-on-shmem-migration-entry
+++ a/mm/huge_memory.c
@@ -2044,7 +2044,7 @@ static void __split_huge_pmd_locked(stru
count_vm_event(THP_SPLIT_PMD);
if (!vma_is_anonymous(vma)) {
- _pmd = pmdp_huge_clear_flush_notify(vma, haddr, pmd);
+ old_pmd = pmdp_huge_clear_flush_notify(vma, haddr, pmd);
/*
* We are going to unmap this huge page. So
* just go ahead and zap it
@@ -2053,16 +2053,25 @@ static void __split_huge_pmd_locked(stru
zap_deposited_table(mm, pmd);
if (vma_is_special_huge(vma))
return;
- page = pmd_page(_pmd);
- if (!PageDirty(page) && pmd_dirty(_pmd))
- set_page_dirty(page);
- if (!PageReferenced(page) && pmd_young(_pmd))
- SetPageReferenced(page);
- page_remove_rmap(page, true);
- put_page(page);
+ if (unlikely(is_pmd_migration_entry(old_pmd))) {
+ swp_entry_t entry;
+
+ entry = pmd_to_swp_entry(old_pmd);
+ page = migration_entry_to_page(entry);
+ } else {
+ page = pmd_page(old_pmd);
+ if (!PageDirty(page) && pmd_dirty(old_pmd))
+ set_page_dirty(page);
+ if (!PageReferenced(page) && pmd_young(old_pmd))
+ SetPageReferenced(page);
+ page_remove_rmap(page, true);
+ put_page(page);
+ }
add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR);
return;
- } else if (pmd_trans_huge(*pmd) && is_huge_zero_pmd(*pmd)) {
+ }
+
+ if (pmd_trans_huge(*pmd) && is_huge_zero_pmd(*pmd)) {
/*
* FIXME: Do we want to invalidate secondary mmu by calling
* mmu_notifier_invalidate_range() see comments below inside
--- a/mm/pgtable-generic.c~mm-thp-fix-__split_huge_pmd_locked-on-shmem-migration-entry
+++ a/mm/pgtable-generic.c
@@ -135,9 +135,8 @@ pmd_t pmdp_huge_clear_flush(struct vm_ar
{
pmd_t pmd;
VM_BUG_ON(address & ~HPAGE_PMD_MASK);
- VM_BUG_ON(!pmd_present(*pmdp));
- /* Below assumes pmd_present() is true */
- VM_BUG_ON(!pmd_trans_huge(*pmdp) && !pmd_devmap(*pmdp));
+ VM_BUG_ON(pmd_present(*pmdp) && !pmd_trans_huge(*pmdp) &&
+ !pmd_devmap(*pmdp));
pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
return pmd;
_
From: Xu Yu <xuyu(a)linux.alibaba.com>
Subject: mm, thp: use head page in __migration_entry_wait()
We noticed that a hung task can happen in a corner-case but practical scenario
when CONFIG_PREEMPT_NONE is enabled, as follows.
Process 0                     Process 1                  Process 2..Inf
split_huge_page_to_list
  unmap_page
    split_huge_pmd_address
                              __migration_entry_wait(head)
                                                         __migration_entry_wait(tail)
  remap_page (roll back)
    remove_migration_ptes
      rmap_walk_anon
        cond_resched
Here __migration_entry_wait(tail) occurs in kernel space, e.g. via
copy_to_user() in fstat(), which will immediately fault again without
rescheduling, and thus occupy the CPU fully.
When there are too many processes performing __migration_entry_wait() on a
tail page, remap_page() will never be reached after cond_resched().
Make __migration_entry_wait() operate on the compound head page, so that it
waits for remap_page() to complete, whether the THP is split successfully or
rolled back.
Note that put_and_wait_on_page_locked helps to drop the page reference
acquired with get_page_unless_zero, as soon as the page is on the wait
queue, before actually waiting. So splitting the THP is only prevented
for a brief interval.
Link: https://lkml.kernel.org/r/b9836c1dd522e903891760af9f0c86a2cce987eb.16231440…
Fixes: ba98828088ad ("thp: add option to setup migration entries during PMD split")
Suggested-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Gang Deng <gavin.dg(a)linux.alibaba.com>
Signed-off-by: Xu Yu <xuyu(a)linux.alibaba.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Acked-by: Hugh Dickins <hughd(a)google.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/migrate.c~mm-thp-use-head-page-in-__migration_entry_wait
+++ a/mm/migrate.c
@@ -295,6 +295,7 @@ void __migration_entry_wait(struct mm_st
goto out;
page = migration_entry_to_page(entry);
+ page = compound_head(page);
/*
* Once page cache replacement of page migration started, page_count
_
From: Pingfan Liu <kernelfans(a)gmail.com>
Subject: crash_core, vmcoreinfo: append 'SECTION_SIZE_BITS' to vmcoreinfo
As mentioned in kernel commit 1d50e5d0c505 ("crash_core, vmcoreinfo:
Append 'MAX_PHYSMEM_BITS' to vmcoreinfo"), SECTION_SIZE_BITS appears in the
formula:
#define SECTIONS_SHIFT (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
Besides SECTIONS_SHIFT, SECTION_SIZE_BITS is also used to calculate
PAGES_PER_SECTION in makedumpfile, just as in the kernel.
Unfortunately, this arch-dependent macro SECTION_SIZE_BITS changes, e.g.
recently in kernel commit f0b13ee23241 ("arm64/sparsemem: reduce
SECTION_SIZE_BITS"), but user space wants a stable interface to get this
info, and it cannot be deduced from a crashdump vmcore.
Hence append SECTION_SIZE_BITS to vmcoreinfo.
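For context, the sparsemem arithmetic a dump tool performs with these values
is roughly the following (illustrative snippet, not makedumpfile code; the
numbers are hypothetical example values, the real ones come from vmcoreinfo):
#include <stdio.h>
int main(void)
{
	/* Hypothetical example values; a dump tool must read the real
	 * SECTION_SIZE_BITS and MAX_PHYSMEM_BITS from vmcoreinfo. */
	unsigned long section_size_bits = 27;
	unsigned long max_physmem_bits  = 48;
	unsigned long page_shift        = 12;
	unsigned long sections_shift    = max_physmem_bits - section_size_bits;
	unsigned long nr_mem_sections   = 1UL << sections_shift;
	unsigned long pages_per_section = 1UL << (section_size_bits - page_shift);
	printf("SECTIONS_SHIFT=%lu NR_MEM_SECTIONS=%lu PAGES_PER_SECTION=%lu\n",
	       sections_shift, nr_mem_sections, pages_per_section);
	return 0;
}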
Link: https://lkml.kernel.org/r/20210608103359.84907-1-kernelfans@gmail.com
Link: http://lists.infradead.org/pipermail/kexec/2021-June/022676.html
Signed-off-by: Pingfan Liu <kernelfans(a)gmail.com>
Acked-by: Baoquan He <bhe(a)redhat.com>
Cc: Bhupesh Sharma <bhupesh.sharma(a)linaro.org>
Cc: Kazuhito Hagio <k-hagio(a)ab.jp.nec.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Boris Petkov <bp(a)alien8.de>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: James Morse <james.morse(a)arm.com>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Paul Mackerras <paulus(a)samba.org>
Cc: Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
Cc: Dave Anderson <anderson(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/crash_core.c | 1 +
1 file changed, 1 insertion(+)
--- a/kernel/crash_core.c~crash_core-vmcoreinfo-append-section_size_bits-to-vmcoreinfo
+++ a/kernel/crash_core.c
@@ -464,6 +464,7 @@ static int __init crash_save_vmcoreinfo_
VMCOREINFO_LENGTH(mem_section, NR_SECTION_ROOTS);
VMCOREINFO_STRUCT_SIZE(mem_section);
VMCOREINFO_OFFSET(mem_section, section_mem_map);
+ VMCOREINFO_NUMBER(SECTION_SIZE_BITS);
VMCOREINFO_NUMBER(MAX_PHYSMEM_BITS);
#endif
VMCOREINFO_STRUCT_SIZE(page);
_
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: mm/hugetlb: expand restore_reserve_on_error functionality
The routine restore_reserve_on_error is called to restore reservation
information when an error occurs after page allocation. The routine
alloc_huge_page modifies the mapping reserve map and potentially the
reserve count during allocation. If code calling alloc_huge_page
encounters an error after allocation and needs to free the page, the
reservation information needs to be adjusted.
Currently, restore_reserve_on_error only takes action on pages for which
the reserve count was adjusted (HPageRestoreReserve flag set). There is
nothing wrong with these adjustments. However, alloc_huge_page ALWAYS
modifies the reserve map during allocation even if the reserve count is
not adjusted. This can cause issues as observed during development of
this patch [1].
One specific series of operations causing an issue is:
- Create a shared hugetlb mapping
    Reservations for all pages created by default
- Fault in a page in the mapping
    Reservation exists so reservation count is decremented
- Punch a hole in the file/mapping at index previously faulted
    Reservation and any associated pages will be removed
- Allocate a page to fill the hole
    No reservation entry, so reserve count unmodified
    Reservation entry added to map by alloc_huge_page
- Error after allocation and before instantiating the page
    Reservation entry remains in map
- Allocate a page to fill the hole
    Reservation entry exists, so decrement reservation count
This will cause a reservation count underflow as the reservation count was
decremented twice for the same index.
A user would observe a very large number for HugePages_Rsvd in
/proc/meminfo. This would also likely cause subsequent allocations of
hugetlb pages to fail as it would 'appear' that all pages are reserved.
This sequence of operations is unlikely to happen, however it was
easily reproduced and observed using hacked-up code as described in [1].
Address the issue by having the routine restore_reserve_on_error take
action on pages where HPageRestoreReserve is not set. In this case, we
need to remove any reserve map entry created by alloc_huge_page. A new
helper routine vma_del_reservation assists with this operation.
There are three callers of alloc_huge_page which do not currently call
restore_reserve_on_error before freeing a page on error paths. Add those
missing calls.
[1] https://lore.kernel.org/linux-mm/20210528005029.88088-1-almasrymina@google.…
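A hedged sketch of the calling convention being enforced (a simplified,
hypothetical error path, not one of the three call sites verbatim;
do_setup_that_can_fail() is made up for illustration):
	struct hstate *h = hstate_vma(vma);

	page = alloc_huge_page(vma, addr, 0);
	if (IS_ERR(page))
		return PTR_ERR(page);

	ret = do_setup_that_can_fail(page);	/* hypothetical step */
	if (ret) {
		/* undo the reserve map entry added by alloc_huge_page ... */
		restore_reserve_on_error(h, vma, addr, page);
		/* ... before free_huge_page adjusts the global counts */
		put_page(page);
		return ret;
	}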
Link: https://lkml.kernel.org/r/20210607204510.22617-1-mike.kravetz@oracle.com
Fixes: 96b96a96ddee ("mm/hugetlb: fix huge page reservation leak in private mapping error paths")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Mina Almasry <almasrymina(a)google.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/hugetlbfs/inode.c | 1
include/linux/hugetlb.h | 2
mm/hugetlb.c | 122 ++++++++++++++++++++++++++++++--------
3 files changed, 101 insertions(+), 24 deletions(-)
--- a/fs/hugetlbfs/inode.c~mm-hugetlb-expand-restore_reserve_on_error-functionality
+++ a/fs/hugetlbfs/inode.c
@@ -735,6 +735,7 @@ static long hugetlbfs_fallocate(struct f
__SetPageUptodate(page);
error = huge_add_to_page_cache(page, mapping, index);
if (unlikely(error)) {
+ restore_reserve_on_error(h, &pseudo_vma, addr, page);
put_page(page);
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
goto out;
--- a/include/linux/hugetlb.h~mm-hugetlb-expand-restore_reserve_on_error-functionality
+++ a/include/linux/hugetlb.h
@@ -610,6 +610,8 @@ struct page *alloc_huge_page_vma(struct
unsigned long address);
int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
pgoff_t idx);
+void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
+ unsigned long address, struct page *page);
/* arch callback */
int __init __alloc_bootmem_huge_page(struct hstate *h);
--- a/mm/hugetlb.c~mm-hugetlb-expand-restore_reserve_on_error-functionality
+++ a/mm/hugetlb.c
@@ -2121,12 +2121,18 @@ out:
* be restored when a newly allocated huge page must be freed. It is
* to be called after calling vma_needs_reservation to determine if a
* reservation exists.
+ *
+ * vma_del_reservation is used in error paths where an entry in the reserve
+ * map was created during huge page allocation and must be removed. It is to
+ * be called after calling vma_needs_reservation to determine if a reservation
+ * exists.
*/
enum vma_resv_mode {
VMA_NEEDS_RESV,
VMA_COMMIT_RESV,
VMA_END_RESV,
VMA_ADD_RESV,
+ VMA_DEL_RESV,
};
static long __vma_reservation_common(struct hstate *h,
struct vm_area_struct *vma, unsigned long addr,
@@ -2170,11 +2176,21 @@ static long __vma_reservation_common(str
ret = region_del(resv, idx, idx + 1);
}
break;
+ case VMA_DEL_RESV:
+ if (vma->vm_flags & VM_MAYSHARE) {
+ region_abort(resv, idx, idx + 1, 1);
+ ret = region_del(resv, idx, idx + 1);
+ } else {
+ ret = region_add(resv, idx, idx + 1, 1, NULL, NULL);
+ /* region_add calls of range 1 should never fail. */
+ VM_BUG_ON(ret < 0);
+ }
+ break;
default:
BUG();
}
- if (vma->vm_flags & VM_MAYSHARE)
+ if (vma->vm_flags & VM_MAYSHARE || mode == VMA_DEL_RESV)
return ret;
/*
* We know private mapping must have HPAGE_RESV_OWNER set.
@@ -2222,25 +2238,39 @@ static long vma_add_reservation(struct h
return __vma_reservation_common(h, vma, addr, VMA_ADD_RESV);
}
+static long vma_del_reservation(struct hstate *h,
+ struct vm_area_struct *vma, unsigned long addr)
+{
+ return __vma_reservation_common(h, vma, addr, VMA_DEL_RESV);
+}
+
/*
- * This routine is called to restore a reservation on error paths. In the
- * specific error paths, a huge page was allocated (via alloc_huge_page)
- * and is about to be freed. If a reservation for the page existed,
- * alloc_huge_page would have consumed the reservation and set
- * HPageRestoreReserve in the newly allocated page. When the page is freed
- * via free_huge_page, the global reservation count will be incremented if
- * HPageRestoreReserve is set. However, free_huge_page can not adjust the
- * reserve map. Adjust the reserve map here to be consistent with global
- * reserve count adjustments to be made by free_huge_page.
- */
-static void restore_reserve_on_error(struct hstate *h,
- struct vm_area_struct *vma, unsigned long address,
- struct page *page)
+ * This routine is called to restore reservation information on error paths.
+ * It should ONLY be called for pages allocated via alloc_huge_page(), and
+ * the hugetlb mutex should remain held when calling this routine.
+ *
+ * It handles two specific cases:
+ * 1) A reservation was in place and the page consumed the reservation.
+ * HPageRestoreReserve is set in the page.
+ * 2) No reservation was in place for the page, so HPageRestoreReserve is
+ * not set. However, alloc_huge_page always updates the reserve map.
+ *
+ * In case 1, free_huge_page later in the error path will increment the
+ * global reserve count. But, free_huge_page does not have enough context
+ * to adjust the reservation map. This case deals primarily with private
+ * mappings. Adjust the reserve map here to be consistent with global
+ * reserve count adjustments to be made by free_huge_page. Make sure the
+ * reserve map indicates there is a reservation present.
+ *
+ * In case 2, simply undo reserve map modifications done by alloc_huge_page.
+ */
+void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
+ unsigned long address, struct page *page)
{
- if (unlikely(HPageRestoreReserve(page))) {
- long rc = vma_needs_reservation(h, vma, address);
+ long rc = vma_needs_reservation(h, vma, address);
- if (unlikely(rc < 0)) {
+ if (HPageRestoreReserve(page)) {
+ if (unlikely(rc < 0))
/*
* Rare out of memory condition in reserve map
* manipulation. Clear HPageRestoreReserve so that
@@ -2253,16 +2283,57 @@ static void restore_reserve_on_error(str
* accounting of reserve counts.
*/
ClearHPageRestoreReserve(page);
- } else if (rc) {
- rc = vma_add_reservation(h, vma, address);
- if (unlikely(rc < 0))
+ else if (rc)
+ (void)vma_add_reservation(h, vma, address);
+ else
+ vma_end_reservation(h, vma, address);
+ } else {
+ if (!rc) {
+ /*
+ * This indicates there is an entry in the reserve map
+ * added by alloc_huge_page. We know it was added
+ * before the alloc_huge_page call, otherwise
+ * HPageRestoreReserve would be set on the page.
+ * Remove the entry so that a subsequent allocation
+ * does not consume a reservation.
+ */
+ rc = vma_del_reservation(h, vma, address);
+ if (rc < 0)
/*
- * See above comment about rare out of
- * memory condition.
+ * VERY rare out of memory condition. Since
+ * we can not delete the entry, set
+ * HPageRestoreReserve so that the reserve
+ * count will be incremented when the page
+ * is freed. This reserve will be consumed
+ * on a subsequent allocation.
*/
- ClearHPageRestoreReserve(page);
+ SetHPageRestoreReserve(page);
+ } else if (rc < 0) {
+ /*
+ * Rare out of memory condition from
+ * vma_needs_reservation call. Memory allocation is
+ * only attempted if a new entry is needed. Therefore,
+ * this implies there is not an entry in the
+ * reserve map.
+ *
+ * For shared mappings, no entry in the map indicates
+ * no reservation. We are done.
+ */
+ if (!(vma->vm_flags & VM_MAYSHARE))
+ /*
+ * For private mappings, no entry indicates
+ * a reservation is present. Since we can
+ * not add an entry, set SetHPageRestoreReserve
+ * on the page so reserve count will be
+ * incremented when freed. This reserve will
+ * be consumed on a subsequent allocation.
+ */
+ SetHPageRestoreReserve(page);
} else
- vma_end_reservation(h, vma, address);
+ /*
+ * No reservation present, do nothing
+ */
+ vma_end_reservation(h, vma, address);
}
}
@@ -4037,6 +4108,8 @@ again:
spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
entry = huge_ptep_get(src_pte);
if (!pte_same(src_pte_old, entry)) {
+ restore_reserve_on_error(h, vma, addr,
+ new);
put_page(new);
/* dst_entry won't change as in child */
goto again;
@@ -5006,6 +5079,7 @@ out_release_unlock:
if (vm_shared || is_continue)
unlock_page(page);
out_release_nounlock:
+ restore_reserve_on_error(h, dst_vma, dst_addr, page);
put_page(page);
goto out;
}
_
From: Kees Cook <keescook(a)chromium.org>
Subject: mm/slub: actually fix freelist pointer vs redzoning
It turns out that SLUB redzoning ("slub_debug=Z") checks from
s->object_size rather than from s->inuse (which is normally bumped to make
room for the freelist pointer), so a cache created with an object size
less than 24 would have the freelist pointer written beyond
s->object_size, causing the redzone to be corrupted by the freelist
pointer. This was very visible with "slub_debug=ZF":
BUG test (Tainted: G B ): Right Redzone overwritten
-----------------------------------------------------------------------------
INFO: 0xffff957ead1c05de-0xffff957ead1c05df @offset=1502. First byte 0x1a instead of 0xbb
INFO: Slab 0xffffef3950b47000 objects=170 used=170 fp=0x0000000000000000 flags=0x8000000000000200
INFO: Object 0xffff957ead1c05d8 @offset=1496 fp=0xffff957ead1c0620
Redzone (____ptrval____): bb bb bb bb bb bb bb bb ........
Object (____ptrval____): 00 00 00 00 00 f6 f4 a5 ........
Redzone (____ptrval____): 40 1d e8 1a aa @....
Padding (____ptrval____): 00 00 00 00 00 00 00 00 ........
Adjust the offset to stay within s->object_size.
(Note that no caches in this size range are known to exist in the
kernel currently.)
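A worked example of the offset arithmetic (a hedged, standalone user-space
sketch rather than kernel code; the 20-byte object size is hypothetical, and
the ALIGN/ALIGN_DOWN macros mirror the kernel helpers):
#include <stdio.h>

#define ALIGN(x, a)      (((x) + (a) - 1) & ~((a) - 1))
#define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))

int main(void)
{
	unsigned long ptr_size = sizeof(void *);              /* 8 on 64-bit     */
	unsigned long object_size = 20;                       /* smaller than 24 */
	unsigned long aligned = ALIGN(object_size, ptr_size); /* 24              */

	unsigned long old_off = ALIGN(aligned / 2, ptr_size);          /* 16 */
	unsigned long new_off = ALIGN_DOWN(object_size / 2, ptr_size); /*  8 */

	/* old: the pointer spans [16, 24), past object_size 20, into the redzone
	 * new: the pointer spans [8, 16), entirely inside the object           */
	printf("old offset %lu, new offset %lu\n", old_off, new_off);
	return 0;
}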
Link: https://lkml.kernel.org/r/20210608183955.280836-4-keescook@chromium.org
Link: https://lore.kernel.org/linux-mm/20200807160627.GA1420741@elver.google.com/
Link: https://lore.kernel.org/lkml/0f7dd7b2-7496-5e2d-9488-2ec9f8e90441@suse.cz/
Fixes: 89b83f282d8b (slub: avoid redzone when choosing freepointer location)
Link: https://lore.kernel.org/lkml/CANpmjNOwZ5VpKQn+SYWovTkFB4VsT-RPwyENBmaK0dLcp…
Signed-off-by: Kees Cook <keescook(a)chromium.org>
Reported-by: Marco Elver <elver(a)google.com>
Reported-by: "Lin, Zhenpeng" <zplin(a)psu.edu>
Tested-by: Marco Elver <elver(a)google.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/slub.c | 14 +++-----------
1 file changed, 3 insertions(+), 11 deletions(-)
--- a/mm/slub.c~mm-slub-actually-fix-freelist-pointer-vs-redzoning
+++ a/mm/slub.c
@@ -3689,7 +3689,6 @@ static int calculate_sizes(struct kmem_c
{
slab_flags_t flags = s->flags;
unsigned int size = s->object_size;
- unsigned int freepointer_area;
unsigned int order;
/*
@@ -3698,13 +3697,6 @@ static int calculate_sizes(struct kmem_c
* the possible location of the free pointer.
*/
size = ALIGN(size, sizeof(void *));
- /*
- * This is the area of the object where a freepointer can be
- * safely written. If redzoning adds more to the inuse size, we
- * can't use that portion for writing the freepointer, so
- * s->offset must be limited within this for the general case.
- */
- freepointer_area = size;
#ifdef CONFIG_SLUB_DEBUG
/*
@@ -3730,7 +3722,7 @@ static int calculate_sizes(struct kmem_c
/*
* With that we have determined the number of bytes in actual use
- * by the object. This is the potential offset to the free pointer.
+ * by the object and redzoning.
*/
s->inuse = size;
@@ -3753,13 +3745,13 @@ static int calculate_sizes(struct kmem_c
*/
s->offset = size;
size += sizeof(void *);
- } else if (freepointer_area > sizeof(void *)) {
+ } else {
/*
* Store freelist pointer near middle of object to keep
* it away from the edges of the object to avoid small
* sized over/underflows from neighboring allocations.
*/
- s->offset = ALIGN(freepointer_area / 2, sizeof(void *));
+ s->offset = ALIGN_DOWN(s->object_size / 2, sizeof(void *));
}
#ifdef CONFIG_SLUB_DEBUG
_
From: Kees Cook <keescook(a)chromium.org>
Subject: mm/slub: fix redzoning for small allocations
The redzone area for SLUB exists between s->object_size and s->inuse
(which is at least the word-aligned object_size). If a cache were created
with an object_size smaller than sizeof(void *), the in-object stored
freelist pointer would overwrite the redzone (e.g. with boot param
"slub_debug=ZF"):
BUG test (Tainted: G B ): Right Redzone overwritten
-----------------------------------------------------------------------------
INFO: 0xffff957ead1c05de-0xffff957ead1c05df @offset=1502. First byte 0x1a instead of 0xbb
INFO: Slab 0xffffef3950b47000 objects=170 used=170 fp=0x0000000000000000 flags=0x8000000000000200
INFO: Object 0xffff957ead1c05d8 @offset=1496 fp=0xffff957ead1c0620
Redzone (____ptrval____): bb bb bb bb bb bb bb bb ........
Object (____ptrval____): f6 f4 a5 40 1d e8 ...@..
Redzone (____ptrval____): 1a aa ..
Padding (____ptrval____): 00 00 00 00 00 00 00 00 ........
Store the freelist pointer out of line when object_size is smaller than
sizeof(void *) and redzoning is enabled.
Additionally remove the "smaller than sizeof(void *)" check under
CONFIG_DEBUG_VM in kmem_cache_sanity_check() as it is now redundant:
SLAB and SLOB both handle small sizes.
(Note that no caches within this size range are known to exist in the
kernel currently.)
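A hedged numeric illustration of the out-of-line case (assuming 64-bit,
sizeof(void *) == 8, and a hypothetical 6-byte object):
	/*
	 * object_size == 6, inuse aligned up to 8: an in-object freelist
	 * pointer at offset 0 would span bytes 0..7 and clobber the redzone
	 * bytes starting at 6.  With the fix the pointer is relocated after
	 * the object instead (as in the existing RCU/ctor/poison cases):
	 */
	s->offset = size;          /* place it right after the object */
	size += sizeof(void *);    /* grow the slab object to hold it */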
Link: https://lkml.kernel.org/r/20210608183955.280836-3-keescook@chromium.org
Fixes: 81819f0fc828 ("SLUB core")
Signed-off-by: Kees Cook <keescook(a)chromium.org>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: "Lin, Zhenpeng" <zplin(a)psu.edu>
Cc: Marco Elver <elver(a)google.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/slab_common.c | 3 +--
mm/slub.c | 8 +++++---
2 files changed, 6 insertions(+), 5 deletions(-)
--- a/mm/slab_common.c~mm-slub-fix-redzoning-for-small-allocations
+++ a/mm/slab_common.c
@@ -97,8 +97,7 @@ EXPORT_SYMBOL(kmem_cache_size);
#ifdef CONFIG_DEBUG_VM
static int kmem_cache_sanity_check(const char *name, unsigned int size)
{
- if (!name || in_interrupt() || size < sizeof(void *) ||
- size > KMALLOC_MAX_SIZE) {
+ if (!name || in_interrupt() || size > KMALLOC_MAX_SIZE) {
pr_err("kmem_cache_create(%s) integrity check failed\n", name);
return -EINVAL;
}
--- a/mm/slub.c~mm-slub-fix-redzoning-for-small-allocations
+++ a/mm/slub.c
@@ -3734,15 +3734,17 @@ static int calculate_sizes(struct kmem_c
*/
s->inuse = size;
- if (((flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) ||
- s->ctor)) {
+ if ((flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) ||
+ ((flags & SLAB_RED_ZONE) && s->object_size < sizeof(void *)) ||
+ s->ctor) {
/*
* Relocate free pointer after the object if it is not
* permitted to overwrite the first word of the object on
* kmem_cache_free.
*
* This is the case if we do RCU, have a constructor or
- * destructor or are poisoning the objects.
+ * destructor, are poisoning the objects, or are
+ * redzoning an object smaller than sizeof(void *).
*
* The assumption that s->offset >= s->inuse means free
* pointer is outside of the object is used in the
_
From: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Subject: mm,hwpoison: fix race with hugetlb page allocation
When a hugetlb page fault (under an overcommitting situation) and
memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following
race:
CPU0:                                  CPU1:
                                       gather_surplus_pages()
                                         page = alloc_surplus_huge_page()
memory_failure_hugetlb()
  get_hwpoison_page(page)
    __get_hwpoison_page(page)
      get_page_unless_zero(page)
                                         zero = put_page_testzero(page)
                                         VM_BUG_ON_PAGE(!zero, page)
                                         enqueue_huge_page(h, page)
put_page(page)
__get_hwpoison_page() only checks the page refcount before taking an
additional one for memory error handling, which is not enough because
there's a time window where compound pages have non-zero refcount during
hugetlb page initialization.
So make __get_hwpoison_page() check the page status a bit more carefully
for hugetlb pages with get_hwpoison_huge_page(). Checking hugetlb-specific
flags under hugetlb_lock makes sure that the hugetlb page is not in a
transient state.
It's notable that another new function, HWPoisonHandlable(), helps to
prevent a race against other transient page states (like a generic
compound page just before PageHuge becomes true).
Link: https://lkml.kernel.org/r/20210603233632.2964832-2-nao.horiguchi@gmail.com
Fixes: ead07f6a867b ("mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reported-by: Muchun Song <songmuchun(a)bytedance.com>
Acked-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Tony Luck <tony.luck(a)intel.com>
Cc: <stable(a)vger.kernel.org> [5.12+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/hugetlb.h | 6 ++++++
mm/hugetlb.c | 15 +++++++++++++++
mm/memory-failure.c | 29 +++++++++++++++++++++++++++--
3 files changed, 48 insertions(+), 2 deletions(-)
--- a/include/linux/hugetlb.h~mmhwpoison-fix-race-with-hugetlb-page-allocation
+++ a/include/linux/hugetlb.h
@@ -149,6 +149,7 @@ bool hugetlb_reserve_pages(struct inode
long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
long freed);
bool isolate_huge_page(struct page *page, struct list_head *list);
+int get_hwpoison_huge_page(struct page *page, bool *hugetlb);
void putback_active_hugepage(struct page *page);
void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason);
void free_huge_page(struct page *page);
@@ -339,6 +340,11 @@ static inline bool isolate_huge_page(str
return false;
}
+static inline int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
+{
+ return 0;
+}
+
static inline void putback_active_hugepage(struct page *page)
{
}
--- a/mm/hugetlb.c~mmhwpoison-fix-race-with-hugetlb-page-allocation
+++ a/mm/hugetlb.c
@@ -5857,6 +5857,21 @@ unlock:
return ret;
}
+int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
+{
+ int ret = 0;
+
+ *hugetlb = false;
+ spin_lock_irq(&hugetlb_lock);
+ if (PageHeadHuge(page)) {
+ *hugetlb = true;
+ if (HPageFreed(page) || HPageMigratable(page))
+ ret = get_page_unless_zero(page);
+ }
+ spin_unlock_irq(&hugetlb_lock);
+ return ret;
+}
+
void putback_active_hugepage(struct page *page)
{
spin_lock_irq(&hugetlb_lock);
--- a/mm/memory-failure.c~mmhwpoison-fix-race-with-hugetlb-page-allocation
+++ a/mm/memory-failure.c
@@ -949,6 +949,17 @@ static int page_action(struct page_state
return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
}
+/*
+ * Return true if a page type of a given page is supported by hwpoison
+ * mechanism (while handling could fail), otherwise false. This function
+ * does not return true for hugetlb or device memory pages, so it's assumed
+ * to be called only in the context where we never have such pages.
+ */
+static inline bool HWPoisonHandlable(struct page *page)
+{
+ return PageLRU(page) || __PageMovable(page);
+}
+
/**
* __get_hwpoison_page() - Get refcount for memory error handling:
* @page: raw error page (hit by memory error)
@@ -959,8 +970,22 @@ static int page_action(struct page_state
static int __get_hwpoison_page(struct page *page)
{
struct page *head = compound_head(page);
+ int ret = 0;
+ bool hugetlb = false;
+
+ ret = get_hwpoison_huge_page(head, &hugetlb);
+ if (hugetlb)
+ return ret;
+
+ /*
+ * This check prevents from calling get_hwpoison_unless_zero()
+ * for any unsupported type of page in order to reduce the risk of
+ * unexpected races caused by taking a page refcount.
+ */
+ if (!HWPoisonHandlable(head))
+ return 0;
- if (!PageHuge(head) && PageTransHuge(head)) {
+ if (PageTransHuge(head)) {
/*
* Non anonymous thp exists only in allocation/free time. We
* can't handle such a case correctly, so let's give it up.
@@ -1017,7 +1042,7 @@ try_again:
ret = -EIO;
}
} else {
- if (PageHuge(p) || PageLRU(p) || __PageMovable(p)) {
+ if (PageHuge(p) || HWPoisonHandlable(p)) {
ret = 1;
} else {
/*
_
The patch titled
Subject: mm/hwpoison: do not lock page again when me_huge_page() successfully recovers
has been added to the -mm tree. Its filename is
mm-hwpoison-do-not-lock-page-again-when-me_huge_page-successfully-recovers.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-hwpoison-do-not-lock-page-agai…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-hwpoison-do-not-lock-page-agai…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Subject: mm/hwpoison: do not lock page again when me_huge_page() successfully recovers
Currently me_huge_page() temporarily unlocks the page to perform some
actions, then locks it again later. My testcase (which calls hard-offline
on some tail page in a hugetlb page, then accesses an address in that
hugetlb range) showed that the page allocation code detects this page lock
on a buddy page and prints out a "BUG: Bad page state" message.
check_new_page_bad() does not consider a page with __PG_HWPOISON as a bad
page, so this flag works as a kind of filter, but the filtering doesn't
work in this case because the "bad page" is not the actual hwpoisoned
page. So stop locking the page again. The actions to be taken depend on
the page type of the error, so page unlocking should be done in the
->action() callbacks. So make that assumption explicit and change all
existing callbacks accordingly.
Link: https://lkml.kernel.org/r/20210609072029.74645-1-nao.horiguchi@gmail.com
Fixes: commit 78bb920344b8 ("mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Tony Luck <tony.luck(a)intel.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.vnet.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 44 ++++++++++++++++++++++++++++--------------
1 file changed, 30 insertions(+), 14 deletions(-)
--- a/mm/memory-failure.c~mm-hwpoison-do-not-lock-page-again-when-me_huge_page-successfully-recovers
+++ a/mm/memory-failure.c
@@ -658,6 +658,7 @@ static int truncate_error_page(struct pa
*/
static int me_kernel(struct page *p, unsigned long pfn)
{
+ unlock_page(p);
return MF_IGNORED;
}
@@ -667,6 +668,7 @@ static int me_kernel(struct page *p, uns
static int me_unknown(struct page *p, unsigned long pfn)
{
pr_err("Memory failure: %#lx: Unknown page state\n", pfn);
+ unlock_page(p);
return MF_FAILED;
}
@@ -675,6 +677,7 @@ static int me_unknown(struct page *p, un
*/
static int me_pagecache_clean(struct page *p, unsigned long pfn)
{
+ int ret;
struct address_space *mapping;
delete_from_lru_cache(p);
@@ -683,8 +686,10 @@ static int me_pagecache_clean(struct pag
* For anonymous pages we're done the only reference left
* should be the one m_f() holds.
*/
- if (PageAnon(p))
- return MF_RECOVERED;
+ if (PageAnon(p)) {
+ ret = MF_RECOVERED;
+ goto out;
+ }
/*
* Now truncate the page in the page cache. This is really
@@ -698,7 +703,8 @@ static int me_pagecache_clean(struct pag
/*
* Page has been teared down in the meanwhile
*/
- return MF_FAILED;
+ ret = MF_FAILED;
+ goto out;
}
/*
@@ -706,7 +712,10 @@ static int me_pagecache_clean(struct pag
*
* Open: to take i_mutex or not for this? Right now we don't.
*/
- return truncate_error_page(p, pfn, mapping);
+ ret = truncate_error_page(p, pfn, mapping);
+out:
+ unlock_page(p);
+ return ret;
}
/*
@@ -782,24 +791,26 @@ static int me_pagecache_dirty(struct pag
*/
static int me_swapcache_dirty(struct page *p, unsigned long pfn)
{
+ int ret;
+
ClearPageDirty(p);
/* Trigger EIO in shmem: */
ClearPageUptodate(p);
- if (!delete_from_lru_cache(p))
- return MF_DELAYED;
- else
- return MF_FAILED;
+ ret = delete_from_lru_cache(p) ? MF_FAILED : MF_DELAYED;
+ unlock_page(p);
+ return ret;
}
static int me_swapcache_clean(struct page *p, unsigned long pfn)
{
+ int ret;
+
delete_from_swap_cache(p);
- if (!delete_from_lru_cache(p))
- return MF_RECOVERED;
- else
- return MF_FAILED;
+ ret = delete_from_lru_cache(p) ? MF_FAILED : MF_RECOVERED;
+ unlock_page(p);
+ return ret;
}
/*
@@ -820,6 +831,7 @@ static int me_huge_page(struct page *p,
mapping = page_mapping(hpage);
if (mapping) {
res = truncate_error_page(hpage, pfn, mapping);
+ unlock_page(hpage);
} else {
res = MF_FAILED;
unlock_page(hpage);
@@ -834,7 +846,6 @@ static int me_huge_page(struct page *p,
page_ref_inc(p);
res = MF_RECOVERED;
}
- lock_page(hpage);
}
return res;
@@ -866,6 +877,8 @@ static struct page_state {
unsigned long mask;
unsigned long res;
enum mf_action_page_type type;
+
+ /* Callback ->action() has to unlock the relevant page inside it. */
int (*action)(struct page *p, unsigned long pfn);
} error_states[] = {
{ reserved, reserved, MF_MSG_KERNEL, me_kernel },
@@ -929,6 +942,7 @@ static int page_action(struct page_state
int result;
int count;
+ /* page p should be unlocked after returning from ps->action(). */
result = ps->action(p, pfn);
count = page_count(p) - 1;
@@ -1313,7 +1327,7 @@ static int memory_failure_hugetlb(unsign
goto out;
}
- res = identify_page_state(pfn, p, page_flags);
+ return identify_page_state(pfn, p, page_flags);
out:
unlock_page(head);
return res;
@@ -1596,6 +1610,8 @@ try_again:
identify_page_state:
res = identify_page_state(pfn, p, page_flags);
+ mutex_unlock(&mf_mutex);
+ return res;
unlock_page:
unlock_page(p);
unlock_mutex:
_
Patches currently in -mm which might be from naoya.horiguchi(a)nec.com are
mmhwpoison-fix-race-with-hugetlb-page-allocation.patch
mm-hwpoison-do-not-lock-page-again-when-me_huge_page-successfully-recovers.patch
mmhwpoison-send-sigbus-with-error-virutal-address.patch
mmhwpoison-send-sigbus-with-error-virutal-address-fix.patch
mmhwpoison-make-get_hwpoison_page-call-get_any_page.patch
The rproc_char_device_remove() call currently unmaps the cdev
region instead of simply deleting the cdev that was added as a
part of the rproc_char_device_add() call. This change fixes that
behaviour, and also fixes the order in which device_del() and
cdev_del() need to be called.
Signed-off-by: Siddharth Gupta <sidgup(a)codeaurora.org>
Cc: stable(a)vger.kernel.org
---
drivers/remoteproc/remoteproc_cdev.c | 2 +-
drivers/remoteproc/remoteproc_core.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/remoteproc/remoteproc_cdev.c b/drivers/remoteproc/remoteproc_cdev.c
index 0b8a84c..4ad98b0 100644
--- a/drivers/remoteproc/remoteproc_cdev.c
+++ b/drivers/remoteproc/remoteproc_cdev.c
@@ -124,7 +124,7 @@ int rproc_char_device_add(struct rproc *rproc)
void rproc_char_device_remove(struct rproc *rproc)
{
- __unregister_chrdev(MAJOR(rproc->dev.devt), rproc->index, 1, "remoteproc");
+ cdev_del(&rproc->cdev);
}
void __init rproc_init_cdev(void)
diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index b65fce3..b874280 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -2619,7 +2619,6 @@ int rproc_del(struct rproc *rproc)
mutex_unlock(&rproc->lock);
rproc_delete_debug_dir(rproc);
- rproc_char_device_remove(rproc);
/* the rproc is downref'ed as soon as it's removed from the klist */
mutex_lock(&rproc_list_mutex);
@@ -2630,6 +2629,7 @@ int rproc_del(struct rproc *rproc)
synchronize_rcu();
device_del(&rproc->dev);
+ rproc_char_device_remove(rproc);
return 0;
}
--
Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
We can validate whether the remoteproc is correctly set up before
making the cdev_add and device_add calls. This saves us the
trouble of cleaning up later on.
Signed-off-by: Siddharth Gupta <sidgup(a)codeaurora.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson(a)linaro.org>
Cc: stable(a)vger.kernel.org
---
drivers/remoteproc/remoteproc_core.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index 9ad8c5f..b65fce3 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -2333,16 +2333,16 @@ int rproc_add(struct rproc *rproc)
struct device *dev = &rproc->dev;
int ret;
- /* add char device for this remoteproc */
- ret = rproc_char_device_add(rproc);
+ ret = rproc_validate(rproc);
if (ret < 0)
return ret;
- ret = device_add(dev);
+ /* add char device for this remoteproc */
+ ret = rproc_char_device_add(rproc);
if (ret < 0)
return ret;
- ret = rproc_validate(rproc);
+ ret = device_add(dev);
if (ret < 0)
return ret;
--
Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
When cdev_add is called after device_add has been called, there is no
way for userspace to know about the addition of a cdev (as cdev_add
itself doesn't trigger a uevent notification), or for the kernel to
know about the change to devt. This results in two problems:
- mknod is never called for the cdev and hence no cdev appears on
devtmpfs.
- sysfs links to the new cdev are not established.
The cdev needs to be added and devt assigned before device_add() is
called in order for the relevant sysfs and devtmpfs entries to be
created and the uevent to be properly populated.
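A hedged sketch of the ordering this establishes, using the generic
char-device API (the fops name and major-number handling here are
illustrative, not the exact remoteproc code):
	cdev_init(&rproc->cdev, &rproc_fops);
	rproc->cdev.owner = THIS_MODULE;
	rproc->dev.devt = MKDEV(major, rproc->index);	/* devt known early */

	ret = cdev_add(&rproc->cdev, rproc->dev.devt, 1);
	if (ret < 0)
		return ret;

	/* uevent now carries MAJOR/MINOR, so devtmpfs node and sysfs
	 * links are created as part of device registration */
	ret = device_add(&rproc->dev);
	if (ret < 0)
		return ret;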
Signed-off-by: Siddharth Gupta <sidgup(a)codeaurora.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson(a)linaro.org>
Cc: stable(a)vger.kernel.org
---
drivers/remoteproc/remoteproc_core.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index 6348aaa..9ad8c5f 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -2333,6 +2333,11 @@ int rproc_add(struct rproc *rproc)
struct device *dev = &rproc->dev;
int ret;
+ /* add char device for this remoteproc */
+ ret = rproc_char_device_add(rproc);
+ if (ret < 0)
+ return ret;
+
ret = device_add(dev);
if (ret < 0)
return ret;
@@ -2346,11 +2351,6 @@ int rproc_add(struct rproc *rproc)
/* create debugfs entries */
rproc_create_debug_dir(rproc);
- /* add char device for this remoteproc */
- ret = rproc_char_device_add(rproc);
- if (ret < 0)
- return ret;
-
/* if rproc is marked always-on, request it to boot */
if (rproc->auto_boot) {
ret = rproc_trigger_auto_boot(rproc);
--
Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
Hello all,
with commit "mount-util: shortcut things after generating top-level bind
mount" c2c331056a7c331a5478124b3cd6a34c9f539839
(5c5753b9ea5cc012586ae90d357d460dec4301a4 in master), systemd 248.2 and
later won't boot on the 4.19 kernel series. I tested 4.19.194 so far.
Lennart's guess is that this is the mount table brokenness on old
kernels that newer libmount worked around, i.e. /proc/self/mountinfo on
old kernels showed partially old data and partially new data, which
confused the heck out of everyone. "proc on 20" with mount options
"(rw,25)" doesn't look right at all.
If we look at the 4.14 and 4.4 kernel series, those boot. 4.9 I still
have to re-test, as that one didn't boot for me either.
So, any idea what is missing in the 4.19 kernel series? Other kernels work.
Systemd issue is this one: https://github.com/systemd/systemd/issues/19926
--
Best, Philip
Hi,
The commit 591a22c14d3f ("proc: Track /proc/$pid/attr/ opener mm_struct")
that we got in v5.13-rc6 broke our regression tests to pieces. The NIC
interfaces fail to start when using NetworkManager.
There is nothing in dmesg except an error that NetworkManager failed to
start.
Our setups are:
* VMs with virtio-net NICs
* Fedora 29
The revert fixes the issue and VMs boot with network working.
Thanks
From: Dai Ngo <dai.ngo(a)oracle.com>
[ Upstream commit f8849e206ef52b584cd9227255f4724f0cc900bb ]
Currently, if __nfs4_proc_set_acl fails with NFS4ERR_BADOWNER it
re-enables the idmapper by clearing NFS_CAP_UIDGID_NOMAP before
retrying. NFS_CAP_UIDGID_NOMAP remains cleared even if the retry
fails. This causes problems for subsequent setattr requests against a
v4 server that does not have idmapping configured.
This patch modifies nfs4_proc_set_acl to detect NFS4ERR_BADOWNER and
NFS4ERR_BADNAME, skip the retry since the kernel isn't involved in
encoding the ACEs, and return -EINVAL.
Steps to reproduce the problem:
# mount -o vers=4.1,sec=sys server:/export/test /tmp/mnt
# touch /tmp/mnt/file1
# chown 99 /tmp/mnt/file1
# nfs4_setfacl -a A::unknown.user@xyz.com:wrtncy /tmp/mnt/file1
Failed setxattr operation: Invalid argument
# chown 99 /tmp/mnt/file1
chown: changing ownership of ‘/tmp/mnt/file1’: Invalid argument
# umount /tmp/mnt
# mount -o vers=4.1,sec=sys server:/export/test /tmp/mnt
# chown 99 /tmp/mnt/file1
#
v2: detect NFS4ERR_BADOWNER and NFS4ERR_BADNAME and skip retry
in nfs4_proc_set_acl.
Signed-off-by: Dai Ngo <dai.ngo(a)oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/nfs/nfs4proc.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 92ca753723b5..e10bada12361 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -4887,6 +4887,14 @@ static int nfs4_proc_set_acl(struct inode *inode, const void *buf, size_t buflen
do {
err = __nfs4_proc_set_acl(inode, buf, buflen);
trace_nfs4_set_acl(inode, err);
+ if (err == -NFS4ERR_BADOWNER || err == -NFS4ERR_BADNAME) {
+ /*
+ * no need to retry since the kernel
+ * isn't involved in encoding the ACEs.
+ */
+ err = -EINVAL;
+ break;
+ }
err = nfs4_handle_exception(NFS_SERVER(inode), err,
&exception);
} while (exception.retry);
--
2.30.2
From: Dai Ngo <dai.ngo(a)oracle.com>
[ Upstream commit f8849e206ef52b584cd9227255f4724f0cc900bb ]
Currently, if __nfs4_proc_set_acl fails with NFS4ERR_BADOWNER it
re-enables the idmapper by clearing NFS_CAP_UIDGID_NOMAP before
retrying. NFS_CAP_UIDGID_NOMAP remains cleared even if the retry
fails. This causes problems for subsequent setattr requests against a
v4 server that does not have idmapping configured.
This patch modifies nfs4_proc_set_acl to detect NFS4ERR_BADOWNER and
NFS4ERR_BADNAME, skip the retry since the kernel isn't involved in
encoding the ACEs, and return -EINVAL.
Steps to reproduce the problem:
# mount -o vers=4.1,sec=sys server:/export/test /tmp/mnt
# touch /tmp/mnt/file1
# chown 99 /tmp/mnt/file1
# nfs4_setfacl -a A::unknown.user@xyz.com:wrtncy /tmp/mnt/file1
Failed setxattr operation: Invalid argument
# chown 99 /tmp/mnt/file1
chown: changing ownership of ‘/tmp/mnt/file1’: Invalid argument
# umount /tmp/mnt
# mount -o vers=4.1,sec=sys server:/export/test /tmp/mnt
# chown 99 /tmp/mnt/file1
#
v2: detect NFS4ERR_BADOWNER and NFS4ERR_BADNAME and skip retry
in nfs4_proc_set_acl.
Signed-off-by: Dai Ngo <dai.ngo(a)oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/nfs/nfs4proc.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 94130588ebf5..2ea772f596e3 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5183,6 +5183,14 @@ static int nfs4_proc_set_acl(struct inode *inode, const void *buf, size_t buflen
do {
err = __nfs4_proc_set_acl(inode, buf, buflen);
trace_nfs4_set_acl(inode, err);
+ if (err == -NFS4ERR_BADOWNER || err == -NFS4ERR_BADNAME) {
+ /*
+ * no need to retry since the kernel
+ * isn't involved in encoding the ACEs.
+ */
+ err = -EINVAL;
+ break;
+ }
err = nfs4_handle_exception(NFS_SERVER(inode), err,
&exception);
} while (exception.retry);
--
2.30.2
From: Dai Ngo <dai.ngo(a)oracle.com>
[ Upstream commit f8849e206ef52b584cd9227255f4724f0cc900bb ]
Currently, if __nfs4_proc_set_acl fails with NFS4ERR_BADOWNER it
re-enables the idmapper by clearing NFS_CAP_UIDGID_NOMAP before
retrying. NFS_CAP_UIDGID_NOMAP remains cleared even if the retry
fails. This causes problems for subsequent setattr requests against a
v4 server that does not have idmapping configured.
This patch modifies nfs4_proc_set_acl to detect NFS4ERR_BADOWNER and
NFS4ERR_BADNAME, skip the retry since the kernel isn't involved in
encoding the ACEs, and return -EINVAL.
Steps to reproduce the problem:
# mount -o vers=4.1,sec=sys server:/export/test /tmp/mnt
# touch /tmp/mnt/file1
# chown 99 /tmp/mnt/file1
# nfs4_setfacl -a A::unknown.user@xyz.com:wrtncy /tmp/mnt/file1
Failed setxattr operation: Invalid argument
# chown 99 /tmp/mnt/file1
chown: changing ownership of ‘/tmp/mnt/file1’: Invalid argument
# umount /tmp/mnt
# mount -o vers=4.1,sec=sys server:/export/test /tmp/mnt
# chown 99 /tmp/mnt/file1
#
v2: detect NFS4ERR_BADOWNER and NFS4ERR_BADNAME and skip retry
in nfs4_proc_set_acl.
Signed-off-by: Dai Ngo <dai.ngo(a)oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/nfs/nfs4proc.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index e053fd7f83d8..ae19ead908d5 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5294,6 +5294,14 @@ static int nfs4_proc_set_acl(struct inode *inode, const void *buf, size_t buflen
do {
err = __nfs4_proc_set_acl(inode, buf, buflen);
trace_nfs4_set_acl(inode, err);
+ if (err == -NFS4ERR_BADOWNER || err == -NFS4ERR_BADNAME) {
+ /*
+ * no need to retry since the kernel
+ * isn't involved in encoding the ACEs.
+ */
+ err = -EINVAL;
+ break;
+ }
err = nfs4_handle_exception(NFS_SERVER(inode), err,
&exception);
} while (exception.retry);
--
2.30.2
The ROM load sometimes seems to return an unknown status
(RENESAS_ROM_STATUS_NO_RESULT) instead of success / fail.
If the ROM load indeed failed, this leads to failures when trying to
communicate with the controller later on.
Attempt to load firmware using RAM load in those cases.
Cc: stable(a)vger.kernel.org
Cc: Mathias Nyman <mathias.nyman(a)intel.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Vinod Koul <vkoul(a)kernel.org>
Fixes: 2478be82de44 ("usb: renesas-xhci: Add ROM loader for uPD720201")
Tested-by: Vinod Koul <vkoul(a)kernel.org>
Reviewed-by: Vinod Koul <vkoul(a)kernel.org>
Signed-off-by: Moritz Fischer <mdf(a)kernel.org>
---
Changes from v1:
- Added Vinod's Tested-by and Reviewed-by
- Reworded commit message with more detail
- Fixed up comment formatting
---
drivers/usb/host/xhci-pci-renesas.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/usb/host/xhci-pci-renesas.c b/drivers/usb/host/xhci-pci-renesas.c
index f97ac9f52bf4..431213cdf9e0 100644
--- a/drivers/usb/host/xhci-pci-renesas.c
+++ b/drivers/usb/host/xhci-pci-renesas.c
@@ -207,7 +207,8 @@ static int renesas_check_rom_state(struct pci_dev *pdev)
return 0;
case RENESAS_ROM_STATUS_NO_RESULT: /* No result yet */
- return 0;
+ dev_dbg(&pdev->dev, "Unknown ROM status ...\n");
+ break;
case RENESAS_ROM_STATUS_ERROR: /* Error State */
default: /* All other states are marked as "Reserved states" */
@@ -224,13 +225,12 @@ static int renesas_fw_check_running(struct pci_dev *pdev)
u8 fw_state;
int err;
- /* Check if device has ROM and loaded, if so skip everything */
- err = renesas_check_rom(pdev);
- if (err) { /* we have rom */
- err = renesas_check_rom_state(pdev);
- if (!err)
- return err;
- }
+ /*
+ * Only if device has ROM and loaded FW we can skip loading and
+ * return success. Otherwise (even unknown state), attempt to load FW.
+ */
+ if (renesas_check_rom(pdev) && !renesas_check_rom_state(pdev))
+ return 0;
/*
* Test if the device is actually needing the firmware. As most
--
2.31.1
This is the start of the stable review cycle for the 4.14.237 release.
There are 49 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 16 Jun 2021 10:26:30 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.237-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.237-rc1
Liangyan <liangyan.peng(a)linux.alibaba.com>
tracing: Correct the length check which causes memory corruption
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
ftrace: Do not blindly read the ip address in ftrace_bug()
Ming Lei <ming.lei(a)redhat.com>
scsi: core: Only put parent device if host state differs from SHOST_CREATED
Ming Lei <ming.lei(a)redhat.com>
scsi: core: Put .shost_dev in failure path if host state changes to RUNNING
Ming Lei <ming.lei(a)redhat.com>
scsi: core: Fix error handling of scsi_host_alloc()
Dai Ngo <dai.ngo(a)oracle.com>
NFSv4: nfs4_proc_set_acl needs to restore NFS_CAP_UIDGID_NOMAP on error.
Anna Schumaker <Anna.Schumaker(a)Netapp.com>
NFS: Fix use-after-free in nfs4_init_client()
Paolo Bonzini <pbonzini(a)redhat.com>
kvm: fix previous commit for 32-bit builds
Leo Yan <leo.yan(a)linaro.org>
perf session: Correct buffer copying when peeking events
Dan Carpenter <dan.carpenter(a)oracle.com>
NFS: Fix a potential NULL dereference in nfs_get_client()
Marco Elver <elver(a)google.com>
perf: Fix data race between pin_count increment/decrement
Dmitry Osipenko <digetx(a)gmail.com>
regulator: max77620: Use device_set_of_node_from_dev()
Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
regulator: core: resolve supply for boot-on/always-on regulators
Maciej Żenczykowski <maze(a)google.com>
usb: fix various gadget panics on 10gbps cabling
Maciej Żenczykowski <maze(a)google.com>
usb: fix various gadgets null ptr deref on 10gbps cabling.
Linyu Yuan <linyyuan(a)codeaurora.com>
usb: gadget: eem: fix wrong eem header operation
Johan Hovold <johan(a)kernel.org>
USB: serial: quatech2: fix control-request directions
Alexandre GRIVEAUX <agriveaux(a)deutnet.info>
USB: serial: omninet: add device id for Zyxel Omni 56K Plus
George McCollister <george.mccollister(a)gmail.com>
USB: serial: ftdi_sio: add NovaTech OrionMX product ID
Wesley Cheng <wcheng(a)codeaurora.org>
usb: gadget: f_fs: Ensure io_completion_wq is idle during unbind
Mayank Rana <mrana(a)codeaurora.org>
usb: typec: ucsi: Clear PPM capability data in ucsi_init() error path
Marian-Cristian Rotariu <marian.c.rotariu(a)gmail.com>
usb: dwc3: ep0: fix NULL pointer exception
Maciej Żenczykowski <maze(a)google.com>
USB: f_ncm: ncm_bitrate (speed) is unsigned
Alexander Kuznetsov <wwfq(a)yandex-team.ru>
cgroup1: don't allow '\n' in renaming
Ritesh Harjani <riteshh(a)linux.ibm.com>
btrfs: return value from btrfs_mark_extent_written() in case of error
Wenli Looi <wlooi(a)ucalgary.ca>
staging: rtl8723bs: Fix uninitialized variables
Paolo Bonzini <pbonzini(a)redhat.com>
kvm: avoid speculation-based attacks from out-of-range memslot accesses
Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com>
drm: Lock pointer access in drm_master_release()
Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com>
drm: Fix use-after-free read in drm_getunique()
Chris Packham <chris.packham(a)alliedtelesis.co.nz>
i2c: mpc: implement erratum A-004447 workaround
Chris Packham <chris.packham(a)alliedtelesis.co.nz>
i2c: mpc: Make use of i2c_recover_bus()
Chris Packham <chris.packham(a)alliedtelesis.co.nz>
powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P1010 i2c controllers
Chris Packham <chris.packham(a)alliedtelesis.co.nz>
powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P2041 i2c controllers
Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com>
bnx2x: Fix missing error code in bnx2x_iov_init_one()
Tiezhu Yang <yangtiezhu(a)loongson.cn>
MIPS: Fix kernel hang under FUNCTION_GRAPH_TRACER and PREEMPT_TRACER
Saubhik Mukherjee <saubhik.mukherjee(a)gmail.com>
net: appletalk: cops: Fix data race in cops_probe1
Zong Li <zong.li(a)sifive.com>
net: macb: ensure the device is available before accessing GEMGXL control registers
Dmitry Bogdanov <d.bogdanov(a)yadro.com>
scsi: target: qla2xxx: Wait for stop_phase1 at WWN removal
Matt Wang <wwentao(a)vmware.com>
scsi: vmw_pvscsi: Set correct residual data length
Zheyu Ma <zheyuma97(a)gmail.com>
net/qla3xxx: fix schedule while atomic in ql_sem_spinlock
Sergey Senozhatsky <senozhatsky(a)chromium.org>
wq: handle VM suspension in stall detection
Shakeel Butt <shakeelb(a)google.com>
cgroup: disable controllers at parse time
Dan Carpenter <dan.carpenter(a)oracle.com>
net: mdiobus: get rid of a BUG_ON()
Johannes Berg <johannes.berg(a)intel.com>
netlink: disable IRQs for netlink_lock_table()
Johannes Berg <johannes.berg(a)intel.com>
bonding: init notify_work earlier to avoid uninitialized use
Zheyu Ma <zheyuma97(a)gmail.com>
isdn: mISDN: netjet: Fix crash in nj_probe:
Zou Wei <zou_wei(a)huawei.com>
ASoC: sti-sas: add missing MODULE_DEVICE_TABLE
Jeimon <jjjinmeng.zhou(a)gmail.com>
net/nfc/rawsock.c: fix a permission check bug
Kees Cook <keescook(a)chromium.org>
proc: Track /proc/$pid/attr/ opener mm_struct
-------------
Diffstat:
Makefile | 4 +-
arch/mips/lib/mips-atomic.c | 12 +--
arch/powerpc/boot/dts/fsl/p1010si-post.dtsi | 8 ++
arch/powerpc/boot/dts/fsl/p2041si-post.dtsi | 16 ++++
drivers/gpu/drm/drm_auth.c | 3 +-
drivers/gpu/drm/drm_ioctl.c | 9 ++-
drivers/i2c/busses/i2c-mpc.c | 95 ++++++++++++++++++++++-
drivers/isdn/hardware/mISDN/netjet.c | 1 -
drivers/net/appletalk/cops.c | 4 +-
drivers/net/bonding/bond_main.c | 2 +-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c | 4 +-
drivers/net/ethernet/cadence/macb_main.c | 3 +
drivers/net/ethernet/qlogic/qla3xxx.c | 2 +-
drivers/net/phy/mdio_bus.c | 3 +-
drivers/regulator/core.c | 6 ++
drivers/regulator/max77620-regulator.c | 7 ++
drivers/scsi/hosts.c | 33 ++++----
drivers/scsi/qla2xxx/qla_target.c | 2 +
drivers/scsi/vmw_pvscsi.c | 8 +-
drivers/staging/rtl8723bs/os_dep/ioctl_cfg80211.c | 2 +-
drivers/usb/dwc3/ep0.c | 3 +
drivers/usb/gadget/config.c | 8 ++
drivers/usb/gadget/function/f_ecm.c | 2 +-
drivers/usb/gadget/function/f_eem.c | 6 +-
drivers/usb/gadget/function/f_fs.c | 3 +
drivers/usb/gadget/function/f_hid.c | 3 +-
drivers/usb/gadget/function/f_loopback.c | 2 +-
drivers/usb/gadget/function/f_ncm.c | 2 +-
drivers/usb/gadget/function/f_printer.c | 3 +-
drivers/usb/gadget/function/f_rndis.c | 2 +-
drivers/usb/gadget/function/f_serial.c | 2 +-
drivers/usb/gadget/function/f_sourcesink.c | 3 +-
drivers/usb/gadget/function/f_subset.c | 2 +-
drivers/usb/gadget/function/f_tcm.c | 3 +-
drivers/usb/serial/ftdi_sio.c | 1 +
drivers/usb/serial/ftdi_sio_ids.h | 1 +
drivers/usb/serial/omninet.c | 2 +
drivers/usb/serial/quatech2.c | 6 +-
drivers/usb/typec/ucsi/ucsi.c | 1 +
fs/btrfs/file.c | 4 +-
fs/nfs/client.c | 2 +-
fs/nfs/nfs4client.c | 2 +-
fs/nfs/nfs4proc.c | 8 ++
fs/proc/base.c | 9 ++-
include/linux/kvm_host.h | 10 ++-
kernel/cgroup/cgroup-v1.c | 4 +
kernel/cgroup/cgroup.c | 13 ++--
kernel/events/core.c | 2 +
kernel/trace/ftrace.c | 8 +-
kernel/trace/trace.c | 2 +-
kernel/workqueue.c | 12 ++-
net/netlink/af_netlink.c | 6 +-
net/nfc/rawsock.c | 2 +-
sound/soc/codecs/sti-sas.c | 1 +
tools/perf/util/session.c | 1 +
55 files changed, 291 insertions(+), 74 deletions(-)
This is the start of the stable review cycle for the 4.9.273 release.
There are 41 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu, 17 Jun 2021 06:06:45 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.273-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.273-rc2
Liangyan <liangyan.peng(a)linux.alibaba.com>
tracing: Correct the length check which causes memory corruption
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
ftrace: Do not blindly read the ip address in ftrace_bug()
Ming Lei <ming.lei(a)redhat.com>
scsi: core: Only put parent device if host state differs from SHOST_CREATED
Ming Lei <ming.lei(a)redhat.com>
scsi: core: Fix error handling of scsi_host_alloc()
Dai Ngo <dai.ngo(a)oracle.com>
NFSv4: nfs4_proc_set_acl needs to restore NFS_CAP_UIDGID_NOMAP on error.
Paolo Bonzini <pbonzini(a)redhat.com>
kvm: fix previous commit for 32-bit builds
Leo Yan <leo.yan(a)linaro.org>
perf session: Correct buffer copying when peeking events
Dan Carpenter <dan.carpenter(a)oracle.com>
NFS: Fix a potential NULL dereference in nfs_get_client()
Marco Elver <elver(a)google.com>
perf: Fix data race between pin_count increment/decrement
Maciej Żenczykowski <maze(a)google.com>
usb: fix various gadget panics on 10gbps cabling
Maciej Żenczykowski <maze(a)google.com>
usb: fix various gadgets null ptr deref on 10gbps cabling.
Linyu Yuan <linyyuan(a)codeaurora.com>
usb: gadget: eem: fix wrong eem header operation
Johan Hovold <johan(a)kernel.org>
USB: serial: quatech2: fix control-request directions
Alexandre GRIVEAUX <agriveaux(a)deutnet.info>
USB: serial: omninet: add device id for Zyxel Omni 56K Plus
George McCollister <george.mccollister(a)gmail.com>
USB: serial: ftdi_sio: add NovaTech OrionMX product ID
Marian-Cristian Rotariu <marian.c.rotariu(a)gmail.com>
usb: dwc3: ep0: fix NULL pointer exception
Maciej Żenczykowski <maze(a)google.com>
USB: f_ncm: ncm_bitrate (speed) is unsigned
Alexander Kuznetsov <wwfq(a)yandex-team.ru>
cgroup1: don't allow '\n' in renaming
Ritesh Harjani <riteshh(a)linux.ibm.com>
btrfs: return value from btrfs_mark_extent_written() in case of error
Paolo Bonzini <pbonzini(a)redhat.com>
kvm: avoid speculation-based attacks from out-of-range memslot accesses
Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com>
drm: Lock pointer access in drm_master_release()
Chris Packham <chris.packham(a)alliedtelesis.co.nz>
i2c: mpc: implement erratum A-004447 workaround
Chris Packham <chris.packham(a)alliedtelesis.co.nz>
i2c: mpc: Make use of i2c_recover_bus()
Chris Packham <chris.packham(a)alliedtelesis.co.nz>
powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P1010 i2c controllers
Chris Packham <chris.packham(a)alliedtelesis.co.nz>
powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P2041 i2c controllers
Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com>
bnx2x: Fix missing error code in bnx2x_iov_init_one()
Tiezhu Yang <yangtiezhu(a)loongson.cn>
MIPS: Fix kernel hang under FUNCTION_GRAPH_TRACER and PREEMPT_TRACER
Saubhik Mukherjee <saubhik.mukherjee(a)gmail.com>
net: appletalk: cops: Fix data race in cops_probe1
Zong Li <zong.li(a)sifive.com>
net: macb: ensure the device is available before accessing GEMGXL control registers
Dmitry Bogdanov <d.bogdanov(a)yadro.com>
scsi: target: qla2xxx: Wait for stop_phase1 at WWN removal
Matt Wang <wwentao(a)vmware.com>
scsi: vmw_pvscsi: Set correct residual data length
Zheyu Ma <zheyuma97(a)gmail.com>
net/qla3xxx: fix schedule while atomic in ql_sem_spinlock
Sergey Senozhatsky <senozhatsky(a)chromium.org>
wq: handle VM suspension in stall detection
Shakeel Butt <shakeelb(a)google.com>
cgroup: disable controllers at parse time
Dan Carpenter <dan.carpenter(a)oracle.com>
net: mdiobus: get rid of a BUG_ON()
Johannes Berg <johannes.berg(a)intel.com>
netlink: disable IRQs for netlink_lock_table()
Johannes Berg <johannes.berg(a)intel.com>
bonding: init notify_work earlier to avoid uninitialized use
Zheyu Ma <zheyuma97(a)gmail.com>
isdn: mISDN: netjet: Fix crash in nj_probe:
Zou Wei <zou_wei(a)huawei.com>
ASoC: sti-sas: add missing MODULE_DEVICE_TABLE
Jeimon <jjjinmeng.zhou(a)gmail.com>
net/nfc/rawsock.c: fix a permission check bug
Kees Cook <keescook(a)chromium.org>
proc: Track /proc/$pid/attr/ opener mm_struct
-------------
Diffstat:
Makefile | 4 +-
arch/mips/lib/mips-atomic.c | 12 +--
arch/powerpc/boot/dts/fsl/p1010si-post.dtsi | 8 ++
arch/powerpc/boot/dts/fsl/p2041si-post.dtsi | 16 ++++
drivers/gpu/drm/drm_auth.c | 3 +-
drivers/i2c/busses/i2c-mpc.c | 95 ++++++++++++++++++++++-
drivers/isdn/hardware/mISDN/netjet.c | 1 -
drivers/net/appletalk/cops.c | 4 +-
drivers/net/bonding/bond_main.c | 2 +-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c | 4 +-
drivers/net/ethernet/cadence/macb.c | 3 +
drivers/net/ethernet/qlogic/qla3xxx.c | 2 +-
drivers/net/phy/mdio_bus.c | 3 +-
drivers/scsi/hosts.c | 25 +++---
drivers/scsi/qla2xxx/qla_target.c | 2 +
drivers/scsi/vmw_pvscsi.c | 8 +-
drivers/usb/dwc3/ep0.c | 3 +
drivers/usb/gadget/config.c | 8 ++
drivers/usb/gadget/function/f_ecm.c | 2 +-
drivers/usb/gadget/function/f_eem.c | 6 +-
drivers/usb/gadget/function/f_loopback.c | 2 +-
drivers/usb/gadget/function/f_ncm.c | 2 +-
drivers/usb/gadget/function/f_printer.c | 3 +-
drivers/usb/gadget/function/f_rndis.c | 2 +-
drivers/usb/gadget/function/f_serial.c | 2 +-
drivers/usb/gadget/function/f_sourcesink.c | 3 +-
drivers/usb/gadget/function/f_subset.c | 2 +-
drivers/usb/gadget/function/f_tcm.c | 3 +-
drivers/usb/serial/ftdi_sio.c | 1 +
drivers/usb/serial/ftdi_sio_ids.h | 1 +
drivers/usb/serial/omninet.c | 2 +
drivers/usb/serial/quatech2.c | 6 +-
fs/btrfs/file.c | 4 +-
fs/nfs/client.c | 2 +-
fs/nfs/nfs4proc.c | 8 ++
fs/proc/base.c | 9 ++-
include/linux/kvm_host.h | 11 ++-
kernel/cgroup.c | 17 ++--
kernel/events/core.c | 2 +
kernel/trace/ftrace.c | 8 +-
kernel/trace/trace.c | 2 +-
kernel/workqueue.c | 12 ++-
net/netlink/af_netlink.c | 6 +-
net/nfc/rawsock.c | 2 +-
sound/soc/codecs/sti-sas.c | 1 +
tools/perf/util/session.c | 1 +
46 files changed, 260 insertions(+), 65 deletions(-)
This is the start of the stable review cycle for the 4.4.273 release.
There are 34 patches in this series, all of which will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 16 Jun 2021 10:26:30 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.273-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.273-rc1
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
ftrace: Do not blindly read the ip address in ftrace_bug()
Ming Lei <ming.lei(a)redhat.com>
scsi: core: Only put parent device if host state differs from SHOST_CREATED
Dai Ngo <dai.ngo(a)oracle.com>
NFSv4: nfs4_proc_set_acl needs to restore NFS_CAP_UIDGID_NOMAP on error.
Paolo Bonzini <pbonzini(a)redhat.com>
kvm: fix previous commit for 32-bit builds
Leo Yan <leo.yan(a)linaro.org>
perf session: Correct buffer copying when peeking events
Dan Carpenter <dan.carpenter(a)oracle.com>
NFS: Fix a potential NULL dereference in nfs_get_client()
Marco Elver <elver(a)google.com>
perf: Fix data race between pin_count increment/decrement
Linyu Yuan <linyyuan(a)codeaurora.com>
usb: gadget: eem: fix wrong eem header operation
Johan Hovold <johan(a)kernel.org>
USB: serial: quatech2: fix control-request directions
Alexandre GRIVEAUX <agriveaux(a)deutnet.info>
USB: serial: omninet: add device id for Zyxel Omni 56K Plus
George McCollister <george.mccollister(a)gmail.com>
USB: serial: ftdi_sio: add NovaTech OrionMX product ID
Marian-Cristian Rotariu <marian.c.rotariu(a)gmail.com>
usb: dwc3: ep0: fix NULL pointer exception
Maciej Żenczykowski <maze(a)google.com>
USB: f_ncm: ncm_bitrate (speed) is unsigned
Alexander Kuznetsov <wwfq(a)yandex-team.ru>
cgroup1: don't allow '\n' in renaming
Ritesh Harjani <riteshh(a)linux.ibm.com>
btrfs: return value from btrfs_mark_extent_written() in case of error
Paolo Bonzini <pbonzini(a)redhat.com>
kvm: avoid speculation-based attacks from out-of-range memslot accesses
Chris Packham <chris.packham(a)alliedtelesis.co.nz>
i2c: mpc: implement erratum A-004447 workaround
Chris Packham <chris.packham(a)alliedtelesis.co.nz>
i2c: mpc: Make use of i2c_recover_bus()
Chris Packham <chris.packham(a)alliedtelesis.co.nz>
powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P1010 i2c controllers
Chris Packham <chris.packham(a)alliedtelesis.co.nz>
powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P2041 i2c controllers
Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com>
bnx2x: Fix missing error code in bnx2x_iov_init_one()
Tiezhu Yang <yangtiezhu(a)loongson.cn>
MIPS: Fix kernel hang under FUNCTION_GRAPH_TRACER and PREEMPT_TRACER
Saubhik Mukherjee <saubhik.mukherjee(a)gmail.com>
net: appletalk: cops: Fix data race in cops_probe1
Zong Li <zong.li(a)sifive.com>
net: macb: ensure the device is available before accessing GEMGXL control registers
Dmitry Bogdanov <d.bogdanov(a)yadro.com>
scsi: target: qla2xxx: Wait for stop_phase1 at WWN removal
Matt Wang <wwentao(a)vmware.com>
scsi: vmw_pvscsi: Set correct residual data length
Zheyu Ma <zheyuma97(a)gmail.com>
net/qla3xxx: fix schedule while atomic in ql_sem_spinlock
Dan Carpenter <dan.carpenter(a)oracle.com>
net: mdiobus: get rid of a BUG_ON()
Johannes Berg <johannes.berg(a)intel.com>
netlink: disable IRQs for netlink_lock_table()
Johannes Berg <johannes.berg(a)intel.com>
bonding: init notify_work earlier to avoid uninitialized use
Zheyu Ma <zheyuma97(a)gmail.com>
isdn: mISDN: netjet: Fix crash in nj_probe:
Zou Wei <zou_wei(a)huawei.com>
ASoC: sti-sas: add missing MODULE_DEVICE_TABLE
Jeimon <jjjinmeng.zhou(a)gmail.com>
net/nfc/rawsock.c: fix a permission check bug
Kees Cook <keescook(a)chromium.org>
proc: Track /proc/$pid/attr/ opener mm_struct
-------------
Diffstat:
Makefile | 4 +-
arch/mips/lib/mips-atomic.c | 12 +--
arch/powerpc/boot/dts/fsl/p1010si-post.dtsi | 8 ++
arch/powerpc/boot/dts/fsl/p2041si-post.dtsi | 16 ++++
drivers/i2c/busses/i2c-mpc.c | 95 ++++++++++++++++++++++-
drivers/isdn/hardware/mISDN/netjet.c | 1 -
drivers/net/appletalk/cops.c | 4 +-
drivers/net/bonding/bond_main.c | 2 +-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c | 4 +-
drivers/net/ethernet/cadence/macb.c | 3 +
drivers/net/ethernet/qlogic/qla3xxx.c | 2 +-
drivers/net/phy/mdio_bus.c | 3 +-
drivers/scsi/hosts.c | 2 +-
drivers/scsi/qla2xxx/qla_target.c | 2 +
drivers/scsi/vmw_pvscsi.c | 8 +-
drivers/usb/dwc3/ep0.c | 3 +
drivers/usb/gadget/function/f_eem.c | 4 +-
drivers/usb/gadget/function/f_ncm.c | 2 +-
drivers/usb/serial/ftdi_sio.c | 1 +
drivers/usb/serial/ftdi_sio_ids.h | 1 +
drivers/usb/serial/omninet.c | 2 +
drivers/usb/serial/quatech2.c | 6 +-
fs/btrfs/file.c | 4 +-
fs/nfs/client.c | 2 +-
fs/nfs/nfs4proc.c | 8 ++
fs/proc/base.c | 9 ++-
include/linux/kvm_host.h | 11 ++-
kernel/cgroup.c | 4 +
kernel/events/core.c | 2 +
kernel/trace/ftrace.c | 8 +-
net/netlink/af_netlink.c | 6 +-
net/nfc/rawsock.c | 2 +-
sound/soc/codecs/sti-sas.c | 1 +
tools/perf/util/session.c | 1 +
34 files changed, 209 insertions(+), 34 deletions(-)
In AP mode, WPA2-PSK connections were not established.
The reason was that the AP was sending the first message
of the 4-way handshake encrypted, even though no pairwise
key had (correctly) yet been set.
Encryption was enabled if the "security_enable" driver flag
was set and encryption was not explicitly disabled by
IEEE80211_TX_INTFL_DONT_ENCRYPT.
However, security_enable was set when *any* key, including
the AP GTK key, had been set, which caused unwanted
encryption even if no key was available for the unicast
packet to be sent.
Fix this by adding a check that we have a key, and drop
the old security_enable driver flag, which is insufficient
and redundant.
The Redpine downstream out of tree driver does it this way too.
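For reference, a minimal sketch of the per-packet test the fix relies on
(illustrative only, not part of this patch; the helper name is made up):
#include <net/mac80211.h>
static bool rsi_frame_can_encrypt(struct sk_buff *skb)
{
	struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);

	/* mac80211 sets info->control.hw_key only when a hardware key has
	 * been assigned to this particular frame, so it is a per-packet
	 * indication that encryption is actually possible, unlike the
	 * driver-global security_enable flag removed here. */
	return !(info->flags & IEEE80211_TX_INTFL_DONT_ENCRYPT) &&
	       info->control.hw_key;
}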
Regarding the Fixes tag: the code being modified was
introduced earlier, with the original driver submission, in
dad0d04fa7ba ("rsi: Add RS9113 wireless driver"), but
at that time AP mode was not yet supported, so there was
no bug at that point.
I have therefore tagged the introduction of AP support instead,
which was part of the patch set "rsi: support for AP mode" [1].
It is not clear whether AP WPA has ever worked: I can see nothing
on the kernel side that broke it afterwards, yet the AP support
patch series says "Tests are performed to confirm aggregation,
connections in WEP and WPA/WPA2 security."
One possibility is that the initial tests were done with a modified
userspace (hostapd).
[1] https://www.spinics.net/lists/linux-wireless/msg165302.html
Signed-off-by: Martin Fuzzey <martin.fuzzey(a)flowbird.group>
Fixes: 38ef62353acb ("rsi: security enhancements for AP mode")
CC: stable(a)vger.kernel.org
---
V2:
Remove security_enable driver flag
Remove unnecessary parentheses
Improve $SUBJECT
drivers/net/wireless/rsi/rsi_91x_hal.c | 2 +-
drivers/net/wireless/rsi/rsi_91x_mac80211.c | 3 ---
drivers/net/wireless/rsi/rsi_91x_mgmt.c | 3 +--
drivers/net/wireless/rsi/rsi_main.h | 1 -
4 files changed, 2 insertions(+), 7 deletions(-)
diff --git a/drivers/net/wireless/rsi/rsi_91x_hal.c b/drivers/net/wireless/rsi/rsi_91x_hal.c
index ce98921..a51cb0a 100644
--- a/drivers/net/wireless/rsi/rsi_91x_hal.c
+++ b/drivers/net/wireless/rsi/rsi_91x_hal.c
@@ -203,7 +203,7 @@ int rsi_prepare_data_desc(struct rsi_common *common, struct sk_buff *skb)
wh->frame_control |= cpu_to_le16(RSI_SET_PS_ENABLE);
if ((!(info->flags & IEEE80211_TX_INTFL_DONT_ENCRYPT)) &&
- (common->secinfo.security_enable)) {
+ info->control.hw_key) {
if (rsi_is_cipher_wep(common))
ieee80211_size += 4;
else
diff --git a/drivers/net/wireless/rsi/rsi_91x_mac80211.c b/drivers/net/wireless/rsi/rsi_91x_mac80211.c
index 1602530..57c9e35 100644
--- a/drivers/net/wireless/rsi/rsi_91x_mac80211.c
+++ b/drivers/net/wireless/rsi/rsi_91x_mac80211.c
@@ -1028,7 +1028,6 @@ static int rsi_mac80211_set_key(struct ieee80211_hw *hw,
mutex_lock(&common->mutex);
switch (cmd) {
case SET_KEY:
- secinfo->security_enable = true;
status = rsi_hal_key_config(hw, vif, key, sta);
if (status) {
mutex_unlock(&common->mutex);
@@ -1047,8 +1046,6 @@ static int rsi_mac80211_set_key(struct ieee80211_hw *hw,
break;
case DISABLE_KEY:
- if (vif->type == NL80211_IFTYPE_STATION)
- secinfo->security_enable = false;
rsi_dbg(ERR_ZONE, "%s: RSI del key\n", __func__);
memset(key, 0, sizeof(struct ieee80211_key_conf));
status = rsi_hal_key_config(hw, vif, key, sta);
diff --git a/drivers/net/wireless/rsi/rsi_91x_mgmt.c b/drivers/net/wireless/rsi/rsi_91x_mgmt.c
index 33c76d3..b6d050a 100644
--- a/drivers/net/wireless/rsi/rsi_91x_mgmt.c
+++ b/drivers/net/wireless/rsi/rsi_91x_mgmt.c
@@ -1803,8 +1803,7 @@ int rsi_send_wowlan_request(struct rsi_common *common, u16 flags,
RSI_WIFI_MGMT_Q);
cmd_frame->desc.desc_dword0.frame_type = WOWLAN_CONFIG_PARAMS;
cmd_frame->host_sleep_status = sleep_status;
- if (common->secinfo.security_enable &&
- common->secinfo.gtk_cipher)
+ if (common->secinfo.gtk_cipher)
flags |= RSI_WOW_GTK_REKEY;
if (sleep_status)
cmd_frame->wow_flags = flags;
diff --git a/drivers/net/wireless/rsi/rsi_main.h b/drivers/net/wireless/rsi/rsi_main.h
index a1065e5..0f53585 100644
--- a/drivers/net/wireless/rsi/rsi_main.h
+++ b/drivers/net/wireless/rsi/rsi_main.h
@@ -151,7 +151,6 @@ enum edca_queue {
};
struct security_info {
- bool security_enable;
u32 ptk_cipher;
u32 gtk_cipher;
};
--
1.9.1
The RSI_RATE_x bits must be assigned to struct rsi_data_desc rate_info
field. The rest of the driver does it correctly, except this one place,
so fix it. This is also aligned with the RSI downstream vendor driver.
Without this patch, an AP operating at 5 GHz does not transmit any
beacons at all, this patch fixes that.
Fixes: d26a9559403c ("rsi: add beacon changes for AP mode")
Signed-off-by: Marek Vasut <marex(a)denx.de>
Cc: Amitkumar Karwar <amit.karwar(a)redpinesignals.com>
Cc: Angus Ainslie <angus(a)akkea.ca>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: Jakub Kicinski <kuba(a)kernel.org>
Cc: Kalle Valo <kvalo(a)codeaurora.org>
Cc: Karun Eagalapati <karun256(a)gmail.com>
Cc: Martin Kepplinger <martink(a)posteo.de>
Cc: Prameela Rani Garnepudi <prameela.j04cs(a)gmail.com>
Cc: Sebastian Krzyszkowiak <sebastian.krzyszkowiak(a)puri.sm>
Cc: Siva Rebbagondla <siva8118(a)gmail.com>
Cc: netdev(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
---
drivers/net/wireless/rsi/rsi_91x_hal.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/rsi/rsi_91x_hal.c b/drivers/net/wireless/rsi/rsi_91x_hal.c
index ce9892152f4d..ab837921d9a4 100644
--- a/drivers/net/wireless/rsi/rsi_91x_hal.c
+++ b/drivers/net/wireless/rsi/rsi_91x_hal.c
@@ -470,9 +470,9 @@ int rsi_prepare_beacon(struct rsi_common *common, struct sk_buff *skb)
}
if (common->band == NL80211_BAND_2GHZ)
- bcn_frm->bbp_info |= cpu_to_le16(RSI_RATE_1);
+ bcn_frm->rate_info |= cpu_to_le16(RSI_RATE_1);
else
- bcn_frm->bbp_info |= cpu_to_le16(RSI_RATE_6);
+ bcn_frm->rate_info |= cpu_to_le16(RSI_RATE_6);
if (mac_bcn->data[tim_offset + 2] == 0)
bcn_frm->frame_info |= cpu_to_le16(RSI_DATA_DESC_DTIM_BEACON);
--
2.30.2
This is a note to let you know that I've just added the patch titled
serial_cs: remove wrong GLOBETROTTER.cis entry
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the tty-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From 6f9e17800b3156fff76458d53747c1f0547d0c5a Mon Sep 17 00:00:00 2001
From: Ondrej Zary <linux(a)zary.sk>
Date: Fri, 11 Jun 2021 22:19:39 +0200
Subject: serial_cs: remove wrong GLOBETROTTER.cis entry
The GLOBETROTTER.cis entry in serial_cs matches more devices than
intended and breaks them. Remove it.
Example: # pccardctl info
PRODID_1="Option International
"
PRODID_2="GSM-Ready 56K/ISDN
"
PRODID_3="021
"
PRODID_4="A
"
MANFID=0013,0000
FUNCID=0
result:
pcmcia 0.0: Direct firmware load for cis/GLOBETROTTER.cis failed with error -2
The GLOBETROTTER.cis is nowhere to be found. There's GLOBETROTTER.cis.ihex at
https://netdev.vger.kernel.narkive.com/h4inqdxM/patch-axnet-cs-fix-phy-id-d…
It's from a completely different card:
vers_1 4.1, "Option International", "GSM/GPRS GlobeTrotter", "001", "A"
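For illustration, the two match styles involved here side by side (sketch
only, not part of this patch; the second entry is the one added by the
follow-up patch below):
#include <pcmcia/ds.h>

static const struct pcmcia_device_id example_ids[] = {
	/* Matches every card reporting MANFID 0013,0000, which is why it
	 * also caught the GSM-Ready 56K/ISDN modem and broke it. */
	PCMCIA_DEVICE_CIS_MANF_CARD(0x0013, 0x0000, "cis/GLOBETROTTER.cis"),
	/* Matches on the product ID strings (via their hashes), so it is
	 * specific to this one card model. */
	PCMCIA_DEVICE_PROD_ID12("Option International", "GSM-Ready 56K/ISDN",
				0x9d7cd6f5, 0xb23844aa),
	PCMCIA_DEVICE_NULL,
};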
Signed-off-by: Ondrej Zary <linux(a)zary.sk>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20210611201940.23898-1-linux@zary.sk
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/8250/serial_cs.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/tty/serial/8250/serial_cs.c b/drivers/tty/serial/8250/serial_cs.c
index 3708114343b0..2f1d33ea26e1 100644
--- a/drivers/tty/serial/8250/serial_cs.c
+++ b/drivers/tty/serial/8250/serial_cs.c
@@ -813,7 +813,6 @@ static const struct pcmcia_device_id serial_ids[] = {
PCMCIA_DEVICE_CIS_PROD_ID12("ADVANTECH", "COMpad-32/85B-4", 0x96913a85, 0xcec8f102, "cis/COMpad4.cis"),
PCMCIA_DEVICE_CIS_PROD_ID123("ADVANTECH", "COMpad-32/85", "1.0", 0x96913a85, 0x8fbe92ae, 0x0877b627, "cis/COMpad2.cis"),
PCMCIA_DEVICE_CIS_PROD_ID2("RS-COM 2P", 0xad20b156, "cis/RS-COM-2P.cis"),
- PCMCIA_DEVICE_CIS_MANF_CARD(0x0013, 0x0000, "cis/GLOBETROTTER.cis"),
PCMCIA_DEVICE_PROD_ID12("ELAN DIGITAL SYSTEMS LTD, c1997.", "SERIAL CARD: SL100 1.00.", 0x19ca78af, 0xf964f42b),
PCMCIA_DEVICE_PROD_ID12("ELAN DIGITAL SYSTEMS LTD, c1997.", "SERIAL CARD: SL100", 0x19ca78af, 0x71d98e83),
PCMCIA_DEVICE_PROD_ID12("ELAN DIGITAL SYSTEMS LTD, c1997.", "SERIAL CARD: SL232 1.00.", 0x19ca78af, 0x69fb7490),
--
2.32.0
This is a note to let you know that I've just added the patch titled
serial_cs: Add Option International GSM-Ready 56K/ISDN modem
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the tty-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From 674b5396576b636a1ff101f692935748388c7325 Mon Sep 17 00:00:00 2001
From: Ondrej Zary <linux(a)zary.sk>
Date: Fri, 11 Jun 2021 22:19:40 +0200
Subject: serial_cs: Add Option International GSM-Ready 56K/ISDN modem
Add support for Option International GSM-Ready 56K/ISDN PCMCIA modem
card.
Signed-off-by: Ondrej Zary <linux(a)zary.sk>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20210611201940.23898-2-linux@zary.sk
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/8250/serial_cs.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/tty/serial/8250/serial_cs.c b/drivers/tty/serial/8250/serial_cs.c
index 2f1d33ea26e1..dc2ef05a10eb 100644
--- a/drivers/tty/serial/8250/serial_cs.c
+++ b/drivers/tty/serial/8250/serial_cs.c
@@ -786,6 +786,7 @@ static const struct pcmcia_device_id serial_ids[] = {
PCMCIA_DEVICE_PROD_ID12("Multi-Tech", "MT2834LT", 0x5f73be51, 0x4cd7c09e),
PCMCIA_DEVICE_PROD_ID12("OEM ", "C288MX ", 0xb572d360, 0xd2385b7a),
PCMCIA_DEVICE_PROD_ID12("Option International", "V34bis GSM/PSTN Data/Fax Modem", 0x9d7cd6f5, 0x5cb8bf41),
+ PCMCIA_DEVICE_PROD_ID12("Option International", "GSM-Ready 56K/ISDN", 0x9d7cd6f5, 0xb23844aa),
PCMCIA_DEVICE_PROD_ID12("PCMCIA ", "C336MX ", 0x99bcafe9, 0xaa25bcab),
PCMCIA_DEVICE_PROD_ID12("Quatech Inc", "PCMCIA Dual RS-232 Serial Port Card", 0xc4420b35, 0x92abc92f),
PCMCIA_DEVICE_PROD_ID12("Quatech Inc", "Dual RS-232 Serial Port PC Card", 0xc4420b35, 0x031a380d),
--
2.32.0
This is a note to let you know that I've just added the patch titled
serial: sh-sci: Stop dmaengine transfer in sci_stop_tx()
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the tty-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From 08a84410a04f05c7c1b8e833f552416d8eb9f6fe Mon Sep 17 00:00:00 2001
From: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Date: Thu, 10 Jun 2021 20:08:06 +0900
Subject: serial: sh-sci: Stop dmaengine transfer in sci_stop_tx()
Stop the dmaengine transfer in sci_stop_tx(). Otherwise, the following
message can be output when the system enters suspend while data is
being transferred, because clearing the TIE bit in SCSCR does not
stop an in-flight dmaengine transfer:
sh-sci e6550000.serial: ttySC1: Unable to drain transmitter
Note that this driver already uses #ifdef in the .c file, so this
patch also uses #ifdef to fix the issue; otherwise, build errors
happen if CONFIG_SERIAL_SH_SCI_DMA is disabled.
Fixes: 73a19e4c0301 ("serial: sh-sci: Add DMA support.")
Cc: <stable(a)vger.kernel.org> # v4.9+
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Link: https://lore.kernel.org/r/20210610110806.277932-1-yoshihiro.shimoda.uh@rene…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/sh-sci.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c
index df4f70716ba2..aabe66c99c1a 100644
--- a/drivers/tty/serial/sh-sci.c
+++ b/drivers/tty/serial/sh-sci.c
@@ -611,6 +611,14 @@ static void sci_stop_tx(struct uart_port *port)
ctrl &= ~SCSCR_TIE;
serial_port_out(port, SCSCR, ctrl);
+
+#ifdef CONFIG_SERIAL_SH_SCI_DMA
+ if (to_sci_port(port)->chan_tx &&
+ !dma_submit_error(to_sci_port(port)->cookie_tx)) {
+ dmaengine_terminate_async(to_sci_port(port)->chan_tx);
+ to_sci_port(port)->cookie_tx = -EINVAL;
+ }
+#endif
}
static void sci_start_rx(struct uart_port *port)
--
2.32.0
If the ROM status returned is unknown (RENESAS_ROM_STATUS_NO_RESULT),
we need to attempt loading the firmware rather than skipping it
altogether.
Cc: stable(a)vger.kernel.org
Cc: Mathias Nyman <mathias.nyman(a)intel.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Vinod Koul <vkoul(a)kernel.org>
Fixes: 2478be82de44 ("usb: renesas-xhci: Add ROM loader for uPD720201")
Signed-off-by: Moritz Fischer <mdf(a)kernel.org>
---
drivers/usb/host/xhci-pci-renesas.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/drivers/usb/host/xhci-pci-renesas.c b/drivers/usb/host/xhci-pci-renesas.c
index f97ac9f52bf4..dfe54f0afc4b 100644
--- a/drivers/usb/host/xhci-pci-renesas.c
+++ b/drivers/usb/host/xhci-pci-renesas.c
@@ -207,7 +207,8 @@ static int renesas_check_rom_state(struct pci_dev *pdev)
return 0;
case RENESAS_ROM_STATUS_NO_RESULT: /* No result yet */
- return 0;
+ dev_dbg(&pdev->dev, "Unknown ROM status ...\n");
+ break;
case RENESAS_ROM_STATUS_ERROR: /* Error State */
default: /* All other states are marked as "Reserved states" */
@@ -224,13 +225,11 @@ static int renesas_fw_check_running(struct pci_dev *pdev)
u8 fw_state;
int err;
- /* Check if device has ROM and loaded, if so skip everything */
- err = renesas_check_rom(pdev);
- if (err) { /* we have rom */
- err = renesas_check_rom_state(pdev);
- if (!err)
- return err;
- }
+ /* Only if device has ROM and loaded FW we can skip loading and
+ * return success. Otherwise (even unknown state), attempt to load FW.
+ */
+ if (renesas_check_rom(pdev) && !renesas_check_rom_state(pdev))
+ return 0;
/*
* Test if the device is actually needing the firmware. As most
--
2.31.1
This is a note to let you know that I've just added the patch titled
fpga: stratix10-soc: Add missing fpga_mgr_free() call
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From d9ec9daa20eb8de1efe6abae78c9835ec8ed86f9 Mon Sep 17 00:00:00 2001
From: Russ Weight <russell.h.weight(a)intel.com>
Date: Mon, 14 Jun 2021 10:09:03 -0700
Subject: fpga: stratix10-soc: Add missing fpga_mgr_free() call
The stratix10-soc driver uses the fpga_mgr_create() function and is
therefore responsible for calling fpga_mgr_free() to release the class
driver resources. Add the missing fpga_mgr_free() call in the
s10_remove() function.
Signed-off-by: Russ Weight <russell.h.weight(a)intel.com>
Reviewed-by: Xu Yilun <yilun.xu(a)intel.com>
Signed-off-by: Moritz Fischer <mdf(a)kernel.org>
Fixes: e7eef1d7633a ("fpga: add intel stratix10 soc fpga manager driver")
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20210614170909.232415-3-mdf@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/fpga/stratix10-soc.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/fpga/stratix10-soc.c b/drivers/fpga/stratix10-soc.c
index 2aeb53f8e9d0..a2cea500f7cc 100644
--- a/drivers/fpga/stratix10-soc.c
+++ b/drivers/fpga/stratix10-soc.c
@@ -454,6 +454,7 @@ static int s10_remove(struct platform_device *pdev)
struct s10_priv *priv = mgr->priv;
fpga_mgr_unregister(mgr);
+ fpga_mgr_free(mgr);
stratix10_svc_free_channel(priv->chan);
return 0;
--
2.32.0
Hello,
After I upgraded from 5.7.11 (I didn't have access to this machine for
about 10 months) to 5.12.10, I noticed that any time I used all my
cores to, for example, compile a project, the system would degrade
significantly in performance and applications would start to stutter.
Compiles are also about 3-4x slower on kernels with the regression vs.
without. After debugging this for the past 24 hours or so, I've narrowed
it down to a change between 5.7.19 and 5.8.1. Sadly, bisect does not
help, because trying to run any of the 5.8 RC kernels causes the kernel
to get stuck before init, without any apparent errors on the screen (and
I don't have a serial cable to dump the kernel output to). I'm listing
all the information I know and my system information below.
Reproduction steps (not sure if this helps, but):
1. Boot a kernel with the regression
2. Do something that uses all cores, like compiling the Linux kernel
3. Observe long compile times and stuttering applications (which don't
happen even under full load with a working kernel)
Regression between kernel versions:
5.7 - working
5.7.11 - working
5.7.19 - working
5.8.1 - broken
5.8.18 - broken
5.12.10 - broken
8ecfa36cd4db3275bf3b6c6f32c7e3c6bb537de2 (master on 2021-06-13) - broken
System information:
Gentoo Linux 17.1 amd64
CPU info:
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
stepping : 3
microcode : 0x27
cpu MHz : 2042.008
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall
nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl
xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor
ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2
x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm
abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow
vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep
bmi2 erms invpcid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
vmx flags : vnmi preemption_timer invvpid ept_x_only ept_ad ept_1gb
flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest ple
shadow_vmcs
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
mds swapgs itlb_multihit srbds
bogomips : 7199.96
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
RAM: 16GiB
GPU: VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
Redwood XT [Radeon HD 5670/5690/5730]
cat /proc/sys/kernel/tainted: 0
Thanks in advance.
Best Regards,
Adam Edge
This is the start of the stable review cycle for the 4.9.273 release.
There are 42 patches in this series, all of which will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 16 Jun 2021 10:26:30 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.273-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.273-rc1
Liangyan <liangyan.peng(a)linux.alibaba.com>
tracing: Correct the length check which causes memory corruption
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
ftrace: Do not blindly read the ip address in ftrace_bug()
Ming Lei <ming.lei(a)redhat.com>
scsi: core: Only put parent device if host state differs from SHOST_CREATED
Ming Lei <ming.lei(a)redhat.com>
scsi: core: Fix error handling of scsi_host_alloc()
Dai Ngo <dai.ngo(a)oracle.com>
NFSv4: nfs4_proc_set_acl needs to restore NFS_CAP_UIDGID_NOMAP on error.
Paolo Bonzini <pbonzini(a)redhat.com>
kvm: fix previous commit for 32-bit builds
Leo Yan <leo.yan(a)linaro.org>
perf session: Correct buffer copying when peeking events
Dan Carpenter <dan.carpenter(a)oracle.com>
NFS: Fix a potential NULL dereference in nfs_get_client()
Marco Elver <elver(a)google.com>
perf: Fix data race between pin_count increment/decrement
Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
regulator: core: resolve supply for boot-on/always-on regulators
Maciej Żenczykowski <maze(a)google.com>
usb: fix various gadget panics on 10gbps cabling
Maciej Żenczykowski <maze(a)google.com>
usb: fix various gadgets null ptr deref on 10gbps cabling.
Linyu Yuan <linyyuan(a)codeaurora.com>
usb: gadget: eem: fix wrong eem header operation
Johan Hovold <johan(a)kernel.org>
USB: serial: quatech2: fix control-request directions
Alexandre GRIVEAUX <agriveaux(a)deutnet.info>
USB: serial: omninet: add device id for Zyxel Omni 56K Plus
George McCollister <george.mccollister(a)gmail.com>
USB: serial: ftdi_sio: add NovaTech OrionMX product ID
Marian-Cristian Rotariu <marian.c.rotariu(a)gmail.com>
usb: dwc3: ep0: fix NULL pointer exception
Maciej Żenczykowski <maze(a)google.com>
USB: f_ncm: ncm_bitrate (speed) is unsigned
Alexander Kuznetsov <wwfq(a)yandex-team.ru>
cgroup1: don't allow '\n' in renaming
Ritesh Harjani <riteshh(a)linux.ibm.com>
btrfs: return value from btrfs_mark_extent_written() in case of error
Paolo Bonzini <pbonzini(a)redhat.com>
kvm: avoid speculation-based attacks from out-of-range memslot accesses
Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com>
drm: Lock pointer access in drm_master_release()
Chris Packham <chris.packham(a)alliedtelesis.co.nz>
i2c: mpc: implement erratum A-004447 workaround
Chris Packham <chris.packham(a)alliedtelesis.co.nz>
i2c: mpc: Make use of i2c_recover_bus()
Chris Packham <chris.packham(a)alliedtelesis.co.nz>
powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P1010 i2c controllers
Chris Packham <chris.packham(a)alliedtelesis.co.nz>
powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P2041 i2c controllers
Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com>
bnx2x: Fix missing error code in bnx2x_iov_init_one()
Tiezhu Yang <yangtiezhu(a)loongson.cn>
MIPS: Fix kernel hang under FUNCTION_GRAPH_TRACER and PREEMPT_TRACER
Saubhik Mukherjee <saubhik.mukherjee(a)gmail.com>
net: appletalk: cops: Fix data race in cops_probe1
Zong Li <zong.li(a)sifive.com>
net: macb: ensure the device is available before accessing GEMGXL control registers
Dmitry Bogdanov <d.bogdanov(a)yadro.com>
scsi: target: qla2xxx: Wait for stop_phase1 at WWN removal
Matt Wang <wwentao(a)vmware.com>
scsi: vmw_pvscsi: Set correct residual data length
Zheyu Ma <zheyuma97(a)gmail.com>
net/qla3xxx: fix schedule while atomic in ql_sem_spinlock
Sergey Senozhatsky <senozhatsky(a)chromium.org>
wq: handle VM suspension in stall detection
Shakeel Butt <shakeelb(a)google.com>
cgroup: disable controllers at parse time
Dan Carpenter <dan.carpenter(a)oracle.com>
net: mdiobus: get rid of a BUG_ON()
Johannes Berg <johannes.berg(a)intel.com>
netlink: disable IRQs for netlink_lock_table()
Johannes Berg <johannes.berg(a)intel.com>
bonding: init notify_work earlier to avoid uninitialized use
Zheyu Ma <zheyuma97(a)gmail.com>
isdn: mISDN: netjet: Fix crash in nj_probe:
Zou Wei <zou_wei(a)huawei.com>
ASoC: sti-sas: add missing MODULE_DEVICE_TABLE
Jeimon <jjjinmeng.zhou(a)gmail.com>
net/nfc/rawsock.c: fix a permission check bug
Kees Cook <keescook(a)chromium.org>
proc: Track /proc/$pid/attr/ opener mm_struct
-------------
Diffstat:
Makefile | 4 +-
arch/mips/lib/mips-atomic.c | 12 +--
arch/powerpc/boot/dts/fsl/p1010si-post.dtsi | 8 ++
arch/powerpc/boot/dts/fsl/p2041si-post.dtsi | 16 ++++
drivers/gpu/drm/drm_auth.c | 3 +-
drivers/i2c/busses/i2c-mpc.c | 95 ++++++++++++++++++++++-
drivers/isdn/hardware/mISDN/netjet.c | 1 -
drivers/net/appletalk/cops.c | 4 +-
drivers/net/bonding/bond_main.c | 2 +-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c | 4 +-
drivers/net/ethernet/cadence/macb.c | 3 +
drivers/net/ethernet/qlogic/qla3xxx.c | 2 +-
drivers/net/phy/mdio_bus.c | 3 +-
drivers/regulator/core.c | 6 ++
drivers/scsi/hosts.c | 25 +++---
drivers/scsi/qla2xxx/qla_target.c | 2 +
drivers/scsi/vmw_pvscsi.c | 8 +-
drivers/usb/dwc3/ep0.c | 3 +
drivers/usb/gadget/config.c | 8 ++
drivers/usb/gadget/function/f_ecm.c | 2 +-
drivers/usb/gadget/function/f_eem.c | 6 +-
drivers/usb/gadget/function/f_loopback.c | 2 +-
drivers/usb/gadget/function/f_ncm.c | 2 +-
drivers/usb/gadget/function/f_printer.c | 3 +-
drivers/usb/gadget/function/f_rndis.c | 2 +-
drivers/usb/gadget/function/f_serial.c | 2 +-
drivers/usb/gadget/function/f_sourcesink.c | 3 +-
drivers/usb/gadget/function/f_subset.c | 2 +-
drivers/usb/gadget/function/f_tcm.c | 3 +-
drivers/usb/serial/ftdi_sio.c | 1 +
drivers/usb/serial/ftdi_sio_ids.h | 1 +
drivers/usb/serial/omninet.c | 2 +
drivers/usb/serial/quatech2.c | 6 +-
fs/btrfs/file.c | 4 +-
fs/nfs/client.c | 2 +-
fs/nfs/nfs4proc.c | 8 ++
fs/proc/base.c | 9 ++-
include/linux/kvm_host.h | 11 ++-
kernel/cgroup.c | 17 ++--
kernel/events/core.c | 2 +
kernel/trace/ftrace.c | 8 +-
kernel/trace/trace.c | 2 +-
kernel/workqueue.c | 12 ++-
net/netlink/af_netlink.c | 6 +-
net/nfc/rawsock.c | 2 +-
sound/soc/codecs/sti-sas.c | 1 +
tools/perf/util/session.c | 1 +
47 files changed, 266 insertions(+), 65 deletions(-)
The patch titled
Subject: mm: page_vma_mapped_walk(): use pmd_read_atomic()
has been removed from the -mm tree. Its filename was
mm-page_vma_mapped_walk-use-pmd_read_atomic.patch
This patch was dropped because it was withdrawn
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm: page_vma_mapped_walk(): use pmd_read_atomic()
page_vma_mapped_walk() cleanup: use pmd_read_atomic() with barrier()
instead of READ_ONCE() for pmde: some architectures (e.g. i386 with PAE)
have a multi-word pmd entry, for which READ_ONCE() is not good enough.
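As background, a sketch of the torn-read hazard on a two-word entry that
the sentence above refers to (illustrative only, not from the patch; the
type and helper are made up):
#include <linux/types.h>

/* Hypothetical 2 x 32-bit entry, as on i386 with PAE. */
typedef struct { u32 lo; u32 hi; } example_pmd_t;

static u64 example_plain_read(volatile example_pmd_t *p)
{
	/* Two independent 32-bit loads: a concurrent update of the entry
	 * can be observed half old, half new, which is the case
	 * pmd_read_atomic() (together with barrier()) is meant to avoid. */
	u32 lo = p->lo;
	u32 hi = p->hi;

	return ((u64)hi << 32) | lo;
}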
Link: https://lkml.kernel.org/r/594c1f0-d396-5346-1f36-606872cddb18@google.com
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Wang Yugui <wangyugui(a)e16-tech.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_vma_mapped.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/mm/page_vma_mapped.c~mm-page_vma_mapped_walk-use-pmd_read_atomic
+++ a/mm/page_vma_mapped.c
@@ -182,13 +182,16 @@ restart:
pud = pud_offset(p4d, pvmw->address);
if (!pud_present(*pud))
return false;
+
pvmw->pmd = pmd_offset(pud, pvmw->address);
/*
* Make sure the pmd value isn't cached in a register by the
* compiler and used as a stale value after we've observed a
* subsequent update.
*/
- pmde = READ_ONCE(*pvmw->pmd);
+ pmde = pmd_read_atomic(pvmw->pmd);
+ barrier();
+
if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
pvmw->ptl = pmd_lock(mm, pvmw->pmd);
if (likely(pmd_trans_huge(*pvmw->pmd))) {
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-thp-fix-__split_huge_pmd_locked-on-shmem-migration-entry.patch
mm-thp-make-is_huge_zero_pmd-safe-and-quicker.patch
mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting.patch
mm-thp-fix-vma_address-if-virtual-address-below-file-offset.patch
mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page.patch
mm-page_vma_mapped_walk-use-page-for-pvmw-page.patch
mm-page_vma_mapped_walk-settle-pagehuge-on-entry.patch
mm-page_vma_mapped_walk-use-pmde-for-pvmw-pmd.patch
mm-page_vma_mapped_walk-prettify-pvmw_migration-block.patch
mm-page_vma_mapped_walk-crossing-page-table-boundary.patch
mm-page_vma_mapped_walk-add-a-level-of-indentation.patch
mm-page_vma_mapped_walk-use-goto-instead-of-while-1.patch
mm-page_vma_mapped_walk-get-vma_address_end-earlier.patch
mm-thp-fix-page_vma_mapped_walk-if-thp-mapped-by-ptes.patch
mm-thp-another-pvmw_sync-fix-in-page_vma_mapped_walk.patch
mm-futex-fix-shared-futex-pgoff-on-shmem-huge-page.patch
mm-thp-remap_page-is-only-needed-on-anonymous-thp.patch
mm-hwpoison_user_mappings-try_to_unmap-with-ttu_sync.patch
The patch titled
Subject: mm/gup: fix try_grab_compound_head() race with split_huge_page()
has been added to the -mm tree. Its filename is
mm-gup-fix-try_grab_compound_head-race-with-split_huge_page.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-gup-fix-try_grab_compound_head…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-gup-fix-try_grab_compound_head…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: mm/gup: fix try_grab_compound_head() race with split_huge_page()
try_grab_compound_head() is used to grab a reference to a page from
get_user_pages_fast(), which is only protected against concurrent freeing
of page tables (via local_irq_save()), but not against concurrent TLB
flushes, freeing of data pages, or splitting of compound pages.
Because no reference is held to the page when try_grab_compound_head() is
called, the page may have been freed and reallocated by the time its
refcount has been elevated; therefore, once we're holding a stable
reference to the page, the caller re-checks whether the PTE still points
to the same page (with the same access rights).
The problem is that try_grab_compound_head() has to grab a reference on
the head page; but between the time we look up what the head page is and
the time we actually grab a reference on the head page, the compound page
may have been split up (either explicitly through split_huge_page() or by
freeing the compound page to the buddy allocator and then allocating its
individual order-0 pages). If that happens, get_user_pages_fast() may end
up returning the right page but lifting the refcount on a now-unrelated
page, leading to use-after-free of pages.
To fix it: Re-check whether the pages still belong together after lifting
the refcount on the head page. Move anything else that checks
compound_head(page) below the refcount increment.
This can't actually happen on bare-metal x86 (because there, disabling
IRQs locks out remote TLB flushes), but it can happen on virtualized x86
(e.g. under KVM) and probably also on arm64. The race window is pretty
narrow, and constantly allocating and shattering hugepages isn't exactly
fast; for now I've only managed to reproduce this in an x86 KVM guest with
an artificially widened timing window (by adding a loop that repeatedly
calls `inl(0x3f8 + 5)` in `try_get_compound_head()` to force VM exits, so
that PV TLB flushes are used instead of IPIs).
As requested on the list, also replace the existing VM_BUG_ON_PAGE() with
a warning and bailout. Since the existing code only performed the BUG_ON
check on DEBUG_VM kernels, ensure that the new code also only performs the
check under that configuration - I don't want to mix two logically
separate changes together too much. The macro VM_WARN_ON_ONCE_PAGE()
doesn't return a value on !DEBUG_VM, so wrap the whole check in an #ifdef
block. An alternative would be to change the VM_WARN_ON_ONCE_PAGE()
definition for !DEBUG_VM such that it always returns false, but since that
would differ from the behavior of the normal WARN macros, it might be too
confusing for readers.
Link: https://lkml.kernel.org/r/20210615012014.1100672-1-jannh@google.com
Fixes: 7aef4172c795 ("mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton")
Signed-off-by: Jann Horn <jannh(a)google.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Kirill A. Shutemov <kirill(a)shutemov.name>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 58 +++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 43 insertions(+), 15 deletions(-)
--- a/mm/gup.c~mm-gup-fix-try_grab_compound_head-race-with-split_huge_page
+++ a/mm/gup.c
@@ -44,6 +44,23 @@ static void hpage_pincount_sub(struct pa
atomic_sub(refs, compound_pincount_ptr(page));
}
+/* Equivalent to calling put_page() @refs times. */
+static void put_page_refs(struct page *page, int refs)
+{
+#ifdef CONFIG_DEBUG_VM
+ if (VM_WARN_ON_ONCE_PAGE(page_ref_count(page) < refs, page))
+ return;
+#endif
+
+ /*
+ * Calling put_page() for each ref is unnecessarily slow. Only the last
+ * ref needs a put_page().
+ */
+ if (refs > 1)
+ page_ref_sub(page, refs - 1);
+ put_page(page);
+}
+
/*
* Return the compound head page with ref appropriately incremented,
* or NULL if that failed.
@@ -56,6 +73,21 @@ static inline struct page *try_get_compo
return NULL;
if (unlikely(!page_cache_add_speculative(head, refs)))
return NULL;
+
+ /*
+ * At this point we have a stable reference to the head page; but it
+ * could be that between the compound_head() lookup and the refcount
+ * increment, the compound page was split, in which case we'd end up
+ * holding a reference on a page that has nothing to do with the page
+ * we were given anymore.
+ * So now that the head page is stable, recheck that the pages still
+ * belong together.
+ */
+ if (unlikely(compound_head(page) != head)) {
+ put_page_refs(head, refs);
+ return NULL;
+ }
+
return head;
}
@@ -96,6 +128,14 @@ __maybe_unused struct page *try_grab_com
return NULL;
/*
+ * CAUTION: Don't use compound_head() on the page before this
+ * point, the result won't be stable.
+ */
+ page = try_get_compound_head(page, refs);
+ if (!page)
+ return NULL;
+
+ /*
* When pinning a compound page of order > 1 (which is what
* hpage_pincount_available() checks for), use an exact count to
* track it, via hpage_pincount_add/_sub().
@@ -103,15 +143,10 @@ __maybe_unused struct page *try_grab_com
* However, be sure to *also* increment the normal page refcount
* field at least once, so that the page really is pinned.
*/
- if (!hpage_pincount_available(page))
- refs *= GUP_PIN_COUNTING_BIAS;
-
- page = try_get_compound_head(page, refs);
- if (!page)
- return NULL;
-
if (hpage_pincount_available(page))
hpage_pincount_add(page, refs);
+ else
+ page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
orig_refs);
@@ -135,14 +170,7 @@ static void put_compound_head(struct pag
refs *= GUP_PIN_COUNTING_BIAS;
}
- VM_BUG_ON_PAGE(page_ref_count(page) < refs, page);
- /*
- * Calling put_page() for each ref is unnecessarily slow. Only the last
- * ref needs a put_page().
- */
- if (refs > 1)
- page_ref_sub(page, refs - 1);
- put_page(page);
+ put_page_refs(page, refs);
}
/**
_
Patches currently in -mm which might be from jannh(a)google.com are
mm-gup-fix-try_grab_compound_head-race-with-split_huge_page.patch
The patch titled
Subject: mm, futex: fix shared futex pgoff on shmem huge page
has been added to the -mm tree. Its filename is
mm-futex-fix-shared-futex-pgoff-on-shmem-huge-page.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-futex-fix-shared-futex-pgoff-o…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-futex-fix-shared-futex-pgoff-o…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm, futex: fix shared futex pgoff on shmem huge page
If more than one futex is placed on a shmem huge page, it can happen that
waking the second wakes the first instead, and leaves the second waiting:
the key's shared.pgoff is wrong.
When 3.11 commit 13d60f4b6ab5 ("futex: Take hugepages into account when
generating futex_key") went in, the only shared huge pages came from
hugetlbfs, and the code added to deal with its exceptional page->index
was put into hugetlb source. That was then missed when 4.8 added shmem
huge pages.
page_to_pgoff() is what others use for this nowadays: except that, as
currently written, it gives the right answer on hugetlbfs head, but
nonsense on hugetlbfs tails. Fix that by calling hugetlbfs-specific
hugetlb_basepage_index() on PageHuge tails as well as on head.
Yes, it's unconventional to declare hugetlb_basepage_index() there in
pagemap.h, rather than in hugetlb.h; but I do not expect anything but
page_to_pgoff() ever to need it.
Link: https://lkml.kernel.org/r/b17d946b-d09-326e-b42a-52884c36df32@google.com
Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
Reported-by: Neel Natu <neelnatu(a)google.com>
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Zhang Yi <wetpzy(a)gmail.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Darren Hart <dvhart(a)infradead.org>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/hugetlb.h | 16 ----------------
include/linux/pagemap.h | 13 +++++++------
kernel/futex.c | 3 +--
mm/hugetlb.c | 5 +----
4 files changed, 9 insertions(+), 28 deletions(-)
--- a/include/linux/hugetlb.h~mm-futex-fix-shared-futex-pgoff-on-shmem-huge-page
+++ a/include/linux/hugetlb.h
@@ -741,17 +741,6 @@ static inline int hstate_index(struct hs
return h - hstates;
}
-pgoff_t __basepage_index(struct page *page);
-
-/* Return page->index in PAGE_SIZE units */
-static inline pgoff_t basepage_index(struct page *page)
-{
- if (!PageCompound(page))
- return page->index;
-
- return __basepage_index(page);
-}
-
extern int dissolve_free_huge_page(struct page *page);
extern int dissolve_free_huge_pages(unsigned long start_pfn,
unsigned long end_pfn);
@@ -988,11 +977,6 @@ static inline int hstate_index(struct hs
return 0;
}
-static inline pgoff_t basepage_index(struct page *page)
-{
- return page->index;
-}
-
static inline int dissolve_free_huge_page(struct page *page)
{
return 0;
--- a/include/linux/pagemap.h~mm-futex-fix-shared-futex-pgoff-on-shmem-huge-page
+++ a/include/linux/pagemap.h
@@ -516,7 +516,7 @@ static inline struct page *read_mapping_
}
/*
- * Get index of the page with in radix-tree
+ * Get index of the page within radix-tree (but not for hugetlb pages).
* (TODO: remove once hugetlb pages will have ->index in PAGE_SIZE)
*/
static inline pgoff_t page_to_index(struct page *page)
@@ -536,14 +536,15 @@ static inline pgoff_t page_to_index(stru
}
/*
- * Get the offset in PAGE_SIZE.
- * (TODO: hugepage should have ->index in PAGE_SIZE)
+ * Get the offset in PAGE_SIZE (even for hugetlb pages).
+ * (TODO: hugetlb pages should have ->index in PAGE_SIZE)
*/
static inline pgoff_t page_to_pgoff(struct page *page)
{
- if (unlikely(PageHeadHuge(page)))
- return page->index << compound_order(page);
-
+ if (unlikely(PageHuge(page))) {
+ extern pgoff_t hugetlb_basepage_index(struct page *page);
+ return hugetlb_basepage_index(page);
+ }
return page_to_index(page);
}
--- a/kernel/futex.c~mm-futex-fix-shared-futex-pgoff-on-shmem-huge-page
+++ a/kernel/futex.c
@@ -35,7 +35,6 @@
#include <linux/jhash.h>
#include <linux/pagemap.h>
#include <linux/syscalls.h>
-#include <linux/hugetlb.h>
#include <linux/freezer.h>
#include <linux/memblock.h>
#include <linux/fault-inject.h>
@@ -650,7 +649,7 @@ again:
key->both.offset |= FUT_OFF_INODE; /* inode-based key */
key->shared.i_seq = get_inode_sequence_number(inode);
- key->shared.pgoff = basepage_index(tail);
+ key->shared.pgoff = page_to_pgoff(tail);
rcu_read_unlock();
}
--- a/mm/hugetlb.c~mm-futex-fix-shared-futex-pgoff-on-shmem-huge-page
+++ a/mm/hugetlb.c
@@ -1588,15 +1588,12 @@ struct address_space *hugetlb_page_mappi
return NULL;
}
-pgoff_t __basepage_index(struct page *page)
+pgoff_t hugetlb_basepage_index(struct page *page)
{
struct page *page_head = compound_head(page);
pgoff_t index = page_index(page_head);
unsigned long compound_idx;
- if (!PageHuge(page_head))
- return page_index(page);
-
if (compound_order(page_head) >= MAX_ORDER)
compound_idx = page_to_pfn(page) - page_to_pfn(page_head);
else
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-thp-fix-__split_huge_pmd_locked-on-shmem-migration-entry.patch
mm-thp-make-is_huge_zero_pmd-safe-and-quicker.patch
mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting.patch
mm-thp-fix-vma_address-if-virtual-address-below-file-offset.patch
mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page.patch
mm-page_vma_mapped_walk-use-page-for-pvmw-page.patch
mm-page_vma_mapped_walk-settle-pagehuge-on-entry.patch
mm-page_vma_mapped_walk-use-pmd_read_atomic.patch
mm-page_vma_mapped_walk-use-pmde-for-pvmw-pmd.patch
mm-page_vma_mapped_walk-prettify-pvmw_migration-block.patch
mm-page_vma_mapped_walk-crossing-page-table-boundary.patch
mm-page_vma_mapped_walk-add-a-level-of-indentation.patch
mm-page_vma_mapped_walk-use-goto-instead-of-while-1.patch
mm-page_vma_mapped_walk-get-vma_address_end-earlier.patch
mm-thp-fix-page_vma_mapped_walk-if-thp-mapped-by-ptes.patch
mm-thp-another-pvmw_sync-fix-in-page_vma_mapped_walk.patch
mm-futex-fix-shared-futex-pgoff-on-shmem-huge-page.patch
mm-thp-remap_page-is-only-needed-on-anonymous-thp.patch
mm-hwpoison_user_mappings-try_to_unmap-with-ttu_sync.patch
try_grab_compound_head() is used to grab a reference to a page from
get_user_pages_fast(), which is only protected against concurrent
freeing of page tables (via local_irq_save()), but not against
concurrent TLB flushes, freeing of data pages, or splitting of compound
pages.
Because no reference is held to the page when try_grab_compound_head()
is called, the page may have been freed and reallocated by the time its
refcount has been elevated; therefore, once we're holding a stable
reference to the page, the caller re-checks whether the PTE still points
to the same page (with the same access rights).
The problem is that try_grab_compound_head() has to grab a reference on
the head page; but between the time we look up what the head page is and
the time we actually grab a reference on the head page, the compound
page may have been split up (either explicitly through split_huge_page()
or by freeing the compound page to the buddy allocator and then
allocating its individual order-0 pages).
If that happens, get_user_pages_fast() may end up returning the right
page but lifting the refcount on a now-unrelated page, leading to
use-after-free of pages.
To fix it:
Re-check whether the pages still belong together after lifting the
refcount on the head page.
Move anything else that checks compound_head(page) below the refcount
increment.
This can't actually happen on bare-metal x86 (because there, disabling
IRQs locks out remote TLB flushes), but it can happen on virtualized x86
(e.g. under KVM) and probably also on arm64. The race window is pretty
narrow, and constantly allocating and shattering hugepages isn't exactly
fast; for now I've only managed to reproduce this in an x86 KVM guest with
an artificially widened timing window (by adding a loop that repeatedly
calls `inl(0x3f8 + 5)` in `try_get_compound_head()` to force VM exits,
so that PV TLB flushes are used instead of IPIs).
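A sketch of the race-widening hack described above (debug-only,
illustrative; the helper name and iteration count are arbitrary):
#include <asm/io.h>

/* Dropped into try_get_compound_head() while testing: each read of the
 * COM1 line-status register (port 0x3f8 + 5) forces a VM exit under KVM,
 * stretching the window between the compound_head() lookup and the
 * refcount bump. */
static inline void widen_race_window(void)
{
	int i;

	for (i = 0; i < 10000; i++)
		inl(0x3f8 + 5);
}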
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Kirill A. Shutemov <kirill(a)shutemov.name>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: stable(a)vger.kernel.org
Fixes: 7aef4172c795 ("mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton")
Signed-off-by: Jann Horn <jannh(a)google.com>
---
resending because linux-mm was down
mm/gup.c | 54 +++++++++++++++++++++++++++++++++++++++---------------
1 file changed, 39 insertions(+), 15 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 3ded6a5f26b2..1f9c0ac15073 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -43,8 +43,21 @@ static void hpage_pincount_sub(struct page *page, int refs)
atomic_sub(refs, compound_pincount_ptr(page));
}
+/* Equivalent to calling put_page() @refs times. */
+static void put_page_refs(struct page *page, int refs)
+{
+ VM_BUG_ON_PAGE(page_ref_count(page) < refs, page);
+ /*
+ * Calling put_page() for each ref is unnecessarily slow. Only the last
+ * ref needs a put_page().
+ */
+ if (refs > 1)
+ page_ref_sub(page, refs - 1);
+ put_page(page);
+}
+
/*
* Return the compound head page with ref appropriately incremented,
* or NULL if that failed.
*/
@@ -55,8 +68,23 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
if (WARN_ON_ONCE(page_ref_count(head) < 0))
return NULL;
if (unlikely(!page_cache_add_speculative(head, refs)))
return NULL;
+
+ /*
+ * At this point we have a stable reference to the head page; but it
+ * could be that between the compound_head() lookup and the refcount
+ * increment, the compound page was split, in which case we'd end up
+ * holding a reference on a page that has nothing to do with the page
+ * we were given anymore.
+ * So now that the head page is stable, recheck that the pages still
+ * belong together.
+ */
+ if (unlikely(compound_head(page) != head)) {
+ put_page_refs(head, refs);
+ return NULL;
+ }
+
return head;
}
/*
@@ -94,25 +122,28 @@ __maybe_unused struct page *try_grab_compound_head(struct page *page,
if (unlikely((flags & FOLL_LONGTERM) &&
!is_pinnable_page(page)))
return NULL;
+ /*
+ * CAUTION: Don't use compound_head() on the page before this
+ * point, the result won't be stable.
+ */
+ page = try_get_compound_head(page, refs);
+ if (!page)
+ return NULL;
+
/*
* When pinning a compound page of order > 1 (which is what
* hpage_pincount_available() checks for), use an exact count to
* track it, via hpage_pincount_add/_sub().
*
* However, be sure to *also* increment the normal page refcount
* field at least once, so that the page really is pinned.
*/
- if (!hpage_pincount_available(page))
- refs *= GUP_PIN_COUNTING_BIAS;
-
- page = try_get_compound_head(page, refs);
- if (!page)
- return NULL;
-
if (hpage_pincount_available(page))
hpage_pincount_add(page, refs);
+ else
+ page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
orig_refs);
@@ -134,16 +165,9 @@ static void put_compound_head(struct page *page, int refs, unsigned int flags)
else
refs *= GUP_PIN_COUNTING_BIAS;
}
- VM_BUG_ON_PAGE(page_ref_count(page) < refs, page);
- /*
- * Calling put_page() for each ref is unnecessarily slow. Only the last
- * ref needs a put_page().
- */
- if (refs > 1)
- page_ref_sub(page, refs - 1);
- put_page(page);
+ put_page_refs(page, refs);
}
/**
* try_grab_page() - elevate a page's refcount by a flag-dependent amount
base-commit: 614124bea77e452aa6df7a8714e8bc820b489922
--
2.32.0.272.g935e593368-goog
This patch series contains stability fixes and error handling for remoteproc.
The changes included in this series do the following:
Patch 1: Fixes the creation of the rproc character device.
Patch 2: Validates rproc as the first step of rproc_add().
Patch 3: Fixes the rproc cdev remove and the order of dev_del() and cdev_del().
Patch 4: Adds error handling in rproc_add().
v1 -> v2:
- Added extra patch which addresses Bjorn's comments on patch 3
from v1.
- Fixed commit text for patch 2 (s/calling making/making).
Siddharth Gupta (4):
remoteproc: core: Move cdev add before device add
remoteproc: core: Move validate before device add
remoteproc: core: Fix cdev remove and rproc del
remoteproc: core: Cleanup device in case of failure
0000-cover-letter.patch.backup | 26 ++++++++++++++++++++++++++
drivers/remoteproc/remoteproc_cdev.c | 2 +-
drivers/remoteproc/remoteproc_core.c | 27 ++++++++++++++++++---------
3 files changed, 45 insertions(+), 10 deletions(-)
create mode 100644 0000-cover-letter.patch.backup
--
Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
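For the remoteproc series above, a minimal sketch of the registration and
removal ordering being fixed (illustrative only, not taken from the
patches; the rproc_example_* names and structure are made up):
#include <linux/cdev.h>
#include <linux/device.h>

struct rproc_example {
	struct device dev;
	struct cdev cdev;	/* assumed to be cdev_init()'d elsewhere */
	dev_t devt;
};

static int rproc_example_add(struct rproc_example *r)
{
	int ret;

	/* Register the char device first so it is usable as soon as the
	 * uevent from device_add() announces the device to userspace. */
	ret = cdev_add(&r->cdev, r->devt, 1);
	if (ret)
		return ret;

	ret = device_add(&r->dev);
	if (ret)
		cdev_del(&r->cdev);

	return ret;
}

static void rproc_example_del(struct rproc_example *r)
{
	/* Tear down in the reverse order: the device first, then the cdev. */
	device_del(&r->dev);
	cdev_del(&r->cdev);
}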
The Cypress CY7C65632 appears to have an issue with auto suspend and
detecting devices, not too dissimilar to the SMSC 5534B hub. It is
easiest to reproduce by connecting multiple mass storage devices to
the hub at the same time. On a Lenovo Yoga, around 1 in 3 attempts
result in the devices not being detected. It is however possible to
make them appear using lsusb -v.
Disabling autosuspend for this hub resolves the issue.
Cc: stable(a)vger.kernel.org
Fixes: 1208f9e1d758 ("USB: hub: Fix the broken detection of USB3 device in SMSC hub")
Signed-off-by: Andrew Lunn <andrew(a)lunn.ch>
---
drivers/usb/core/hub.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 3bd379d5d1be..d1efc7141333 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -41,6 +41,8 @@
#define USB_VENDOR_GENESYS_LOGIC 0x05e3
#define USB_VENDOR_SMSC 0x0424
#define USB_PRODUCT_USB5534B 0x5534
+#define USB_VENDOR_CYPRESS 0x04b4
+#define USB_PRODUCT_CY7C65632 0x6570
#define HUB_QUIRK_CHECK_PORT_AUTOSUSPEND 0x01
#define HUB_QUIRK_DISABLE_AUTOSUSPEND 0x02
@@ -5719,6 +5721,11 @@ static const struct usb_device_id hub_id_table[] = {
.idProduct = USB_PRODUCT_USB5534B,
.bInterfaceClass = USB_CLASS_HUB,
.driver_info = HUB_QUIRK_DISABLE_AUTOSUSPEND},
+ { .match_flags = USB_DEVICE_ID_MATCH_VENDOR
+ | USB_DEVICE_ID_MATCH_PRODUCT,
+ .idVendor = USB_VENDOR_CYPRESS,
+ .idProduct = USB_PRODUCT_CY7C65632,
+ .driver_info = HUB_QUIRK_DISABLE_AUTOSUSPEND},
{ .match_flags = USB_DEVICE_ID_MATCH_VENDOR
| USB_DEVICE_ID_MATCH_INT_CLASS,
.idVendor = USB_VENDOR_GENESYS_LOGIC,
--
2.32.0
The early check whether we should attempt compression does not take the
number of input pages into account. It can happen that there's only one
page, e.g. a tail page after some ranges of BTRFS_MAX_UNCOMPRESSED
have been processed, or an isolated page that won't be converted to an
inline extent.
The single page would be compressed, but a later check would drop it
again because the result size must be at least one block shorter than
the input; that can never work with just one page (e.g. with a 4 KiB
page and 4 KiB block size, the result would have to fit in zero blocks).
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: David Sterba <dsterba(a)suse.com>
---
fs/btrfs/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a2494c645681..e6eb20987351 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -629,7 +629,7 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
* inode has not been flagged as nocompress. This flag can
* change at any time if we discover bad compression ratios.
*/
- if (inode_need_compress(BTRFS_I(inode), start, end)) {
+ if (nr_pages > 1 && inode_need_compress(BTRFS_I(inode), start, end)) {
WARN_ON(pages);
pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
if (!pages) {
--
2.31.1
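To see why a single page can never satisfy the later size check mentioned above, here is a small stand-alone C sketch of the arithmetic. The 4 KiB page and block sizes and the compression_worth_it() helper are illustrative only; the exact btrfs check differs in detail.

#include <stdio.h>

#define PAGE_SIZE 4096UL
#define BLOCKSIZE 4096UL	/* block size == page size in this example */

/* The compressed result only helps if it is at least one block shorter. */
static int compression_worth_it(unsigned long total_in, unsigned long total_compressed)
{
	return total_compressed + BLOCKSIZE <= total_in;
}

int main(void)
{
	/* With one page of input, even a perfect compressor cannot shrink
	 * the data by a full block, so the work is always thrown away. */
	printf("best case for one page worth it? %d\n",
	       compression_worth_it(PAGE_SIZE, 1));	/* prints 0 */
	return 0;
}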
We use the async_delalloc_pages mechanism to make sure that we've
completed our async work before trying to continue our delalloc
flushing. The reason for this is we need to see any ordered extents
that were created by our delalloc flushing. However we're waking up
before we do the submit work, which is before we create the ordered
extents. This is a pretty wide race window where we could potentially
think there are no ordered extents and thus exit shrink_delalloc
prematurely. Fix this by waking us up after we've done the work to
create ordered extents.
cc: stable(a)vger.kernel.org
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
---
fs/btrfs/inode.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6cb73ff59c7c..c37271df2c6d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1271,11 +1271,6 @@ static noinline void async_cow_submit(struct btrfs_work *work)
nr_pages = (async_chunk->end - async_chunk->start + PAGE_SIZE) >>
PAGE_SHIFT;
- /* atomic_sub_return implies a barrier */
- if (atomic_sub_return(nr_pages, &fs_info->async_delalloc_pages) <
- 5 * SZ_1M)
- cond_wake_up_nomb(&fs_info->async_submit_wait);
-
/*
* ->inode could be NULL if async_chunk_start has failed to compress,
* in which case we don't have anything to submit, yet we need to
@@ -1284,6 +1279,11 @@ static noinline void async_cow_submit(struct btrfs_work *work)
*/
if (async_chunk->inode)
submit_compressed_extents(async_chunk);
+
+ /* atomic_sub_return implies a barrier */
+ if (atomic_sub_return(nr_pages, &fs_info->async_delalloc_pages) <
+ 5 * SZ_1M)
+ cond_wake_up_nomb(&fs_info->async_submit_wait);
}
static noinline void async_cow_free(struct btrfs_work *work)
--
2.26.3
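The ordering issue above is the classic "wake the waiter before publishing the work" race. Below is a minimal pthread sketch of the corrected ordering, completely detached from the btrfs data structures; the variable names are invented for illustration.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int ordered_extents_created;
static int outstanding_work = 1;

static void *worker(void *arg)
{
	pthread_mutex_lock(&lock);
	/* Do the work first ... */
	ordered_extents_created = 1;
	/* ... and only then drop the count and wake the waiter. */
	outstanding_work = 0;
	pthread_cond_signal(&cond);
	pthread_mutex_unlock(&lock);
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, worker, NULL);

	pthread_mutex_lock(&lock);
	while (outstanding_work)
		pthread_cond_wait(&cond, &lock);
	/* The waiter can now rely on the work being visible. */
	printf("ordered extents visible: %d\n", ordered_extents_created);
	pthread_mutex_unlock(&lock);

	pthread_join(t, NULL);
	return 0;
}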
The following error was noticed on stable-rc 5.12, 5.10, 5.4, 4.19,
4.14, 4.9 and 4.4
for i386 and arm.
make --silent --keep-going --jobs=8
O=/home/tuxbuild/.cache/tuxmake/builds/current ARCH=arm
CROSS_COMPILE=arm-linux-gnueabihf- 'CC=sccache
arm-linux-gnueabihf-gcc' 'HOSTCC=sccache gcc'
In file included from /builds/linux/include/linux/kernel.h:11,
from /builds/linux/include/linux/list.h:9,
from /builds/linux/include/linux/preempt.h:11,
from /builds/linux/include/linux/hardirq.h:5,
from /builds/linux/include/linux/kvm_host.h:7,
from
/builds/linux/arch/arm/kvm/../../../virt/kvm/kvm_main.c:18:
In function '__gfn_to_hva_memslot',
inlined from '__gfn_to_hva_many.part.6' at
/builds/linux/arch/arm/kvm/../../../virt/kvm/kvm_main.c:1446:9,
inlined from '__gfn_to_hva_many' at
/builds/linux/arch/arm/kvm/../../../virt/kvm/kvm_main.c:1434:22:
/builds/linux/include/linux/compiler.h:417:38: error: call to
'__compiletime_assert_59' declared with attribute error: BUILD_BUG_ON
failed: sizeof(_i) > sizeof(long)
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
ref:
https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc/-/jobs/134260…
--
Linaro LKFT
https://lkft.linaro.org
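The BUILD_BUG_ON above fires because array_index_nospec(), used by the new memslot bounds check, requires the index to fit in a long, and gfn values are 64-bit while long is 32-bit on i386 and arm. A tiny compile-time illustration of that constraint follows; this is not the kernel code itself, just the same assertion expressed with C11 _Static_assert.

#include <stdint.h>

typedef uint64_t gfn_t;	/* guest frame numbers are 64-bit */

/*
 * Holds on a 64-bit build; fails at compile time on a 32-bit build,
 * which is what the __compiletime_assert_59 error above reports.
 */
_Static_assert(sizeof(gfn_t) <= sizeof(long),
	       "index wider than long: same condition BUILD_BUG_ON checks");

int main(void) { return 0; }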
From: Jarkko Sakkinen <jarkko(a)kernel.org>
BUG_ON() should not be used in kernel code unless there are
exceptional reasons to do so. Replace the BUG_ON() with WARN() and
return an error instead.
Cc: stable(a)vger.kernel.org
Fixes: b3811d36a3e7 ("KEYS: checking the input id parameters before finding asymmetric key")
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
No changes from original submission by Jarkko.
crypto/asymmetric_keys/asymmetric_type.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/crypto/asymmetric_keys/asymmetric_type.c b/crypto/asymmetric_keys/asymmetric_type.c
index ad8af3d70ac..a00bed3e04d 100644
--- a/crypto/asymmetric_keys/asymmetric_type.c
+++ b/crypto/asymmetric_keys/asymmetric_type.c
@@ -54,7 +54,10 @@ struct key *find_asymmetric_key(struct key *keyring,
char *req, *p;
int len;
- BUG_ON(!id_0 && !id_1);
+ if (!id_0 && !id_1) {
+ WARN(1, "All ID's are NULL\n");
+ return ERR_PTR(-EINVAL);
+ }
if (id_0) {
lookup = id_0->data;
--
2.27.0
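The same "warn once and fail gracefully instead of crashing" pattern in a stand-alone form; WARN() is replaced by a plain fprintf() and the key type is opaque, since this is only a user-space sketch of the idiom, not the keyring code.

#include <stdio.h>
#include <errno.h>

struct key;	/* opaque for the purpose of this sketch */

static struct key *find_key(const void *id_0, const void *id_1)
{
	/* Refuse obviously invalid input instead of taking the machine down. */
	if (!id_0 && !id_1) {
		fprintf(stderr, "find_key: both IDs are NULL\n");
		errno = EINVAL;
		return NULL;
	}
	/* ... real lookup would go here ... */
	return NULL;
}

int main(void)
{
	if (!find_key(NULL, NULL) && errno == EINVAL)
		printf("lookup rejected, caller can recover\n");
	return 0;
}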
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 6c14133d2d3f768e0a35128faac8aa6ed4815051 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Mon, 7 Jun 2021 21:39:08 -0400
Subject: [PATCH] ftrace: Do not blindly read the ip address in ftrace_bug()
It was reported that a bug on arm64 caused a bad ip address to be used for
updating into a nop in ftrace_init(), but the error path (rightfully)
returned -EINVAL and not -EFAULT, as the bug caused more than one error to
occur. But because -EINVAL was returned, the ftrace_bug() tried to report
what was at the location of the ip address, and read it directly. This
caused the machine to panic, as the ip was not pointing to a valid memory
address.
Instead, read the ip address with copy_from_kernel_nofault() to safely
access the memory, and if it faults, report that the address faulted,
otherwise report what was in that location.
Link: https://lore.kernel.org/lkml/20210607032329.28671-1-mark-pk.tsai@mediatek.c…
Cc: stable(a)vger.kernel.org
Fixes: 05736a427f7e1 ("ftrace: warn on failure to disable mcount callers")
Reported-by: Mark-PK Tsai <mark-pk.tsai(a)mediatek.com>
Tested-by: Mark-PK Tsai <mark-pk.tsai(a)mediatek.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 2e8a3fde7104..72ef4dccbcc4 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1967,12 +1967,18 @@ static int ftrace_hash_ipmodify_update(struct ftrace_ops *ops,
static void print_ip_ins(const char *fmt, const unsigned char *p)
{
+ char ins[MCOUNT_INSN_SIZE];
int i;
+ if (copy_from_kernel_nofault(ins, p, MCOUNT_INSN_SIZE)) {
+ printk(KERN_CONT "%s[FAULT] %px\n", fmt, p);
+ return;
+ }
+
printk(KERN_CONT "%s", fmt);
for (i = 0; i < MCOUNT_INSN_SIZE; i++)
- printk(KERN_CONT "%s%02x", i ? ":" : "", p[i]);
+ printk(KERN_CONT "%s%02x", i ? ":" : "", ins[i]);
}
enum ftrace_bug_type ftrace_bug_type;
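For reference, the probing primitive the fix relies on has the shape shown below. This is a hedged kernel-style sketch (it will not build outside a kernel tree) of the check-before-dereference idiom: copy the bytes with copy_from_kernel_nofault() and only dereference the local buffer.

/* Kernel-style sketch, not a buildable module: dump bytes at a possibly
 * bogus address without risking a panic. */
#include <linux/uaccess.h>	/* copy_from_kernel_nofault() */
#include <linux/printk.h>

static void dump_insn_safely(const unsigned char *p, int len)
{
	unsigned char buf[16];
	int i;

	if (len > (int)sizeof(buf))
		len = sizeof(buf);

	/* Returns non-zero if the source address faults. */
	if (copy_from_kernel_nofault(buf, p, len)) {
		pr_cont("[FAULT] %px\n", p);
		return;
	}

	for (i = 0; i < len; i++)
		pr_cont("%s%02x", i ? ":" : "", buf[i]);
	pr_cont("\n");
}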
It's not sufficient to skip reading when the pos is beyond the EOF.
There may be data at the head of the page that we need to fill in
before the write. Only elide the read if the pos is beyond the last page
in the file.
Cc: <stable(a)vger.kernel.org> # v5.10 .. v5.12
Fixes: 1cc1699070bd ("ceph: fold ceph_update_writeable_page into ceph_write_begin")
Reported-by: Andrew W Elble <aweits(a)rit.edu>
Signed-off-by: Jeff Layton <jlayton(a)kernel.org>
---
fs/ceph/addr.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Note to stable maintainers: This is needed in v5.10.z - v5.12.z. In
v5.13, we've moved to using the new netfs_read_helper code so this isn't
necessary there.
I also now have a simple testcase for this that I'll submit to xfstests
early next week.
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 26e66436f005..e636fb8275e1 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1353,11 +1353,11 @@ static int ceph_write_begin(struct file *file, struct address_space *mapping,
/*
* In some cases we don't need to read at all:
* - full page write
- * - write that lies completely beyond EOF
+ * - write that lies in a page that is completely beyond EOF
* - write that covers the the page from start to EOF or beyond it
*/
if ((pos_in_page == 0 && len == PAGE_SIZE) ||
- (pos >= i_size_read(inode)) ||
+ (index > (i_size_read(inode) / PAGE_SIZE)) ||
(pos_in_page == 0 && (pos + len) >= i_size_read(inode))) {
zero_user_segments(page, 0, pos_in_page,
pos_in_page + len, PAGE_SIZE);
--
2.31.1
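A small stand-alone illustration of the difference between the old and new conditions above: a write whose pos is past EOF but still lands in the page that contains EOF must not skip the read. The file size and offsets are made-up values chosen to hit exactly that case.

#include <stdio.h>

#define PAGE_SIZE 4096UL

int main(void)
{
	unsigned long i_size = 5000;	/* EOF falls inside page index 1 */
	unsigned long pos = 6000;	/* write starts past EOF, same page */
	unsigned long index = pos / PAGE_SIZE;

	int old_skip = pos >= i_size;			/* wrongly skips the read */
	int new_skip = index > i_size / PAGE_SIZE;	/* keeps the read */

	printf("old condition skips read: %d\n", old_skip);	/* 1 */
	printf("new condition skips read: %d\n", new_skip);	/* 0 */
	return 0;
}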
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a0ffb4c12f7fa89163e228e6f27df09b46631db1 Mon Sep 17 00:00:00 2001
From: Mark Zhang <markzhang(a)nvidia.com>
Date: Thu, 3 Jun 2021 16:18:03 +0300
Subject: [PATCH] RDMA/mlx5: Use different doorbell memory for different
processes
In a fork scenario, the parent and child can have the same virtual address and
also share the uverbs fd. That causes the list_for_each_entry search to
return the same doorbell physical page for all processes, even though that
page has been COW'ed or copied.
This patch takes the mm_struct into consideration during the search, to make
sure that VAs belonging to different processes are not intermixed.
Resolves the malfunction of uverbs after fork in some specific cases.
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/feacc23fe0bc6e1088c6824d5583798745e72405.16227262…
Reviewed-by: Jason Gunthorpe <jgg(a)nvidia.com>
Signed-off-by: Mark Zhang <markzhang(a)nvidia.com>
Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
diff --git a/drivers/infiniband/hw/mlx5/doorbell.c b/drivers/infiniband/hw/mlx5/doorbell.c
index 61475b571531..7af4df7a6823 100644
--- a/drivers/infiniband/hw/mlx5/doorbell.c
+++ b/drivers/infiniband/hw/mlx5/doorbell.c
@@ -41,6 +41,7 @@ struct mlx5_ib_user_db_page {
struct ib_umem *umem;
unsigned long user_virt;
int refcnt;
+ struct mm_struct *mm;
};
int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context,
@@ -53,7 +54,8 @@ int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context,
mutex_lock(&context->db_page_mutex);
list_for_each_entry(page, &context->db_page_list, list)
- if (page->user_virt == (virt & PAGE_MASK))
+ if ((current->mm == page->mm) &&
+ (page->user_virt == (virt & PAGE_MASK)))
goto found;
page = kmalloc(sizeof(*page), GFP_KERNEL);
@@ -71,6 +73,8 @@ int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context,
kfree(page);
goto out;
}
+ mmgrab(current->mm);
+ page->mm = current->mm;
list_add(&page->list, &context->db_page_list);
@@ -91,6 +95,7 @@ void mlx5_ib_db_unmap_user(struct mlx5_ib_ucontext *context, struct mlx5_db *db)
if (!--db->u.user_page->refcnt) {
list_del(&db->u.user_page->list);
+ mmdrop(db->u.user_page->mm);
ib_umem_release(db->u.user_page->umem);
kfree(db->u.user_page);
}
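The essence of the fix above is that the cached-page lookup must be keyed by (mm, virtual address) rather than by the virtual address alone. Below is a simplified user-space sketch of that lookup; the db_page struct, the fixed-size cache array and db_map() are stand-ins for illustration, not the mlx5 code.

#include <stdio.h>

struct db_page {
	const void *mm;			/* owning address space */
	unsigned long user_virt;	/* page-aligned VA */
	int refcnt;
};

#define NPAGES 8
static struct db_page cache[NPAGES];

static struct db_page *db_map(const void *mm, unsigned long virt)
{
	unsigned long page = virt & ~4095UL;
	int i;

	/* Match on both the mm and the VA: after fork, parent and child can
	 * share the same VA but must not share the same doorbell page. */
	for (i = 0; i < NPAGES; i++)
		if (cache[i].refcnt && cache[i].mm == mm &&
		    cache[i].user_virt == page) {
			cache[i].refcnt++;
			return &cache[i];
		}

	for (i = 0; i < NPAGES; i++)
		if (!cache[i].refcnt) {
			cache[i].mm = mm;
			cache[i].user_virt = page;
			cache[i].refcnt = 1;
			return &cache[i];
		}
	return NULL;
}

int main(void)
{
	int parent_mm, child_mm;	/* addresses stand in for mm_structs */

	struct db_page *a = db_map(&parent_mm, 0x1000);
	struct db_page *b = db_map(&child_mm, 0x1000);	/* same VA, other mm */

	printf("distinct pages for the same VA: %d\n", a != b);	/* 1 */
	return 0;
}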
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a0ffb4c12f7fa89163e228e6f27df09b46631db1 Mon Sep 17 00:00:00 2001
From: Mark Zhang <markzhang(a)nvidia.com>
Date: Thu, 3 Jun 2021 16:18:03 +0300
Subject: [PATCH] RDMA/mlx5: Use different doorbell memory for different
processes
In a fork scenario, the parent and child can have the same virtual address and
also share the uverbs fd. That causes the list_for_each_entry search to
return the same doorbell physical page for all processes, even though that
page has been COW'ed or copied.
This patch takes the mm_struct into consideration during the search, to make
sure that VAs belonging to different processes are not intermixed.
Resolves the malfunction of uverbs after fork in some specific cases.
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/feacc23fe0bc6e1088c6824d5583798745e72405.16227262…
Reviewed-by: Jason Gunthorpe <jgg(a)nvidia.com>
Signed-off-by: Mark Zhang <markzhang(a)nvidia.com>
Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
diff --git a/drivers/infiniband/hw/mlx5/doorbell.c b/drivers/infiniband/hw/mlx5/doorbell.c
index 61475b571531..7af4df7a6823 100644
--- a/drivers/infiniband/hw/mlx5/doorbell.c
+++ b/drivers/infiniband/hw/mlx5/doorbell.c
@@ -41,6 +41,7 @@ struct mlx5_ib_user_db_page {
struct ib_umem *umem;
unsigned long user_virt;
int refcnt;
+ struct mm_struct *mm;
};
int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context,
@@ -53,7 +54,8 @@ int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context,
mutex_lock(&context->db_page_mutex);
list_for_each_entry(page, &context->db_page_list, list)
- if (page->user_virt == (virt & PAGE_MASK))
+ if ((current->mm == page->mm) &&
+ (page->user_virt == (virt & PAGE_MASK)))
goto found;
page = kmalloc(sizeof(*page), GFP_KERNEL);
@@ -71,6 +73,8 @@ int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context,
kfree(page);
goto out;
}
+ mmgrab(current->mm);
+ page->mm = current->mm;
list_add(&page->list, &context->db_page_list);
@@ -91,6 +95,7 @@ void mlx5_ib_db_unmap_user(struct mlx5_ib_ucontext *context, struct mlx5_db *db)
if (!--db->u.user_page->refcnt) {
list_del(&db->u.user_page->list);
+ mmdrop(db->u.user_page->mm);
ib_umem_release(db->u.user_page->umem);
kfree(db->u.user_page);
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a0ffb4c12f7fa89163e228e6f27df09b46631db1 Mon Sep 17 00:00:00 2001
From: Mark Zhang <markzhang(a)nvidia.com>
Date: Thu, 3 Jun 2021 16:18:03 +0300
Subject: [PATCH] RDMA/mlx5: Use different doorbell memory for different
processes
In a fork scenario, the parent and child can have the same virtual address and
also share the uverbs fd. That causes the list_for_each_entry search to
return the same doorbell physical page for all processes, even though that
page has been COW'ed or copied.
This patch takes the mm_struct into consideration during the search, to make
sure that VAs belonging to different processes are not intermixed.
Resolves the malfunction of uverbs after fork in some specific cases.
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/feacc23fe0bc6e1088c6824d5583798745e72405.16227262…
Reviewed-by: Jason Gunthorpe <jgg(a)nvidia.com>
Signed-off-by: Mark Zhang <markzhang(a)nvidia.com>
Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
diff --git a/drivers/infiniband/hw/mlx5/doorbell.c b/drivers/infiniband/hw/mlx5/doorbell.c
index 61475b571531..7af4df7a6823 100644
--- a/drivers/infiniband/hw/mlx5/doorbell.c
+++ b/drivers/infiniband/hw/mlx5/doorbell.c
@@ -41,6 +41,7 @@ struct mlx5_ib_user_db_page {
struct ib_umem *umem;
unsigned long user_virt;
int refcnt;
+ struct mm_struct *mm;
};
int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context,
@@ -53,7 +54,8 @@ int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context,
mutex_lock(&context->db_page_mutex);
list_for_each_entry(page, &context->db_page_list, list)
- if (page->user_virt == (virt & PAGE_MASK))
+ if ((current->mm == page->mm) &&
+ (page->user_virt == (virt & PAGE_MASK)))
goto found;
page = kmalloc(sizeof(*page), GFP_KERNEL);
@@ -71,6 +73,8 @@ int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context,
kfree(page);
goto out;
}
+ mmgrab(current->mm);
+ page->mm = current->mm;
list_add(&page->list, &context->db_page_list);
@@ -91,6 +95,7 @@ void mlx5_ib_db_unmap_user(struct mlx5_ib_ucontext *context, struct mlx5_db *db)
if (!--db->u.user_page->refcnt) {
list_del(&db->u.user_page->list);
+ mmdrop(db->u.user_page->mm);
ib_umem_release(db->u.user_page->umem);
kfree(db->u.user_page);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a0ffb4c12f7fa89163e228e6f27df09b46631db1 Mon Sep 17 00:00:00 2001
From: Mark Zhang <markzhang(a)nvidia.com>
Date: Thu, 3 Jun 2021 16:18:03 +0300
Subject: [PATCH] RDMA/mlx5: Use different doorbell memory for different
processes
In a fork scenario, the parent and child can have the same virtual address and
also share the uverbs fd. That causes the list_for_each_entry search to
return the same doorbell physical page for all processes, even though that
page has been COW'ed or copied.
This patch takes the mm_struct into consideration during the search, to make
sure that VAs belonging to different processes are not intermixed.
Resolves the malfunction of uverbs after fork in some specific cases.
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/feacc23fe0bc6e1088c6824d5583798745e72405.16227262…
Reviewed-by: Jason Gunthorpe <jgg(a)nvidia.com>
Signed-off-by: Mark Zhang <markzhang(a)nvidia.com>
Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
diff --git a/drivers/infiniband/hw/mlx5/doorbell.c b/drivers/infiniband/hw/mlx5/doorbell.c
index 61475b571531..7af4df7a6823 100644
--- a/drivers/infiniband/hw/mlx5/doorbell.c
+++ b/drivers/infiniband/hw/mlx5/doorbell.c
@@ -41,6 +41,7 @@ struct mlx5_ib_user_db_page {
struct ib_umem *umem;
unsigned long user_virt;
int refcnt;
+ struct mm_struct *mm;
};
int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context,
@@ -53,7 +54,8 @@ int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context,
mutex_lock(&context->db_page_mutex);
list_for_each_entry(page, &context->db_page_list, list)
- if (page->user_virt == (virt & PAGE_MASK))
+ if ((current->mm == page->mm) &&
+ (page->user_virt == (virt & PAGE_MASK)))
goto found;
page = kmalloc(sizeof(*page), GFP_KERNEL);
@@ -71,6 +73,8 @@ int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context,
kfree(page);
goto out;
}
+ mmgrab(current->mm);
+ page->mm = current->mm;
list_add(&page->list, &context->db_page_list);
@@ -91,6 +95,7 @@ void mlx5_ib_db_unmap_user(struct mlx5_ib_ucontext *context, struct mlx5_db *db)
if (!--db->u.user_page->refcnt) {
list_del(&db->u.user_page->list);
+ mmdrop(db->u.user_page->mm);
ib_umem_release(db->u.user_page->umem);
kfree(db->u.user_page);
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a0ffb4c12f7fa89163e228e6f27df09b46631db1 Mon Sep 17 00:00:00 2001
From: Mark Zhang <markzhang(a)nvidia.com>
Date: Thu, 3 Jun 2021 16:18:03 +0300
Subject: [PATCH] RDMA/mlx5: Use different doorbell memory for different
processes
In a fork scenario, the parent and child can have the same virtual address and
also share the uverbs fd. That causes the list_for_each_entry search to
return the same doorbell physical page for all processes, even though that
page has been COW'ed or copied.
This patch takes the mm_struct into consideration during the search, to make
sure that VAs belonging to different processes are not intermixed.
Resolves the malfunction of uverbs after fork in some specific cases.
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/feacc23fe0bc6e1088c6824d5583798745e72405.16227262…
Reviewed-by: Jason Gunthorpe <jgg(a)nvidia.com>
Signed-off-by: Mark Zhang <markzhang(a)nvidia.com>
Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
diff --git a/drivers/infiniband/hw/mlx5/doorbell.c b/drivers/infiniband/hw/mlx5/doorbell.c
index 61475b571531..7af4df7a6823 100644
--- a/drivers/infiniband/hw/mlx5/doorbell.c
+++ b/drivers/infiniband/hw/mlx5/doorbell.c
@@ -41,6 +41,7 @@ struct mlx5_ib_user_db_page {
struct ib_umem *umem;
unsigned long user_virt;
int refcnt;
+ struct mm_struct *mm;
};
int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context,
@@ -53,7 +54,8 @@ int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context,
mutex_lock(&context->db_page_mutex);
list_for_each_entry(page, &context->db_page_list, list)
- if (page->user_virt == (virt & PAGE_MASK))
+ if ((current->mm == page->mm) &&
+ (page->user_virt == (virt & PAGE_MASK)))
goto found;
page = kmalloc(sizeof(*page), GFP_KERNEL);
@@ -71,6 +73,8 @@ int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context,
kfree(page);
goto out;
}
+ mmgrab(current->mm);
+ page->mm = current->mm;
list_add(&page->list, &context->db_page_list);
@@ -91,6 +95,7 @@ void mlx5_ib_db_unmap_user(struct mlx5_ib_ucontext *context, struct mlx5_db *db)
if (!--db->u.user_page->refcnt) {
list_del(&db->u.user_page->list);
+ mmdrop(db->u.user_page->mm);
ib_umem_release(db->u.user_page->umem);
kfree(db->u.user_page);
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a0ffb4c12f7fa89163e228e6f27df09b46631db1 Mon Sep 17 00:00:00 2001
From: Mark Zhang <markzhang(a)nvidia.com>
Date: Thu, 3 Jun 2021 16:18:03 +0300
Subject: [PATCH] RDMA/mlx5: Use different doorbell memory for different
processes
In a fork scenario, the parent and child can have the same virtual address and
also share the uverbs fd. That causes the list_for_each_entry search to
return the same doorbell physical page for all processes, even though that
page has been COW'ed or copied.
This patch takes the mm_struct into consideration during the search, to make
sure that VAs belonging to different processes are not intermixed.
Resolves the malfunction of uverbs after fork in some specific cases.
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/feacc23fe0bc6e1088c6824d5583798745e72405.16227262…
Reviewed-by: Jason Gunthorpe <jgg(a)nvidia.com>
Signed-off-by: Mark Zhang <markzhang(a)nvidia.com>
Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
diff --git a/drivers/infiniband/hw/mlx5/doorbell.c b/drivers/infiniband/hw/mlx5/doorbell.c
index 61475b571531..7af4df7a6823 100644
--- a/drivers/infiniband/hw/mlx5/doorbell.c
+++ b/drivers/infiniband/hw/mlx5/doorbell.c
@@ -41,6 +41,7 @@ struct mlx5_ib_user_db_page {
struct ib_umem *umem;
unsigned long user_virt;
int refcnt;
+ struct mm_struct *mm;
};
int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context,
@@ -53,7 +54,8 @@ int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context,
mutex_lock(&context->db_page_mutex);
list_for_each_entry(page, &context->db_page_list, list)
- if (page->user_virt == (virt & PAGE_MASK))
+ if ((current->mm == page->mm) &&
+ (page->user_virt == (virt & PAGE_MASK)))
goto found;
page = kmalloc(sizeof(*page), GFP_KERNEL);
@@ -71,6 +73,8 @@ int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context,
kfree(page);
goto out;
}
+ mmgrab(current->mm);
+ page->mm = current->mm;
list_add(&page->list, &context->db_page_list);
@@ -91,6 +95,7 @@ void mlx5_ib_db_unmap_user(struct mlx5_ib_ucontext *context, struct mlx5_db *db)
if (!--db->u.user_page->refcnt) {
list_del(&db->u.user_page->list);
+ mmdrop(db->u.user_page->mm);
ib_umem_release(db->u.user_page->umem);
kfree(db->u.user_page);
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7c7ad626d9a0ff0a36c1e2a3cfbbc6a13828d5eb Mon Sep 17 00:00:00 2001
From: Vincent Guittot <vincent.guittot(a)linaro.org>
Date: Thu, 27 May 2021 14:29:15 +0200
Subject: [PATCH] sched/fair: Keep load_avg and load_sum synced
When removing a cfs_rq from the list we only check the _sum value, so we must
ensure that _avg and _sum stay synced, so that load_sum can't be null while
load_avg is not, after propagating load in the cgroup hierarchy.
Use load_avg to compute load_sum, similarly to what is done for util_sum
and runnable_sum.
Fixes: 0e2d2aaaae52 ("sched/fair: Rewrite PELT migration propagation")
Reported-by: Odin Ugedal <odin(a)uged.al>
Signed-off-by: Vincent Guittot <vincent.guittot(a)linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Reviewed-by: Odin Ugedal <odin(a)uged.al>
Link: https://lkml.kernel.org/r/20210527122916.27683-2-vincent.guittot@linaro.org
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3248e24a90b0..f4795b800841 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3499,10 +3499,9 @@ update_tg_cfs_runnable(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cf
static inline void
update_tg_cfs_load(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
{
- long delta_avg, running_sum, runnable_sum = gcfs_rq->prop_runnable_sum;
+ long delta, running_sum, runnable_sum = gcfs_rq->prop_runnable_sum;
unsigned long load_avg;
u64 load_sum = 0;
- s64 delta_sum;
u32 divider;
if (!runnable_sum)
@@ -3549,13 +3548,13 @@ update_tg_cfs_load(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq
load_sum = (s64)se_weight(se) * runnable_sum;
load_avg = div_s64(load_sum, divider);
- delta_sum = load_sum - (s64)se_weight(se) * se->avg.load_sum;
- delta_avg = load_avg - se->avg.load_avg;
+ delta = load_avg - se->avg.load_avg;
se->avg.load_sum = runnable_sum;
se->avg.load_avg = load_avg;
- add_positive(&cfs_rq->avg.load_avg, delta_avg);
- add_positive(&cfs_rq->avg.load_sum, delta_sum);
+
+ add_positive(&cfs_rq->avg.load_avg, delta);
+ cfs_rq->avg.load_sum = cfs_rq->avg.load_avg * divider;
}
static inline void add_tg_cfs_propagate(struct cfs_rq *cfs_rq, long runnable_sum)
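The invariant the patch above restores can be stated as load_sum == load_avg * divider, so the two values are either both zero or both non-zero. A trivial arithmetic sketch with made-up numbers; the divider constant here is illustrative, not the PELT value.

#include <stdio.h>

int main(void)
{
	unsigned long divider = 47742;	/* illustrative, PELT uses its own constant */
	unsigned long load_avg = 3;
	unsigned long load_sum;

	/* Derive _sum from _avg instead of tracking a separate delta,
	 * so the two can never disagree about being zero. */
	load_sum = load_avg * divider;

	printf("load_avg=%lu load_sum=%lu (both zero or both non-zero)\n",
	       load_avg, load_sum);
	return 0;
}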
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7c7ad626d9a0ff0a36c1e2a3cfbbc6a13828d5eb Mon Sep 17 00:00:00 2001
From: Vincent Guittot <vincent.guittot(a)linaro.org>
Date: Thu, 27 May 2021 14:29:15 +0200
Subject: [PATCH] sched/fair: Keep load_avg and load_sum synced
When removing a cfs_rq from the list we only check the _sum value, so we must
ensure that _avg and _sum stay synced, so that load_sum can't be null while
load_avg is not, after propagating load in the cgroup hierarchy.
Use load_avg to compute load_sum, similarly to what is done for util_sum
and runnable_sum.
Fixes: 0e2d2aaaae52 ("sched/fair: Rewrite PELT migration propagation")
Reported-by: Odin Ugedal <odin(a)uged.al>
Signed-off-by: Vincent Guittot <vincent.guittot(a)linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Reviewed-by: Odin Ugedal <odin(a)uged.al>
Link: https://lkml.kernel.org/r/20210527122916.27683-2-vincent.guittot@linaro.org
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3248e24a90b0..f4795b800841 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3499,10 +3499,9 @@ update_tg_cfs_runnable(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cf
static inline void
update_tg_cfs_load(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
{
- long delta_avg, running_sum, runnable_sum = gcfs_rq->prop_runnable_sum;
+ long delta, running_sum, runnable_sum = gcfs_rq->prop_runnable_sum;
unsigned long load_avg;
u64 load_sum = 0;
- s64 delta_sum;
u32 divider;
if (!runnable_sum)
@@ -3549,13 +3548,13 @@ update_tg_cfs_load(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq
load_sum = (s64)se_weight(se) * runnable_sum;
load_avg = div_s64(load_sum, divider);
- delta_sum = load_sum - (s64)se_weight(se) * se->avg.load_sum;
- delta_avg = load_avg - se->avg.load_avg;
+ delta = load_avg - se->avg.load_avg;
se->avg.load_sum = runnable_sum;
se->avg.load_avg = load_avg;
- add_positive(&cfs_rq->avg.load_avg, delta_avg);
- add_positive(&cfs_rq->avg.load_sum, delta_sum);
+
+ add_positive(&cfs_rq->avg.load_avg, delta);
+ cfs_rq->avg.load_sum = cfs_rq->avg.load_avg * divider;
}
static inline void add_tg_cfs_propagate(struct cfs_rq *cfs_rq, long runnable_sum)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 404e5a12691fe797486475fe28cc0b80cb8bef2c Mon Sep 17 00:00:00 2001
From: Shay Drory <shayd(a)nvidia.com>
Date: Thu, 3 Jun 2021 16:19:39 +0300
Subject: [PATCH] RDMA/mlx4: Do not map the core_clock page to user space
unless enabled
Currently, when mlx4 maps the hca_core_clock page to user space, that page
contains read-modifiable registers, one of which is a semaphore, alongside
the clock counter. If the user reads the wrong offset, it can modify the
semaphore and hang the device.
Do not map the hca_core_clock page to the user space unless the device has
been put in a backwards compatibility mode to support this feature.
After this patch, mlx4 core_clock won't be mapped to user space on the
majority of existing devices and the uverbs device time feature in
ibv_query_rt_values_ex() will be disabled.
Fixes: 52033cfb5aab ("IB/mlx4: Add mmap call to map the hardware clock")
Link: https://lore.kernel.org/r/9632304e0d6790af84b3b706d8c18732bc0d5e27.16227263…
Signed-off-by: Shay Drory <shayd(a)nvidia.com>
Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 22898d97ecbd..16704262fc3a 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -581,12 +581,9 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
props->cq_caps.max_cq_moderation_count = MLX4_MAX_CQ_COUNT;
props->cq_caps.max_cq_moderation_period = MLX4_MAX_CQ_PERIOD;
- if (!mlx4_is_slave(dev->dev))
- err = mlx4_get_internal_clock_params(dev->dev, &clock_params);
-
if (uhw->outlen >= resp.response_length + sizeof(resp.hca_core_clock_offset)) {
resp.response_length += sizeof(resp.hca_core_clock_offset);
- if (!err && !mlx4_is_slave(dev->dev)) {
+ if (!mlx4_get_internal_clock_params(dev->dev, &clock_params)) {
resp.comp_mask |= MLX4_IB_QUERY_DEV_RESP_MASK_CORE_CLOCK_OFFSET;
resp.hca_core_clock_offset = clock_params.offset % PAGE_SIZE;
}
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index f6cfec81ccc3..dc4ac1a2b6b6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -823,6 +823,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
#define QUERY_DEV_CAP_MAD_DEMUX_OFFSET 0xb0
#define QUERY_DEV_CAP_DMFS_HIGH_RATE_QPN_BASE_OFFSET 0xa8
#define QUERY_DEV_CAP_DMFS_HIGH_RATE_QPN_RANGE_OFFSET 0xac
+#define QUERY_DEV_CAP_MAP_CLOCK_TO_USER 0xc1
#define QUERY_DEV_CAP_QP_RATE_LIMIT_NUM_OFFSET 0xcc
#define QUERY_DEV_CAP_QP_RATE_LIMIT_MAX_OFFSET 0xd0
#define QUERY_DEV_CAP_QP_RATE_LIMIT_MIN_OFFSET 0xd2
@@ -841,6 +842,8 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
if (mlx4_is_mfunc(dev))
disable_unsupported_roce_caps(outbox);
+ MLX4_GET(field, outbox, QUERY_DEV_CAP_MAP_CLOCK_TO_USER);
+ dev_cap->map_clock_to_user = field & 0x80;
MLX4_GET(field, outbox, QUERY_DEV_CAP_RSVD_QP_OFFSET);
dev_cap->reserved_qps = 1 << (field & 0xf);
MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_QP_OFFSET);
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.h b/drivers/net/ethernet/mellanox/mlx4/fw.h
index 8f020f26ebf5..cf64e54eecb0 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.h
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.h
@@ -131,6 +131,7 @@ struct mlx4_dev_cap {
u32 health_buffer_addrs;
struct mlx4_port_cap port_cap[MLX4_MAX_PORTS + 1];
bool wol_port[MLX4_MAX_PORTS + 1];
+ bool map_clock_to_user;
};
struct mlx4_func_cap {
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index c326b434734e..00c84656b2e7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -498,6 +498,7 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
}
}
+ dev->caps.map_clock_to_user = dev_cap->map_clock_to_user;
dev->caps.uar_page_size = PAGE_SIZE;
dev->caps.num_uars = dev_cap->uar_size / PAGE_SIZE;
dev->caps.local_ca_ack_delay = dev_cap->local_ca_ack_delay;
@@ -1948,6 +1949,11 @@ int mlx4_get_internal_clock_params(struct mlx4_dev *dev,
if (mlx4_is_slave(dev))
return -EOPNOTSUPP;
+ if (!dev->caps.map_clock_to_user) {
+ mlx4_dbg(dev, "Map clock to user is not supported.\n");
+ return -EOPNOTSUPP;
+ }
+
if (!params)
return -EINVAL;
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 236a7d04f891..30bb59fe970c 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -630,6 +630,7 @@ struct mlx4_caps {
bool wol_port[MLX4_MAX_PORTS + 1];
struct mlx4_rate_limit_caps rl_caps;
u32 health_buffer_addrs;
+ bool map_clock_to_user;
};
struct mlx4_buf_list {
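The shape of the capability gating the patch above adds: read one firmware capability bit and refuse the user-space mapping when it is clear. A stand-alone sketch with a hypothetical cap field and offset, not the mlx4 query path.

#include <stdio.h>
#include <errno.h>
#include <stdint.h>

#define MAP_CLOCK_TO_USER_BIT 0x80	/* illustrative bit position */

static int get_clock_params(uint8_t dev_cap_field, unsigned long *offset)
{
	/* Only expose the hca_core_clock page if firmware opted in. */
	if (!(dev_cap_field & MAP_CLOCK_TO_USER_BIT))
		return -EOPNOTSUPP;

	*offset = 0x20;	/* made-up offset within the page */
	return 0;
}

int main(void)
{
	unsigned long off;

	printf("cap clear -> %d\n", get_clock_params(0x00, &off));	/* -EOPNOTSUPP */
	printf("cap set   -> %d\n", get_clock_params(0x80, &off));	/* 0 */
	return 0;
}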
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 404e5a12691fe797486475fe28cc0b80cb8bef2c Mon Sep 17 00:00:00 2001
From: Shay Drory <shayd(a)nvidia.com>
Date: Thu, 3 Jun 2021 16:19:39 +0300
Subject: [PATCH] RDMA/mlx4: Do not map the core_clock page to user space
unless enabled
Currently, when mlx4 maps the hca_core_clock page to user space, that page
contains read-modifiable registers, one of which is a semaphore, alongside
the clock counter. If the user reads the wrong offset, it can modify the
semaphore and hang the device.
Do not map the hca_core_clock page to the user space unless the device has
been put in a backwards compatibility mode to support this feature.
After this patch, mlx4 core_clock won't be mapped to user space on the
majority of existing devices and the uverbs device time feature in
ibv_query_rt_values_ex() will be disabled.
Fixes: 52033cfb5aab ("IB/mlx4: Add mmap call to map the hardware clock")
Link: https://lore.kernel.org/r/9632304e0d6790af84b3b706d8c18732bc0d5e27.16227263…
Signed-off-by: Shay Drory <shayd(a)nvidia.com>
Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 22898d97ecbd..16704262fc3a 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -581,12 +581,9 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
props->cq_caps.max_cq_moderation_count = MLX4_MAX_CQ_COUNT;
props->cq_caps.max_cq_moderation_period = MLX4_MAX_CQ_PERIOD;
- if (!mlx4_is_slave(dev->dev))
- err = mlx4_get_internal_clock_params(dev->dev, &clock_params);
-
if (uhw->outlen >= resp.response_length + sizeof(resp.hca_core_clock_offset)) {
resp.response_length += sizeof(resp.hca_core_clock_offset);
- if (!err && !mlx4_is_slave(dev->dev)) {
+ if (!mlx4_get_internal_clock_params(dev->dev, &clock_params)) {
resp.comp_mask |= MLX4_IB_QUERY_DEV_RESP_MASK_CORE_CLOCK_OFFSET;
resp.hca_core_clock_offset = clock_params.offset % PAGE_SIZE;
}
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index f6cfec81ccc3..dc4ac1a2b6b6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -823,6 +823,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
#define QUERY_DEV_CAP_MAD_DEMUX_OFFSET 0xb0
#define QUERY_DEV_CAP_DMFS_HIGH_RATE_QPN_BASE_OFFSET 0xa8
#define QUERY_DEV_CAP_DMFS_HIGH_RATE_QPN_RANGE_OFFSET 0xac
+#define QUERY_DEV_CAP_MAP_CLOCK_TO_USER 0xc1
#define QUERY_DEV_CAP_QP_RATE_LIMIT_NUM_OFFSET 0xcc
#define QUERY_DEV_CAP_QP_RATE_LIMIT_MAX_OFFSET 0xd0
#define QUERY_DEV_CAP_QP_RATE_LIMIT_MIN_OFFSET 0xd2
@@ -841,6 +842,8 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
if (mlx4_is_mfunc(dev))
disable_unsupported_roce_caps(outbox);
+ MLX4_GET(field, outbox, QUERY_DEV_CAP_MAP_CLOCK_TO_USER);
+ dev_cap->map_clock_to_user = field & 0x80;
MLX4_GET(field, outbox, QUERY_DEV_CAP_RSVD_QP_OFFSET);
dev_cap->reserved_qps = 1 << (field & 0xf);
MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_QP_OFFSET);
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.h b/drivers/net/ethernet/mellanox/mlx4/fw.h
index 8f020f26ebf5..cf64e54eecb0 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.h
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.h
@@ -131,6 +131,7 @@ struct mlx4_dev_cap {
u32 health_buffer_addrs;
struct mlx4_port_cap port_cap[MLX4_MAX_PORTS + 1];
bool wol_port[MLX4_MAX_PORTS + 1];
+ bool map_clock_to_user;
};
struct mlx4_func_cap {
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index c326b434734e..00c84656b2e7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -498,6 +498,7 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
}
}
+ dev->caps.map_clock_to_user = dev_cap->map_clock_to_user;
dev->caps.uar_page_size = PAGE_SIZE;
dev->caps.num_uars = dev_cap->uar_size / PAGE_SIZE;
dev->caps.local_ca_ack_delay = dev_cap->local_ca_ack_delay;
@@ -1948,6 +1949,11 @@ int mlx4_get_internal_clock_params(struct mlx4_dev *dev,
if (mlx4_is_slave(dev))
return -EOPNOTSUPP;
+ if (!dev->caps.map_clock_to_user) {
+ mlx4_dbg(dev, "Map clock to user is not supported.\n");
+ return -EOPNOTSUPP;
+ }
+
if (!params)
return -EINVAL;
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 236a7d04f891..30bb59fe970c 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -630,6 +630,7 @@ struct mlx4_caps {
bool wol_port[MLX4_MAX_PORTS + 1];
struct mlx4_rate_limit_caps rl_caps;
u32 health_buffer_addrs;
+ bool map_clock_to_user;
};
struct mlx4_buf_list {
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 404e5a12691fe797486475fe28cc0b80cb8bef2c Mon Sep 17 00:00:00 2001
From: Shay Drory <shayd(a)nvidia.com>
Date: Thu, 3 Jun 2021 16:19:39 +0300
Subject: [PATCH] RDMA/mlx4: Do not map the core_clock page to user space
unless enabled
Currently, when mlx4 maps the hca_core_clock page to user space, that page
contains read-modifiable registers, one of which is a semaphore, alongside
the clock counter. If the user reads the wrong offset, it can modify the
semaphore and hang the device.
Do not map the hca_core_clock page to the user space unless the device has
been put in a backwards compatibility mode to support this feature.
After this patch, mlx4 core_clock won't be mapped to user space on the
majority of existing devices and the uverbs device time feature in
ibv_query_rt_values_ex() will be disabled.
Fixes: 52033cfb5aab ("IB/mlx4: Add mmap call to map the hardware clock")
Link: https://lore.kernel.org/r/9632304e0d6790af84b3b706d8c18732bc0d5e27.16227263…
Signed-off-by: Shay Drory <shayd(a)nvidia.com>
Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 22898d97ecbd..16704262fc3a 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -581,12 +581,9 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
props->cq_caps.max_cq_moderation_count = MLX4_MAX_CQ_COUNT;
props->cq_caps.max_cq_moderation_period = MLX4_MAX_CQ_PERIOD;
- if (!mlx4_is_slave(dev->dev))
- err = mlx4_get_internal_clock_params(dev->dev, &clock_params);
-
if (uhw->outlen >= resp.response_length + sizeof(resp.hca_core_clock_offset)) {
resp.response_length += sizeof(resp.hca_core_clock_offset);
- if (!err && !mlx4_is_slave(dev->dev)) {
+ if (!mlx4_get_internal_clock_params(dev->dev, &clock_params)) {
resp.comp_mask |= MLX4_IB_QUERY_DEV_RESP_MASK_CORE_CLOCK_OFFSET;
resp.hca_core_clock_offset = clock_params.offset % PAGE_SIZE;
}
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index f6cfec81ccc3..dc4ac1a2b6b6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -823,6 +823,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
#define QUERY_DEV_CAP_MAD_DEMUX_OFFSET 0xb0
#define QUERY_DEV_CAP_DMFS_HIGH_RATE_QPN_BASE_OFFSET 0xa8
#define QUERY_DEV_CAP_DMFS_HIGH_RATE_QPN_RANGE_OFFSET 0xac
+#define QUERY_DEV_CAP_MAP_CLOCK_TO_USER 0xc1
#define QUERY_DEV_CAP_QP_RATE_LIMIT_NUM_OFFSET 0xcc
#define QUERY_DEV_CAP_QP_RATE_LIMIT_MAX_OFFSET 0xd0
#define QUERY_DEV_CAP_QP_RATE_LIMIT_MIN_OFFSET 0xd2
@@ -841,6 +842,8 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
if (mlx4_is_mfunc(dev))
disable_unsupported_roce_caps(outbox);
+ MLX4_GET(field, outbox, QUERY_DEV_CAP_MAP_CLOCK_TO_USER);
+ dev_cap->map_clock_to_user = field & 0x80;
MLX4_GET(field, outbox, QUERY_DEV_CAP_RSVD_QP_OFFSET);
dev_cap->reserved_qps = 1 << (field & 0xf);
MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_QP_OFFSET);
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.h b/drivers/net/ethernet/mellanox/mlx4/fw.h
index 8f020f26ebf5..cf64e54eecb0 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.h
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.h
@@ -131,6 +131,7 @@ struct mlx4_dev_cap {
u32 health_buffer_addrs;
struct mlx4_port_cap port_cap[MLX4_MAX_PORTS + 1];
bool wol_port[MLX4_MAX_PORTS + 1];
+ bool map_clock_to_user;
};
struct mlx4_func_cap {
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index c326b434734e..00c84656b2e7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -498,6 +498,7 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
}
}
+ dev->caps.map_clock_to_user = dev_cap->map_clock_to_user;
dev->caps.uar_page_size = PAGE_SIZE;
dev->caps.num_uars = dev_cap->uar_size / PAGE_SIZE;
dev->caps.local_ca_ack_delay = dev_cap->local_ca_ack_delay;
@@ -1948,6 +1949,11 @@ int mlx4_get_internal_clock_params(struct mlx4_dev *dev,
if (mlx4_is_slave(dev))
return -EOPNOTSUPP;
+ if (!dev->caps.map_clock_to_user) {
+ mlx4_dbg(dev, "Map clock to user is not supported.\n");
+ return -EOPNOTSUPP;
+ }
+
if (!params)
return -EINVAL;
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 236a7d04f891..30bb59fe970c 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -630,6 +630,7 @@ struct mlx4_caps {
bool wol_port[MLX4_MAX_PORTS + 1];
struct mlx4_rate_limit_caps rl_caps;
u32 health_buffer_addrs;
+ bool map_clock_to_user;
};
struct mlx4_buf_list {
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f41bfc7e9c7c1d721c8752f1853cde43e606ad43 Mon Sep 17 00:00:00 2001
From: Kyle Tso <kyletso(a)google.com>
Date: Tue, 1 Jun 2021 20:31:48 +0800
Subject: [PATCH] usb: typec: tcpm: Correct the responses in SVDM Version 2.0
DFP
In USB PD Spec Rev 3.1 Ver 1.0, section "6.12.5 Applicability of
Structured VDM Commands", a DFP is allowed and recommended to respond to
Discover Identity with ACK. And per section "6.4.4.2.5.1 Commands other
than Attention", NAK should be returned only when receiving Messages
with invalid fields, Messages in the wrong situation, or unrecognized
Messages.
Still keep the original design for SVDM Version 1.0 for backward
compatibility.
Fixes: 193a68011fdc ("staging: typec: tcpm: Respond to Discover Identity commands")
Acked-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Signed-off-by: Kyle Tso <kyletso(a)google.com>
Link: https://lore.kernel.org/r/20210601123151.3441914-2-kyletso@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index 9ce8c9af4da5..a1bf0dc5babf 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -1547,19 +1547,25 @@ static int tcpm_pd_svdm(struct tcpm_port *port, struct typec_altmode *adev,
if (PD_VDO_VID(p[0]) != USB_SID_PD)
break;
- if (PD_VDO_SVDM_VER(p[0]) < svdm_version)
+ if (PD_VDO_SVDM_VER(p[0]) < svdm_version) {
typec_partner_set_svdm_version(port->partner,
PD_VDO_SVDM_VER(p[0]));
+ svdm_version = PD_VDO_SVDM_VER(p[0]);
+ }
tcpm_ams_start(port, DISCOVER_IDENTITY);
- /* 6.4.4.3.1: Only respond as UFP (device) */
- if (port->data_role == TYPEC_DEVICE &&
+ /*
+ * PD2.0 Spec 6.10.3: respond with NAK as DFP (data host)
+ * PD3.1 Spec 6.4.4.2.5.1: respond with NAK if "invalid field" or
+ * "wrong configuation" or "Unrecognized"
+ */
+ if ((port->data_role == TYPEC_DEVICE || svdm_version >= SVDM_VER_2_0) &&
port->nr_snk_vdo) {
/*
* Product Type DFP and Connector Type are not defined in SVDM
* version 1.0 and shall be set to zero.
*/
- if (typec_get_negotiated_svdm_version(typec) < SVDM_VER_2_0)
+ if (svdm_version < SVDM_VER_2_0)
response[1] = port->snk_vdo[0] & ~IDH_DFP_MASK
& ~IDH_CONN_MASK;
else
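The decision the patch above changes can be reduced to a small predicate: with SVDM 2.0 a DFP answers Discover Identity with ACK too, instead of NAKing purely because of its data role. A stand-alone sketch follows; the enum values and respond_with_ack() are invented for illustration, not the tcpm code.

#include <stdio.h>

enum data_role { ROLE_UFP, ROLE_DFP };
enum svdm_ver { VER_1_0, VER_2_0 };

/* Return 1 if the port should build an ACK response to Discover Identity. */
static int respond_with_ack(enum data_role role, enum svdm_ver ver, int have_snk_vdo)
{
	/* Old behaviour: only a UFP (device) answered.
	 * New behaviour: an SVDM 2.0 DFP answers as well. */
	return (role == ROLE_UFP || ver >= VER_2_0) && have_snk_vdo;
}

int main(void)
{
	printf("SVDM 1.0 DFP acks: %d\n", respond_with_ack(ROLE_DFP, VER_1_0, 1)); /* 0 */
	printf("SVDM 2.0 DFP acks: %d\n", respond_with_ack(ROLE_DFP, VER_2_0, 1)); /* 1 */
	return 0;
}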
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f41bfc7e9c7c1d721c8752f1853cde43e606ad43 Mon Sep 17 00:00:00 2001
From: Kyle Tso <kyletso(a)google.com>
Date: Tue, 1 Jun 2021 20:31:48 +0800
Subject: [PATCH] usb: typec: tcpm: Correct the responses in SVDM Version 2.0
DFP
In USB PD Spec Rev 3.1 Ver 1.0, section "6.12.5 Applicability of
Structured VDM Commands", a DFP is allowed and recommended to respond to
Discover Identity with ACK. And per section "6.4.4.2.5.1 Commands other
than Attention", NAK should be returned only when receiving Messages
with invalid fields, Messages in the wrong situation, or unrecognized
Messages.
Still keep the original design for SVDM Version 1.0 for backward
compatibility.
Fixes: 193a68011fdc ("staging: typec: tcpm: Respond to Discover Identity commands")
Acked-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Signed-off-by: Kyle Tso <kyletso(a)google.com>
Link: https://lore.kernel.org/r/20210601123151.3441914-2-kyletso@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index 9ce8c9af4da5..a1bf0dc5babf 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -1547,19 +1547,25 @@ static int tcpm_pd_svdm(struct tcpm_port *port, struct typec_altmode *adev,
if (PD_VDO_VID(p[0]) != USB_SID_PD)
break;
- if (PD_VDO_SVDM_VER(p[0]) < svdm_version)
+ if (PD_VDO_SVDM_VER(p[0]) < svdm_version) {
typec_partner_set_svdm_version(port->partner,
PD_VDO_SVDM_VER(p[0]));
+ svdm_version = PD_VDO_SVDM_VER(p[0]);
+ }
tcpm_ams_start(port, DISCOVER_IDENTITY);
- /* 6.4.4.3.1: Only respond as UFP (device) */
- if (port->data_role == TYPEC_DEVICE &&
+ /*
+ * PD2.0 Spec 6.10.3: respond with NAK as DFP (data host)
+ * PD3.1 Spec 6.4.4.2.5.1: respond with NAK if "invalid field" or
+ * "wrong configuation" or "Unrecognized"
+ */
+ if ((port->data_role == TYPEC_DEVICE || svdm_version >= SVDM_VER_2_0) &&
port->nr_snk_vdo) {
/*
* Product Type DFP and Connector Type are not defined in SVDM
* version 1.0 and shall be set to zero.
*/
- if (typec_get_negotiated_svdm_version(typec) < SVDM_VER_2_0)
+ if (svdm_version < SVDM_VER_2_0)
response[1] = port->snk_vdo[0] & ~IDH_DFP_MASK
& ~IDH_CONN_MASK;
else
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f41bfc7e9c7c1d721c8752f1853cde43e606ad43 Mon Sep 17 00:00:00 2001
From: Kyle Tso <kyletso(a)google.com>
Date: Tue, 1 Jun 2021 20:31:48 +0800
Subject: [PATCH] usb: typec: tcpm: Correct the responses in SVDM Version 2.0
DFP
In USB PD Spec Rev 3.1 Ver 1.0, section "6.12.5 Applicability of
Structured VDM Commands", a DFP is allowed and recommended to respond to
Discover Identity with ACK. And per section "6.4.4.2.5.1 Commands other
than Attention", NAK should be returned only when receiving Messages
with invalid fields, Messages in the wrong situation, or unrecognized
Messages.
Still keep the original design for SVDM Version 1.0 for backward
compatibility.
Fixes: 193a68011fdc ("staging: typec: tcpm: Respond to Discover Identity commands")
Acked-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Signed-off-by: Kyle Tso <kyletso(a)google.com>
Link: https://lore.kernel.org/r/20210601123151.3441914-2-kyletso@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index 9ce8c9af4da5..a1bf0dc5babf 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -1547,19 +1547,25 @@ static int tcpm_pd_svdm(struct tcpm_port *port, struct typec_altmode *adev,
if (PD_VDO_VID(p[0]) != USB_SID_PD)
break;
- if (PD_VDO_SVDM_VER(p[0]) < svdm_version)
+ if (PD_VDO_SVDM_VER(p[0]) < svdm_version) {
typec_partner_set_svdm_version(port->partner,
PD_VDO_SVDM_VER(p[0]));
+ svdm_version = PD_VDO_SVDM_VER(p[0]);
+ }
tcpm_ams_start(port, DISCOVER_IDENTITY);
- /* 6.4.4.3.1: Only respond as UFP (device) */
- if (port->data_role == TYPEC_DEVICE &&
+ /*
+ * PD2.0 Spec 6.10.3: respond with NAK as DFP (data host)
+ * PD3.1 Spec 6.4.4.2.5.1: respond with NAK if "invalid field" or
+ * "wrong configuation" or "Unrecognized"
+ */
+ if ((port->data_role == TYPEC_DEVICE || svdm_version >= SVDM_VER_2_0) &&
port->nr_snk_vdo) {
/*
* Product Type DFP and Connector Type are not defined in SVDM
* version 1.0 and shall be set to zero.
*/
- if (typec_get_negotiated_svdm_version(typec) < SVDM_VER_2_0)
+ if (svdm_version < SVDM_VER_2_0)
response[1] = port->snk_vdo[0] & ~IDH_DFP_MASK
& ~IDH_CONN_MASK;
else
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f41bfc7e9c7c1d721c8752f1853cde43e606ad43 Mon Sep 17 00:00:00 2001
From: Kyle Tso <kyletso(a)google.com>
Date: Tue, 1 Jun 2021 20:31:48 +0800
Subject: [PATCH] usb: typec: tcpm: Correct the responses in SVDM Version 2.0
DFP
In USB PD Spec Rev 3.1 Ver 1.0, section "6.12.5 Applicability of
Structured VDM Commands", a DFP is allowed and recommended to respond to
Discover Identity with ACK. And per section "6.4.4.2.5.1 Commands other
than Attention", NAK should be returned only when receiving Messages
with invalid fields, Messages in the wrong situation, or unrecognized
Messages.
Still keep the original design for SVDM Version 1.0 for backward
compatibility.
Fixes: 193a68011fdc ("staging: typec: tcpm: Respond to Discover Identity commands")
Acked-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Signed-off-by: Kyle Tso <kyletso(a)google.com>
Link: https://lore.kernel.org/r/20210601123151.3441914-2-kyletso@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index 9ce8c9af4da5..a1bf0dc5babf 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -1547,19 +1547,25 @@ static int tcpm_pd_svdm(struct tcpm_port *port, struct typec_altmode *adev,
if (PD_VDO_VID(p[0]) != USB_SID_PD)
break;
- if (PD_VDO_SVDM_VER(p[0]) < svdm_version)
+ if (PD_VDO_SVDM_VER(p[0]) < svdm_version) {
typec_partner_set_svdm_version(port->partner,
PD_VDO_SVDM_VER(p[0]));
+ svdm_version = PD_VDO_SVDM_VER(p[0]);
+ }
tcpm_ams_start(port, DISCOVER_IDENTITY);
- /* 6.4.4.3.1: Only respond as UFP (device) */
- if (port->data_role == TYPEC_DEVICE &&
+ /*
+ * PD2.0 Spec 6.10.3: respond with NAK as DFP (data host)
+ * PD3.1 Spec 6.4.4.2.5.1: respond with NAK if "invalid field" or
+ * "wrong configuation" or "Unrecognized"
+ */
+ if ((port->data_role == TYPEC_DEVICE || svdm_version >= SVDM_VER_2_0) &&
port->nr_snk_vdo) {
/*
* Product Type DFP and Connector Type are not defined in SVDM
* version 1.0 and shall be set to zero.
*/
- if (typec_get_negotiated_svdm_version(typec) < SVDM_VER_2_0)
+ if (svdm_version < SVDM_VER_2_0)
response[1] = port->snk_vdo[0] & ~IDH_DFP_MASK
& ~IDH_CONN_MASK;
else
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d6eef886903c4bb5af41b9a31d4ba11dc7a6f8e8 Mon Sep 17 00:00:00 2001
From: Sanket Parmar <sparmar(a)cadence.com>
Date: Mon, 17 May 2021 17:05:12 +0200
Subject: [PATCH] usb: cdns3: Enable TDL_CHK only for OUT ep
ZLP gets stuck if TDL_CHK bit is set and TDL_FROM_TRB is used
as TDL source for IN endpoints. To fix it, TDL_CHK is only
enabled for OUT endpoints.
Fixes: 7733f6c32e36 ("usb: cdns3: Add Cadence USB3 DRD Driver")
Reported-by: Aswath Govindraju <a-govindraju(a)ti.com>
Signed-off-by: Sanket Parmar <sparmar(a)cadence.com>
Link: https://lore.kernel.org/r/1621263912-13175-1-git-send-email-sparmar@cadence…
Signed-off-by: Peter Chen <peter.chen(a)kernel.org>
diff --git a/drivers/usb/cdns3/cdns3-gadget.c b/drivers/usb/cdns3/cdns3-gadget.c
index a8b7b50abf64..5281f8d3fb3d 100644
--- a/drivers/usb/cdns3/cdns3-gadget.c
+++ b/drivers/usb/cdns3/cdns3-gadget.c
@@ -2007,7 +2007,7 @@ static void cdns3_configure_dmult(struct cdns3_device *priv_dev,
else
mask = BIT(priv_ep->num);
- if (priv_ep->type != USB_ENDPOINT_XFER_ISOC) {
+ if (priv_ep->type != USB_ENDPOINT_XFER_ISOC && !priv_ep->dir) {
cdns3_set_register_bit(®s->tdl_from_trb, mask);
cdns3_set_register_bit(®s->tdl_beh, mask);
cdns3_set_register_bit(®s->tdl_beh2, mask);
@@ -2046,15 +2046,13 @@ int cdns3_ep_config(struct cdns3_endpoint *priv_ep, bool enable)
case USB_ENDPOINT_XFER_INT:
ep_cfg = EP_CFG_EPTYPE(USB_ENDPOINT_XFER_INT);
- if ((priv_dev->dev_ver == DEV_VER_V2 && !priv_ep->dir) ||
- priv_dev->dev_ver > DEV_VER_V2)
+ if (priv_dev->dev_ver >= DEV_VER_V2 && !priv_ep->dir)
ep_cfg |= EP_CFG_TDL_CHK;
break;
case USB_ENDPOINT_XFER_BULK:
ep_cfg = EP_CFG_EPTYPE(USB_ENDPOINT_XFER_BULK);
- if ((priv_dev->dev_ver == DEV_VER_V2 && !priv_ep->dir) ||
- priv_dev->dev_ver > DEV_VER_V2)
+ if (priv_dev->dev_ver >= DEV_VER_V2 && !priv_ep->dir)
ep_cfg |= EP_CFG_TDL_CHK;
break;
default:
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d6eef886903c4bb5af41b9a31d4ba11dc7a6f8e8 Mon Sep 17 00:00:00 2001
From: Sanket Parmar <sparmar(a)cadence.com>
Date: Mon, 17 May 2021 17:05:12 +0200
Subject: [PATCH] usb: cdns3: Enable TDL_CHK only for OUT ep
ZLP gets stuck if TDL_CHK bit is set and TDL_FROM_TRB is used
as TDL source for IN endpoints. To fix it, TDL_CHK is only
enabled for OUT endpoints.
Fixes: 7733f6c32e36 ("usb: cdns3: Add Cadence USB3 DRD Driver")
Reported-by: Aswath Govindraju <a-govindraju(a)ti.com>
Signed-off-by: Sanket Parmar <sparmar(a)cadence.com>
Link: https://lore.kernel.org/r/1621263912-13175-1-git-send-email-sparmar@cadence…
Signed-off-by: Peter Chen <peter.chen(a)kernel.org>
diff --git a/drivers/usb/cdns3/cdns3-gadget.c b/drivers/usb/cdns3/cdns3-gadget.c
index a8b7b50abf64..5281f8d3fb3d 100644
--- a/drivers/usb/cdns3/cdns3-gadget.c
+++ b/drivers/usb/cdns3/cdns3-gadget.c
@@ -2007,7 +2007,7 @@ static void cdns3_configure_dmult(struct cdns3_device *priv_dev,
else
mask = BIT(priv_ep->num);
- if (priv_ep->type != USB_ENDPOINT_XFER_ISOC) {
+ if (priv_ep->type != USB_ENDPOINT_XFER_ISOC && !priv_ep->dir) {
cdns3_set_register_bit(®s->tdl_from_trb, mask);
cdns3_set_register_bit(®s->tdl_beh, mask);
cdns3_set_register_bit(®s->tdl_beh2, mask);
@@ -2046,15 +2046,13 @@ int cdns3_ep_config(struct cdns3_endpoint *priv_ep, bool enable)
case USB_ENDPOINT_XFER_INT:
ep_cfg = EP_CFG_EPTYPE(USB_ENDPOINT_XFER_INT);
- if ((priv_dev->dev_ver == DEV_VER_V2 && !priv_ep->dir) ||
- priv_dev->dev_ver > DEV_VER_V2)
+ if (priv_dev->dev_ver >= DEV_VER_V2 && !priv_ep->dir)
ep_cfg |= EP_CFG_TDL_CHK;
break;
case USB_ENDPOINT_XFER_BULK:
ep_cfg = EP_CFG_EPTYPE(USB_ENDPOINT_XFER_BULK);
- if ((priv_dev->dev_ver == DEV_VER_V2 && !priv_ep->dir) ||
- priv_dev->dev_ver > DEV_VER_V2)
+ if (priv_dev->dev_ver >= DEV_VER_V2 && !priv_ep->dir)
ep_cfg |= EP_CFG_TDL_CHK;
break;
default:
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b65ba0c362be665192381cc59e3ac3ef6f0dd1e1 Mon Sep 17 00:00:00 2001
From: Thomas Petazzoni <thomas.petazzoni(a)bootlin.com>
Date: Fri, 28 May 2021 16:04:46 +0200
Subject: [PATCH] usb: musb: fix MUSB_QUIRK_B_DISCONNECT_99 handling
In commit 92af4fc6ec33 ("usb: musb: Fix suspend with devices
connected for a64"), the logic to support the
MUSB_QUIRK_B_DISCONNECT_99 quirk was modified to only conditionally
schedule the musb->irq_work delayed work.
This commit badly breaks ECM Gadget on AM335X. Indeed, with this
commit, one can observe massive packet loss:
$ ping 192.168.0.100
...
15 packets transmitted, 3 received, 80% packet loss, time 14316ms
Reverting this commit brings back a properly functioning ECM
Gadget. An analysis of the commit seems to indicate that a mistake was
made: the previous code was not falling through into the
MUSB_QUIRK_B_INVALID_VBUS_91, but now it is, unless the condition is
taken.
Changing the logic to be as it was before the problematic commit *and*
only conditionally scheduling musb->irq_work resolves the regression:
$ ping 192.168.0.100
...
64 packets transmitted, 64 received, 0% packet loss, time 64475ms
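To see the control-flow mistake in isolation, here is a minimal stand-alone
sketch (not the musb driver itself; quirk names shortened) showing why a
'break' placed inside the if-branch, followed by a fallthrough, runs the
VBUS_91 handling whenever the retry condition is false, while the fixed
version never does:

#include <stdio.h>

enum quirk { QUIRK_B_DISCONNECT_99, QUIRK_B_INVALID_VBUS_91 };

static void handle_broken(enum quirk q, int retries)
{
	switch (q) {
	case QUIRK_B_DISCONNECT_99:
		if (retries) {
			printf("reschedule irq_work\n");
			break;			/* only taken when retries != 0 */
		}
		/* falls through when retries == 0 */
	case QUIRK_B_INVALID_VBUS_91:
		printf("VBUS_91 handling runs\n");
		break;
	}
}

static void handle_fixed(enum quirk q, int retries)
{
	switch (q) {
	case QUIRK_B_DISCONNECT_99:
		if (retries)
			printf("reschedule irq_work\n");
		break;				/* never reaches the next case */
	case QUIRK_B_INVALID_VBUS_91:
		printf("VBUS_91 handling runs\n");
		break;
	}
}

int main(void)
{
	handle_broken(QUIRK_B_DISCONNECT_99, 0);	/* prints the VBUS_91 line */
	handle_fixed(QUIRK_B_DISCONNECT_99, 0);		/* prints nothing */
	return 0;
}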
Fixes: 92af4fc6ec33 ("usb: musb: Fix suspend with devices connected for a64")
Cc: stable(a)vger.kernel.org
Tested-by: Alexandre Belloni <alexandre.belloni(a)bootlin.com>
Tested-by: Drew Fustini <drew(a)beagleboard.org>
Acked-by: Tony Lindgren <tony(a)atomide.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni(a)bootlin.com>
Link: https://lore.kernel.org/r/20210528140446.278076-1-thomas.petazzoni@bootlin.…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/musb/musb_core.c b/drivers/usb/musb/musb_core.c
index 8f09a387b773..4c8f0112481f 100644
--- a/drivers/usb/musb/musb_core.c
+++ b/drivers/usb/musb/musb_core.c
@@ -2009,9 +2009,8 @@ static void musb_pm_runtime_check_session(struct musb *musb)
schedule_delayed_work(&musb->irq_work,
msecs_to_jiffies(1000));
musb->quirk_retries--;
- break;
}
- fallthrough;
+ break;
case MUSB_QUIRK_B_INVALID_VBUS_91:
if (musb->quirk_retries && !musb->flush_irq_work) {
musb_dbg(musb,
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b65ba0c362be665192381cc59e3ac3ef6f0dd1e1 Mon Sep 17 00:00:00 2001
From: Thomas Petazzoni <thomas.petazzoni(a)bootlin.com>
Date: Fri, 28 May 2021 16:04:46 +0200
Subject: [PATCH] usb: musb: fix MUSB_QUIRK_B_DISCONNECT_99 handling
In commit 92af4fc6ec33 ("usb: musb: Fix suspend with devices
connected for a64"), the logic to support the
MUSB_QUIRK_B_DISCONNECT_99 quirk was modified to only conditionally
schedule the musb->irq_work delayed work.
This commit badly breaks ECM Gadget on AM335X. Indeed, with this
commit, one can observe massive packet loss:
$ ping 192.168.0.100
...
15 packets transmitted, 3 received, 80% packet loss, time 14316ms
Reverting this commit brings back a properly functioning ECM
Gadget. An analysis of the commit seems to indicate that a mistake was
made: the previous code was not falling through into the
MUSB_QUIRK_B_INVALID_VBUS_91, but now it is, unless the condition is
taken.
Changing the logic to be as it was before the problematic commit *and*
only conditionally scheduling musb->irq_work resolves the regression:
$ ping 192.168.0.100
...
64 packets transmitted, 64 received, 0% packet loss, time 64475ms
Fixes: 92af4fc6ec33 ("usb: musb: Fix suspend with devices connected for a64")
Cc: stable(a)vger.kernel.org
Tested-by: Alexandre Belloni <alexandre.belloni(a)bootlin.com>
Tested-by: Drew Fustini <drew(a)beagleboard.org>
Acked-by: Tony Lindgren <tony(a)atomide.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni(a)bootlin.com>
Link: https://lore.kernel.org/r/20210528140446.278076-1-thomas.petazzoni@bootlin.…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/musb/musb_core.c b/drivers/usb/musb/musb_core.c
index 8f09a387b773..4c8f0112481f 100644
--- a/drivers/usb/musb/musb_core.c
+++ b/drivers/usb/musb/musb_core.c
@@ -2009,9 +2009,8 @@ static void musb_pm_runtime_check_session(struct musb *musb)
schedule_delayed_work(&musb->irq_work,
msecs_to_jiffies(1000));
musb->quirk_retries--;
- break;
}
- fallthrough;
+ break;
case MUSB_QUIRK_B_INVALID_VBUS_91:
if (musb->quirk_retries && !musb->flush_irq_work) {
musb_dbg(musb,
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6c14133d2d3f768e0a35128faac8aa6ed4815051 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Mon, 7 Jun 2021 21:39:08 -0400
Subject: [PATCH] ftrace: Do not blindly read the ip address in ftrace_bug()
It was reported that a bug on arm64 caused a bad ip address to be used for
updating into a nop in ftrace_init(), but the error path (rightfully)
returned -EINVAL and not -EFAULT, as the bug caused more than one error to
occur. But because -EINVAL was returned, the ftrace_bug() tried to report
what was at the location of the ip address, and read it directly. This
caused the machine to panic, as the ip was not pointing to a valid memory
address.
Instead, read the ip address with copy_from_kernel_nofault() to safely
access the memory, and if it faults, report that the address faulted,
otherwise report what was in that location.
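For reference, copy_from_kernel_nofault() (declared in <linux/uaccess.h>)
copies from a possibly-invalid kernel address into a caller-supplied buffer
and returns a non-zero error instead of faulting. A sketch of the general
pattern, independent of the ftrace code:

static bool read_kernel_bytes_safely(void *dst, const void *src, size_t len)
{
	/* true only if all 'len' bytes could be read without faulting */
	return copy_from_kernel_nofault(dst, src, len) == 0;
}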
Link: https://lore.kernel.org/lkml/20210607032329.28671-1-mark-pk.tsai@mediatek.c…
Cc: stable(a)vger.kernel.org
Fixes: 05736a427f7e1 ("ftrace: warn on failure to disable mcount callers")
Reported-by: Mark-PK Tsai <mark-pk.tsai(a)mediatek.com>
Tested-by: Mark-PK Tsai <mark-pk.tsai(a)mediatek.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 2e8a3fde7104..72ef4dccbcc4 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1967,12 +1967,18 @@ static int ftrace_hash_ipmodify_update(struct ftrace_ops *ops,
static void print_ip_ins(const char *fmt, const unsigned char *p)
{
+ char ins[MCOUNT_INSN_SIZE];
int i;
+ if (copy_from_kernel_nofault(ins, p, MCOUNT_INSN_SIZE)) {
+ printk(KERN_CONT "%s[FAULT] %px\n", fmt, p);
+ return;
+ }
+
printk(KERN_CONT "%s", fmt);
for (i = 0; i < MCOUNT_INSN_SIZE; i++)
- printk(KERN_CONT "%s%02x", i ? ":" : "", p[i]);
+ printk(KERN_CONT "%s%02x", i ? ":" : "", ins[i]);
}
enum ftrace_bug_type ftrace_bug_type;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6c14133d2d3f768e0a35128faac8aa6ed4815051 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Mon, 7 Jun 2021 21:39:08 -0400
Subject: [PATCH] ftrace: Do not blindly read the ip address in ftrace_bug()
It was reported that a bug on arm64 caused a bad ip address to be used for
updating into a nop in ftrace_init(), but the error path (rightfully)
returned -EINVAL and not -EFAULT, as the bug caused more than one error to
occur. But because -EINVAL was returned, the ftrace_bug() tried to report
what was at the location of the ip address, and read it directly. This
caused the machine to panic, as the ip was not pointing to a valid memory
address.
Instead, read the ip address with copy_from_kernel_nofault() to safely
access the memory, and if it faults, report that the address faulted,
otherwise report what was in that location.
Link: https://lore.kernel.org/lkml/20210607032329.28671-1-mark-pk.tsai@mediatek.c…
Cc: stable(a)vger.kernel.org
Fixes: 05736a427f7e1 ("ftrace: warn on failure to disable mcount callers")
Reported-by: Mark-PK Tsai <mark-pk.tsai(a)mediatek.com>
Tested-by: Mark-PK Tsai <mark-pk.tsai(a)mediatek.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 2e8a3fde7104..72ef4dccbcc4 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1967,12 +1967,18 @@ static int ftrace_hash_ipmodify_update(struct ftrace_ops *ops,
static void print_ip_ins(const char *fmt, const unsigned char *p)
{
+ char ins[MCOUNT_INSN_SIZE];
int i;
+ if (copy_from_kernel_nofault(ins, p, MCOUNT_INSN_SIZE)) {
+ printk(KERN_CONT "%s[FAULT] %px\n", fmt, p);
+ return;
+ }
+
printk(KERN_CONT "%s", fmt);
for (i = 0; i < MCOUNT_INSN_SIZE; i++)
- printk(KERN_CONT "%s%02x", i ? ":" : "", p[i]);
+ printk(KERN_CONT "%s%02x", i ? ":" : "", ins[i]);
}
enum ftrace_bug_type ftrace_bug_type;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6c14133d2d3f768e0a35128faac8aa6ed4815051 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Mon, 7 Jun 2021 21:39:08 -0400
Subject: [PATCH] ftrace: Do not blindly read the ip address in ftrace_bug()
It was reported that a bug on arm64 caused a bad ip address to be used for
updating into a nop in ftrace_init(), but the error path (rightfully)
returned -EINVAL and not -EFAULT, as the bug caused more than one error to
occur. But because -EINVAL was returned, the ftrace_bug() tried to report
what was at the location of the ip address, and read it directly. This
caused the machine to panic, as the ip was not pointing to a valid memory
address.
Instead, read the ip address with copy_from_kernel_nofault() to safely
access the memory, and if it faults, report that the address faulted,
otherwise report what was in that location.
Link: https://lore.kernel.org/lkml/20210607032329.28671-1-mark-pk.tsai@mediatek.c…
Cc: stable(a)vger.kernel.org
Fixes: 05736a427f7e1 ("ftrace: warn on failure to disable mcount callers")
Reported-by: Mark-PK Tsai <mark-pk.tsai(a)mediatek.com>
Tested-by: Mark-PK Tsai <mark-pk.tsai(a)mediatek.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 2e8a3fde7104..72ef4dccbcc4 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1967,12 +1967,18 @@ static int ftrace_hash_ipmodify_update(struct ftrace_ops *ops,
static void print_ip_ins(const char *fmt, const unsigned char *p)
{
+ char ins[MCOUNT_INSN_SIZE];
int i;
+ if (copy_from_kernel_nofault(ins, p, MCOUNT_INSN_SIZE)) {
+ printk(KERN_CONT "%s[FAULT] %px\n", fmt, p);
+ return;
+ }
+
printk(KERN_CONT "%s", fmt);
for (i = 0; i < MCOUNT_INSN_SIZE; i++)
- printk(KERN_CONT "%s%02x", i ? ":" : "", p[i]);
+ printk(KERN_CONT "%s%02x", i ? ":" : "", ins[i]);
}
enum ftrace_bug_type ftrace_bug_type;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6c14133d2d3f768e0a35128faac8aa6ed4815051 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Mon, 7 Jun 2021 21:39:08 -0400
Subject: [PATCH] ftrace: Do not blindly read the ip address in ftrace_bug()
It was reported that a bug on arm64 caused a bad ip address to be used for
updating into a nop in ftrace_init(), but the error path (rightfully)
returned -EINVAL and not -EFAULT, as the bug caused more than one error to
occur. But because -EINVAL was returned, the ftrace_bug() tried to report
what was at the location of the ip address, and read it directly. This
caused the machine to panic, as the ip was not pointing to a valid memory
address.
Instead, read the ip address with copy_from_kernel_nofault() to safely
access the memory, and if it faults, report that the address faulted,
otherwise report what was in that location.
Link: https://lore.kernel.org/lkml/20210607032329.28671-1-mark-pk.tsai@mediatek.c…
Cc: stable(a)vger.kernel.org
Fixes: 05736a427f7e1 ("ftrace: warn on failure to disable mcount callers")
Reported-by: Mark-PK Tsai <mark-pk.tsai(a)mediatek.com>
Tested-by: Mark-PK Tsai <mark-pk.tsai(a)mediatek.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 2e8a3fde7104..72ef4dccbcc4 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1967,12 +1967,18 @@ static int ftrace_hash_ipmodify_update(struct ftrace_ops *ops,
static void print_ip_ins(const char *fmt, const unsigned char *p)
{
+ char ins[MCOUNT_INSN_SIZE];
int i;
+ if (copy_from_kernel_nofault(ins, p, MCOUNT_INSN_SIZE)) {
+ printk(KERN_CONT "%s[FAULT] %px\n", fmt, p);
+ return;
+ }
+
printk(KERN_CONT "%s", fmt);
for (i = 0; i < MCOUNT_INSN_SIZE; i++)
- printk(KERN_CONT "%s%02x", i ? ":" : "", p[i]);
+ printk(KERN_CONT "%s%02x", i ? ":" : "", ins[i]);
}
enum ftrace_bug_type ftrace_bug_type;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8d396bb0a5b62b326f6be7594d8bd46b088296bd Mon Sep 17 00:00:00 2001
From: Jack Pham <jackp(a)codeaurora.org>
Date: Sat, 29 May 2021 12:29:32 -0700
Subject: [PATCH] usb: dwc3: debugfs: Add and remove endpoint dirs dynamically
The DWC3 DebugFS directory and files are currently created once
during probe. This includes creation of subdirectories for each
of the gadget's endpoints. This works fine for peripheral-only
controllers, as dwc3_core_init_mode() calls dwc3_gadget_init()
just prior to calling dwc3_debugfs_init().
However, for dual-role controllers, dwc3_core_init_mode() will
instead call dwc3_drd_init() which is problematic in a few ways.
First, the initial state must be determined, then dwc3_set_mode()
will have to schedule drd_work and by then dwc3_debugfs_init()
could have already been invoked. Even if the initial mode is
peripheral, dwc3_gadget_init() happens after the DebugFS files
are created, and worse so if the initial state is host and the
controller switches to peripheral much later. And secondly,
even if the gadget endpoints' debug entries were successfully
created, if the controller exits peripheral mode, its dwc3_eps
are freed so the debug files would now hold stale references.
So it is best if the DebugFS endpoint entries are created and
removed dynamically at the same time the underlying dwc3_eps are.
Do this by calling dwc3_debugfs_create_endpoint_dir() as each
endpoint is created, and conversely remove the DebugFS entry when
the endpoint is freed.
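Reduced to its essentials, the resulting lifecycle pairing looks like this
(a sketch based on the hunks below, not the complete driver code; error
handling omitted):

	/* at endpoint creation, in dwc3_gadget_init_endpoint() */
	dwc3_debugfs_create_endpoint_dir(dep);

	/* at endpoint teardown, in dwc3_gadget_free_endpoints(), before kfree(dep) */
	debugfs_remove_recursive(debugfs_lookup(dep->name, dwc->root));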
Fixes: 41ce1456e1db ("usb: dwc3: core: make dwc3_set_mode() work properly")
Cc: stable <stable(a)vger.kernel.org>
Reviewed-by: Peter Chen <peter.chen(a)kernel.org>
Signed-off-by: Jack Pham <jackp(a)codeaurora.org>
Link: https://lore.kernel.org/r/20210529192932.22912-1-jackp@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc3/debug.h b/drivers/usb/dwc3/debug.h
index d0ac89c5b317..d223c54115f4 100644
--- a/drivers/usb/dwc3/debug.h
+++ b/drivers/usb/dwc3/debug.h
@@ -413,9 +413,12 @@ static inline const char *dwc3_gadget_generic_cmd_status_string(int status)
#ifdef CONFIG_DEBUG_FS
+extern void dwc3_debugfs_create_endpoint_dir(struct dwc3_ep *dep);
extern void dwc3_debugfs_init(struct dwc3 *d);
extern void dwc3_debugfs_exit(struct dwc3 *d);
#else
+static inline void dwc3_debugfs_create_endpoint_dir(struct dwc3_ep *dep)
+{ }
static inline void dwc3_debugfs_init(struct dwc3 *d)
{ }
static inline void dwc3_debugfs_exit(struct dwc3 *d)
diff --git a/drivers/usb/dwc3/debugfs.c b/drivers/usb/dwc3/debugfs.c
index 7146ee2ac057..5dbbe53269d3 100644
--- a/drivers/usb/dwc3/debugfs.c
+++ b/drivers/usb/dwc3/debugfs.c
@@ -886,30 +886,14 @@ static void dwc3_debugfs_create_endpoint_files(struct dwc3_ep *dep,
}
}
-static void dwc3_debugfs_create_endpoint_dir(struct dwc3_ep *dep,
- struct dentry *parent)
+void dwc3_debugfs_create_endpoint_dir(struct dwc3_ep *dep)
{
struct dentry *dir;
- dir = debugfs_create_dir(dep->name, parent);
+ dir = debugfs_create_dir(dep->name, dep->dwc->root);
dwc3_debugfs_create_endpoint_files(dep, dir);
}
-static void dwc3_debugfs_create_endpoint_dirs(struct dwc3 *dwc,
- struct dentry *parent)
-{
- int i;
-
- for (i = 0; i < dwc->num_eps; i++) {
- struct dwc3_ep *dep = dwc->eps[i];
-
- if (!dep)
- continue;
-
- dwc3_debugfs_create_endpoint_dir(dep, parent);
- }
-}
-
void dwc3_debugfs_init(struct dwc3 *dwc)
{
struct dentry *root;
@@ -940,7 +924,6 @@ void dwc3_debugfs_init(struct dwc3 *dwc)
&dwc3_testmode_fops);
debugfs_create_file("link_state", 0644, root, dwc,
&dwc3_link_state_fops);
- dwc3_debugfs_create_endpoint_dirs(dwc, root);
}
}
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 88270eee8a48..f14c2aa83759 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2753,6 +2753,8 @@ static int dwc3_gadget_init_endpoint(struct dwc3 *dwc, u8 epnum)
INIT_LIST_HEAD(&dep->started_list);
INIT_LIST_HEAD(&dep->cancelled_list);
+ dwc3_debugfs_create_endpoint_dir(dep);
+
return 0;
}
@@ -2796,6 +2798,7 @@ static void dwc3_gadget_free_endpoints(struct dwc3 *dwc)
list_del(&dep->endpoint.ep_list);
}
+ debugfs_remove_recursive(debugfs_lookup(dep->name, dwc->root));
kfree(dep);
}
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6490fa565534fa83593278267785a694fd378a2b Mon Sep 17 00:00:00 2001
From: Kyle Tso <kyletso(a)google.com>
Date: Fri, 28 May 2021 16:16:13 +0800
Subject: [PATCH] usb: pd: Set PD_T_SINK_WAIT_CAP to 310ms
Current timer PD_T_SINK_WAIT_CAP is set to 240ms, which violates the
SinkWaitCapTimer (tTypeCSinkWaitCap 310 - 620 ms) defined in the PD
Spec if the port is fast enough when running the state machine. Set it
to the lower bound 310ms to ensure the timeout is within spec.
Fixes: f0690a25a140 ("staging: typec: USB Type-C Port Manager (tcpm)")
Cc: stable <stable(a)vger.kernel.org>
Reviewed-by: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: Kyle Tso <kyletso(a)google.com>
Link: https://lore.kernel.org/r/20210528081613.730661-1-kyletso@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/include/linux/usb/pd.h b/include/linux/usb/pd.h
index bf00259493e0..96b7ff66f074 100644
--- a/include/linux/usb/pd.h
+++ b/include/linux/usb/pd.h
@@ -460,7 +460,7 @@ static inline unsigned int rdo_max_power(u32 rdo)
#define PD_T_RECEIVER_RESPONSE 15 /* 15ms max */
#define PD_T_SOURCE_ACTIVITY 45
#define PD_T_SINK_ACTIVITY 135
-#define PD_T_SINK_WAIT_CAP 240
+#define PD_T_SINK_WAIT_CAP 310 /* 310 - 620 ms */
#define PD_T_PS_TRANSITION 500
#define PD_T_SRC_TRANSITION 35
#define PD_T_DRP_SNK 40
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2c9017d0b5d3fbf17e69577a42d9e610ca122810 Mon Sep 17 00:00:00 2001
From: Wolfram Sang <wsa+renesas(a)sang-engineering.com>
Date: Wed, 2 Jun 2021 09:34:35 +0200
Subject: [PATCH] mmc: renesas_sdhi: abort tuning when timeout detected
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
We have to bring the eMMC from the sending-data state back to the transfer
state once we detect a CRC error (timeout) during tuning. So, send a stop
command via mmc_abort_tuning().
Fixes: 4f11997773b6 ("mmc: tmio: Add tuning support")
Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Signed-off-by: Wolfram Sang <wsa+renesas(a)sang-engineering.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas(a)ragnatech.se>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Link: https://lore.kernel.org/r/20210602073435.5955-1-wsa+renesas@sang-engineerin…
Cc: stable(a)vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/renesas_sdhi_core.c b/drivers/mmc/host/renesas_sdhi_core.c
index 635bf31a6735..9029308c4a0f 100644
--- a/drivers/mmc/host/renesas_sdhi_core.c
+++ b/drivers/mmc/host/renesas_sdhi_core.c
@@ -692,14 +692,19 @@ static int renesas_sdhi_execute_tuning(struct mmc_host *mmc, u32 opcode)
/* Issue CMD19 twice for each tap */
for (i = 0; i < 2 * priv->tap_num; i++) {
+ int cmd_error;
+
/* Set sampling clock position */
sd_scc_write32(host, priv, SH_MOBILE_SDHI_SCC_TAPSET, i % priv->tap_num);
- if (mmc_send_tuning(mmc, opcode, NULL) == 0)
+ if (mmc_send_tuning(mmc, opcode, &cmd_error) == 0)
set_bit(i, priv->taps);
if (sd_scc_read32(host, priv, SH_MOBILE_SDHI_SCC_SMPCMP) == 0)
set_bit(i, priv->smpcmp);
+
+ if (cmd_error)
+ mmc_abort_tuning(mmc, opcode);
}
ret = renesas_sdhi_select_tuning(host);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2c9017d0b5d3fbf17e69577a42d9e610ca122810 Mon Sep 17 00:00:00 2001
From: Wolfram Sang <wsa+renesas(a)sang-engineering.com>
Date: Wed, 2 Jun 2021 09:34:35 +0200
Subject: [PATCH] mmc: renesas_sdhi: abort tuning when timeout detected
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
We have to bring the eMMC from the sending-data state back to the transfer
state once we detect a CRC error (timeout) during tuning. So, send a stop
command via mmc_abort_tuning().
Fixes: 4f11997773b6 ("mmc: tmio: Add tuning support")
Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Signed-off-by: Wolfram Sang <wsa+renesas(a)sang-engineering.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas(a)ragnatech.se>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Link: https://lore.kernel.org/r/20210602073435.5955-1-wsa+renesas@sang-engineerin…
Cc: stable(a)vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/renesas_sdhi_core.c b/drivers/mmc/host/renesas_sdhi_core.c
index 635bf31a6735..9029308c4a0f 100644
--- a/drivers/mmc/host/renesas_sdhi_core.c
+++ b/drivers/mmc/host/renesas_sdhi_core.c
@@ -692,14 +692,19 @@ static int renesas_sdhi_execute_tuning(struct mmc_host *mmc, u32 opcode)
/* Issue CMD19 twice for each tap */
for (i = 0; i < 2 * priv->tap_num; i++) {
+ int cmd_error;
+
/* Set sampling clock position */
sd_scc_write32(host, priv, SH_MOBILE_SDHI_SCC_TAPSET, i % priv->tap_num);
- if (mmc_send_tuning(mmc, opcode, NULL) == 0)
+ if (mmc_send_tuning(mmc, opcode, &cmd_error) == 0)
set_bit(i, priv->taps);
if (sd_scc_read32(host, priv, SH_MOBILE_SDHI_SCC_SMPCMP) == 0)
set_bit(i, priv->smpcmp);
+
+ if (cmd_error)
+ mmc_abort_tuning(mmc, opcode);
}
ret = renesas_sdhi_select_tuning(host);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2c9017d0b5d3fbf17e69577a42d9e610ca122810 Mon Sep 17 00:00:00 2001
From: Wolfram Sang <wsa+renesas(a)sang-engineering.com>
Date: Wed, 2 Jun 2021 09:34:35 +0200
Subject: [PATCH] mmc: renesas_sdhi: abort tuning when timeout detected
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
We have to bring the eMMC from the sending-data state back to the transfer
state once we detect a CRC error (timeout) during tuning. So, send a stop
command via mmc_abort_tuning().
Fixes: 4f11997773b6 ("mmc: tmio: Add tuning support")
Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Signed-off-by: Wolfram Sang <wsa+renesas(a)sang-engineering.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas(a)ragnatech.se>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Link: https://lore.kernel.org/r/20210602073435.5955-1-wsa+renesas@sang-engineerin…
Cc: stable(a)vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/renesas_sdhi_core.c b/drivers/mmc/host/renesas_sdhi_core.c
index 635bf31a6735..9029308c4a0f 100644
--- a/drivers/mmc/host/renesas_sdhi_core.c
+++ b/drivers/mmc/host/renesas_sdhi_core.c
@@ -692,14 +692,19 @@ static int renesas_sdhi_execute_tuning(struct mmc_host *mmc, u32 opcode)
/* Issue CMD19 twice for each tap */
for (i = 0; i < 2 * priv->tap_num; i++) {
+ int cmd_error;
+
/* Set sampling clock position */
sd_scc_write32(host, priv, SH_MOBILE_SDHI_SCC_TAPSET, i % priv->tap_num);
- if (mmc_send_tuning(mmc, opcode, NULL) == 0)
+ if (mmc_send_tuning(mmc, opcode, &cmd_error) == 0)
set_bit(i, priv->taps);
if (sd_scc_read32(host, priv, SH_MOBILE_SDHI_SCC_SMPCMP) == 0)
set_bit(i, priv->smpcmp);
+
+ if (cmd_error)
+ mmc_abort_tuning(mmc, opcode);
}
ret = renesas_sdhi_select_tuning(host);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 41fe8d088e96472f63164e213de44ec77be69478 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Mon, 7 Jun 2021 20:50:52 +0800
Subject: [PATCH] bcache: avoid oversized read request in cache missing code
path
In the cache-miss code path of a cached device, if a proper location
from the internal B+ tree is matched for a cache-miss range, the function
cached_dev_cache_miss() will be called from cache_lookup_fn() in the
following code block,
[code block 1]
526 unsigned int sectors = KEY_INODE(k) == s->iop.inode
527 ? min_t(uint64_t, INT_MAX,
528 KEY_START(k) - bio->bi_iter.bi_sector)
529 : INT_MAX;
530 int ret = s->d->cache_miss(b, s, bio, sectors);
Here s->d->cache_miss() is the callback function pointer initialized to
cached_dev_cache_miss(); the last parameter 'sectors' is an important
hint used to calculate the size of the read request sent to the backing
device for the missing cache data.
The current calculation in the above code block may generate an oversized
value of 'sectors', which may consequently trigger two different kernel
panics by BUG() or BUG_ON(), as listed below,
1) BUG_ON() inside bch_btree_insert_key(),
[code block 2]
886 BUG_ON(b->ops->is_extents && !KEY_SIZE(k));
2) BUG() inside biovec_slab(),
[code block 3]
51 default:
52 BUG();
53 return NULL;
Both of the above panics originate from cached_dev_cache_miss() being
passed the oversized parameter 'sectors'.
Inside cached_dev_cache_miss(), parameter 'sectors' is used to calculate
the size of the data read from the backing device for the cache miss. This
size is stored in s->insert_bio_sectors by the following line of code,
909 s->insert_bio_sectors = min(sectors, bio_sectors(bio) + reada);
Then the actual key to be inserted into the internal B+ tree is generated and
stored in s->iop.replace_key by the following lines of code,
[code block 5]
911 s->iop.replace_key = KEY(s->iop.inode,
912 bio->bi_iter.bi_sector + s->insert_bio_sectors,
913 s->insert_bio_sectors);
The oversized parameter 'sectors' may trigger panic 1) by BUG_ON() from
the above code block.
And the bio sent to the backing device for the missing data is allocated
with a size hint from s->insert_bio_sectors by the following lines of code,
[code block 6]
926 cache_bio = bio_alloc_bioset(GFP_NOWAIT,
927 DIV_ROUND_UP(s->insert_bio_sectors, PAGE_SECTORS),
928 &dc->disk.bio_split);
The oversized parameter 'sectors' may trigger panic 2) by BUG() from the
above code block.
Now let me explain how the panics happen with the oversized 'sectors'.
In code block 5, replace_key is generated by macro KEY(). From the
definition of macro KEY(),
[code block 7]
71 #define KEY(inode, offset, size) \
72 ((struct bkey) { \
73 .high = (1ULL << 63) | ((__u64) (size) << 20) | (inode), \
74 .low = (offset) \
75 })
Here 'size' is a 16-bit wide field embedded in the 64-bit member 'high' of
struct bkey. But in code block 1, "KEY_START(k) - bio->bi_iter.bi_sector"
may very well be larger than (1 << 16) - 1, which makes the bkey size
calculation in code block 5 overflow. In one bug report the value of
parameter 'sectors' was 131072 (= 1 << 17); the oversized 'sectors' results
in an oversized s->insert_bio_sectors in code block 4, which then makes the
size field of s->iop.replace_key become 0 in code block 5. The 0-sized
s->iop.replace_key is then inserted into the internal B+ tree as the
cache-miss check key (a special key used to detect and avoid a race between
a normal write request and a cache-miss read request) as,
[code block 8]
915 ret = bch_btree_insert_check_key(b, &s->op, &s->iop.replace_key);
The 0-sized s->iop.replace_key passed as the 3rd parameter then triggers
the bkey size check BUG_ON() in code block 2 and causes kernel panic 1).
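The truncation can be reproduced with a stand-alone toy program (not bcache
code; the size field is modelled as the 16-bit field starting at bit 20 of
'high', per the KEY() macro quoted in code block 7 and the 16-bit width
noted above):

#include <stdio.h>
#include <stdint.h>

#define KEY_SIZE_BITS 16	/* width of the bkey size field */

int main(void)
{
	uint64_t sectors = 131072;			/* 1 << 17, as in the bug report */
	uint64_t high = (1ULL << 63) | (sectors << 20);	/* how KEY() packs it */
	uint64_t key_size = (high >> 20) & ((1ULL << KEY_SIZE_BITS) - 1);

	/* the stored size wraps to 0, producing the 0-sized replace_key */
	printf("requested %llu sectors, stored key size %llu\n",
	       (unsigned long long)sectors, (unsigned long long)key_size);
	return 0;
}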
The other kernel panic comes from code block 6 and is caused by an
oversized bvecs number derived from s->insert_bio_sectors in code block 4,
min(sectors, bio_sectors(bio) + reada)
There are two possibilities for an oversized result,
- bio_sectors(bio) is valid, but bio_sectors(bio) + reada is oversized.
- sectors < bio_sectors(bio) + reada, but sectors is oversized.
From a bug report, the result of "DIV_ROUND_UP(s->insert_bio_sectors,
PAGE_SECTORS)" from code block 6 can be 344, 282, 946, 342 and many
other values larger than BIO_MAX_VECS (i.e. 256). When bio_alloc_bioset()
is called with such a larger-than-256 value as the 2nd parameter, this
value is eventually passed to biovec_slab() as parameter 'nr_vecs' via
the following code path,
bio_alloc_bioset() ==> bvec_alloc() ==> biovec_slab()
Because parameter 'nr_vecs' is larger than 256, the BUG() in code block 3
is triggered inside biovec_slab().
From the above analysis, we know that the 4th parameter 'sectors' passed
into cached_dev_cache_miss() may cause overflows in code blocks 5 and 6,
and finally cause kernel panics in code blocks 2 and 3. And if the result
of bio_sectors(bio) + reada exceeds the valid bvecs number, it may also
trigger the kernel panic in code block 3 via code block 6.
Now that the almost-useless readahead size for cache-miss requests to the
backing device has been removed, this patch can fix the oversize issue
with a simpler method:
- add a local variable size_limit and set it to the minimum of the max
bkey size and the max bio bvecs number.
- set s->insert_bio_sectors to the minimum of size_limit, sectors, and
the sector count of the bio.
- use s->insert_bio_sectors instead of sectors for bio_next_split.
With the above method and size_limit, s->insert_bio_sectors will never
result in an oversized replace_key size or bio bvecs number. And the split
bio 'miss' from bio_next_split() will always match the size of 'cache_bio',
which is the current maximum bio size we can send to the backing device
for fetching the cache-miss data.
The problematic code can be partially traced back to Linux v3.13-rc1,
therefore all maintained stable kernels should try to apply this fix.
Reported-by: Alexander Ullrich <ealex1979(a)gmail.com>
Reported-by: Diego Ercolani <diego.ercolani(a)gmail.com>
Reported-by: Jan Szubiak <jan.szubiak(a)linuxpolska.pl>
Reported-by: Marco Rebhan <me(a)dblsaiko.net>
Reported-by: Matthias Ferdinand <bcache(a)mfedv.net>
Reported-by: Victor Westerhuis <victor(a)westerhu.is>
Reported-by: Vojtech Pavlik <vojtech(a)suse.cz>
Reported-and-tested-by: Rolf Fokkens <rolf(a)rolffokkens.nl>
Reported-and-tested-by: Thorsten Knabe <linux(a)thorsten-knabe.de>
Signed-off-by: Coly Li <colyli(a)suse.de>
Cc: stable(a)vger.kernel.org
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Kent Overstreet <kent.overstreet(a)gmail.com>
Cc: Nix <nix(a)esperi.org.uk>
Cc: Takashi Iwai <tiwai(a)suse.com>
Link: https://lore.kernel.org/r/20210607125052.21277-3-colyli@suse.de
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index ab8ff18df32a..6d1de889baeb 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -882,6 +882,7 @@ static int cached_dev_cache_miss(struct btree *b, struct search *s,
int ret = MAP_CONTINUE;
struct cached_dev *dc = container_of(s->d, struct cached_dev, disk);
struct bio *miss, *cache_bio;
+ unsigned int size_limit;
s->cache_missed = 1;
@@ -891,7 +892,10 @@ static int cached_dev_cache_miss(struct btree *b, struct search *s,
goto out_submit;
}
- s->insert_bio_sectors = min(sectors, bio_sectors(bio));
+ /* Limitation for valid replace key size and cache_bio bvecs number */
+ size_limit = min_t(unsigned int, BIO_MAX_VECS * PAGE_SECTORS,
+ (1 << KEY_SIZE_BITS) - 1);
+ s->insert_bio_sectors = min3(size_limit, sectors, bio_sectors(bio));
s->iop.replace_key = KEY(s->iop.inode,
bio->bi_iter.bi_sector + s->insert_bio_sectors,
@@ -903,7 +907,8 @@ static int cached_dev_cache_miss(struct btree *b, struct search *s,
s->iop.replace = true;
- miss = bio_next_split(bio, sectors, GFP_NOIO, &s->d->bio_split);
+ miss = bio_next_split(bio, s->insert_bio_sectors, GFP_NOIO,
+ &s->d->bio_split);
/* btree_search_recurse()'s btree iterator is no good anymore */
ret = miss == bio ? MAP_DONE : -EINTR;
The patch below does not apply to the 5.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5f0c2ee1fe8de700dd0d1cdc63e1a7338e2d3a3d Mon Sep 17 00:00:00 2001
From: Loic Poulain <loic.poulain(a)linaro.org>
Date: Sun, 6 Jun 2021 21:07:41 +0530
Subject: [PATCH] bus: mhi: pci-generic: Fix hibernation
This patch fixes a crash after resuming from hibernation. The issue
occurs when the MHI stack is built in, and thus part of the 'restore
kernel', causing the device to be resumed from the 'restored kernel'
with a context (memory mappings, etc.) that is no longer valid, leading
to spurious crashes.
This patch fixes the issue by implementing proper freeze/restore
callbacks.
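For context, SET_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) wires the same
pair of callbacks into all system sleep transitions (suspend/freeze/poweroff
and resume/thaw/restore), which is why hibernation previously reused the
suspend/resume path. A condensed view of the resulting ops table (a sketch;
see the full hunk below for the actual change):

	static const struct dev_pm_ops mhi_pci_pm_ops = {
		SET_RUNTIME_PM_OPS(mhi_pci_runtime_suspend, mhi_pci_runtime_resume, NULL)
	#ifdef CONFIG_PM_SLEEP
		.suspend = mhi_pci_suspend,
		.resume  = mhi_pci_resume,
		.freeze  = mhi_pci_freeze,	/* power MHI down before the image is saved */
		.thaw    = mhi_pci_restore,
		.restore = mhi_pci_restore,	/* reinitialize after the image is restored */
	#endif
	};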
Link: https://lore.kernel.org/r/1622571445-4505-1-git-send-email-loic.poulain@lin…
Reported-by: Shujun Wang <wsj20369(a)163.com>
Cc: stable <stable(a)vger.kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Signed-off-by: Loic Poulain <loic.poulain(a)linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Link: https://lore.kernel.org/r/20210606153741.20725-4-manivannan.sadhasivam@lina…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/bus/mhi/pci_generic.c b/drivers/bus/mhi/pci_generic.c
index 0a6619ad292c..b3357a8a2fdb 100644
--- a/drivers/bus/mhi/pci_generic.c
+++ b/drivers/bus/mhi/pci_generic.c
@@ -935,9 +935,43 @@ static int __maybe_unused mhi_pci_resume(struct device *dev)
return ret;
}
+static int __maybe_unused mhi_pci_freeze(struct device *dev)
+{
+ struct mhi_pci_device *mhi_pdev = dev_get_drvdata(dev);
+ struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
+
+ /* We want to stop all operations, hibernation does not guarantee that
+ * device will be in the same state as before freezing, especially if
+ * the intermediate restore kernel reinitializes MHI device with new
+ * context.
+ */
+ if (test_and_clear_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status)) {
+ mhi_power_down(mhi_cntrl, false);
+ mhi_unprepare_after_power_down(mhi_cntrl);
+ }
+
+ return 0;
+}
+
+static int __maybe_unused mhi_pci_restore(struct device *dev)
+{
+ struct mhi_pci_device *mhi_pdev = dev_get_drvdata(dev);
+
+ /* Reinitialize the device */
+ queue_work(system_long_wq, &mhi_pdev->recovery_work);
+
+ return 0;
+}
+
static const struct dev_pm_ops mhi_pci_pm_ops = {
SET_RUNTIME_PM_OPS(mhi_pci_runtime_suspend, mhi_pci_runtime_resume, NULL)
- SET_SYSTEM_SLEEP_PM_OPS(mhi_pci_suspend, mhi_pci_resume)
+#ifdef CONFIG_PM_SLEEP
+ .suspend = mhi_pci_suspend,
+ .resume = mhi_pci_resume,
+ .freeze = mhi_pci_freeze,
+ .thaw = mhi_pci_restore,
+ .restore = mhi_pci_restore,
+#endif
};
static struct pci_driver mhi_pci_driver = {
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5f0c2ee1fe8de700dd0d1cdc63e1a7338e2d3a3d Mon Sep 17 00:00:00 2001
From: Loic Poulain <loic.poulain(a)linaro.org>
Date: Sun, 6 Jun 2021 21:07:41 +0530
Subject: [PATCH] bus: mhi: pci-generic: Fix hibernation
This patch fixes a crash after resuming from hibernation. The issue
occurs when the MHI stack is built in, and thus part of the 'restore
kernel', causing the device to be resumed from the 'restored kernel'
with a context (memory mappings, etc.) that is no longer valid, leading
to spurious crashes.
This patch fixes the issue by implementing proper freeze/restore
callbacks.
Link: https://lore.kernel.org/r/1622571445-4505-1-git-send-email-loic.poulain@lin…
Reported-by: Shujun Wang <wsj20369(a)163.com>
Cc: stable <stable(a)vger.kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Signed-off-by: Loic Poulain <loic.poulain(a)linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Link: https://lore.kernel.org/r/20210606153741.20725-4-manivannan.sadhasivam@lina…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/bus/mhi/pci_generic.c b/drivers/bus/mhi/pci_generic.c
index 0a6619ad292c..b3357a8a2fdb 100644
--- a/drivers/bus/mhi/pci_generic.c
+++ b/drivers/bus/mhi/pci_generic.c
@@ -935,9 +935,43 @@ static int __maybe_unused mhi_pci_resume(struct device *dev)
return ret;
}
+static int __maybe_unused mhi_pci_freeze(struct device *dev)
+{
+ struct mhi_pci_device *mhi_pdev = dev_get_drvdata(dev);
+ struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
+
+ /* We want to stop all operations, hibernation does not guarantee that
+ * device will be in the same state as before freezing, especially if
+ * the intermediate restore kernel reinitializes MHI device with new
+ * context.
+ */
+ if (test_and_clear_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status)) {
+ mhi_power_down(mhi_cntrl, false);
+ mhi_unprepare_after_power_down(mhi_cntrl);
+ }
+
+ return 0;
+}
+
+static int __maybe_unused mhi_pci_restore(struct device *dev)
+{
+ struct mhi_pci_device *mhi_pdev = dev_get_drvdata(dev);
+
+ /* Reinitialize the device */
+ queue_work(system_long_wq, &mhi_pdev->recovery_work);
+
+ return 0;
+}
+
static const struct dev_pm_ops mhi_pci_pm_ops = {
SET_RUNTIME_PM_OPS(mhi_pci_runtime_suspend, mhi_pci_runtime_resume, NULL)
- SET_SYSTEM_SLEEP_PM_OPS(mhi_pci_suspend, mhi_pci_resume)
+#ifdef CONFIG_PM_SLEEP
+ .suspend = mhi_pci_suspend,
+ .resume = mhi_pci_resume,
+ .freeze = mhi_pci_freeze,
+ .thaw = mhi_pci_restore,
+ .restore = mhi_pci_restore,
+#endif
};
static struct pci_driver mhi_pci_driver = {
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b1bd5cba3306691c771d558e94baa73e8b0b96b7 Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <laijs(a)linux.alibaba.com>
Date: Thu, 3 Jun 2021 13:24:55 +0800
Subject: [PATCH] KVM: X86: MMU: Use the correct inherited permissions to get
shadow page
When computing the access permissions of a shadow page, use the effective
permissions of the walk up to that point, i.e. the logical AND of its parents'
permissions. Two guest PxE entries that point at the same table gfn need to
be shadowed with different shadow pages if their parents' permissions are
different. KVM currently uses the effective permissions of the last
non-leaf entry for all non-leaf entries. Because all non-leaf SPTEs have
full ("uwx") permissions, and the effective permissions are recorded only
in role.access and merged into the leaves, this can lead to incorrect
reuse of a shadow page and eventually to a missing guest protection page
fault.
For example, here is a shared pagetable:
pgd[]  pud[]          pmd[]             virtual address pointers
                     /->pmd1(u--)->pte1(uw-)->page1 <- ptr1 (u--)
       /->pud1(uw-)--->pmd2(uw-)->pte2(uw-)->page2 <- ptr2 (uw-)
pgd-|              (shared pmd[] as above)
       \->pud2(u--)--->pmd1(u--)->pte1(uw-)->page1 <- ptr3 (u--)
                     \->pmd2(uw-)->pte2(uw-)->page2 <- ptr4 (u--)
pud1 and pud2 point to the same pmd table, so:
- ptr1 and ptr3 point to the same page.
- ptr2 and ptr4 point to the same page.
(pud1 and pud2 here are pud entries, while pmd1 and pmd2 here are pmd entries)
- First, the guest reads from ptr1 and KVM prepares a shadow
page table with role.access=u--, from ptr1's pud1 and ptr1's pmd1.
"u--" comes from the effective permissions of pgd, pud1 and
pmd1, which are stored in pt->access. "u--" is used also to get
the pagetable for pud1, instead of "uw-".
- Then the guest writes to ptr2 and KVM reuses pud1 which is present.
The hypervisor sets up a shadow page for ptr2 with pt->access "uw-"
even though the pud1 pmd (because of the incorrect argument to
kvm_mmu_get_page in the previous step) has role.access="u--".
- Then the guest reads from ptr3. The hypervisor reuses pud1's
shadow pmd for pud2, because both use "u--" for their permissions.
Thus, the shadow pmd already includes entries for both pmd1 and pmd2.
- At last, the guest writes to ptr4. This causes no vmexit or pagefault,
because pud1's shadow page structures included an "uw-" page even though
its role.access was "u--".
Any kind of shared pagetable might have a similar problem in a virtual
machine without TDP enabled, if the permissions inherited from different
ancestors differ.
In order to fix the problem, we change pt->access to be an array, and
any access in it will not include permissions ANDed from child ptes.
The test code is: https://lore.kernel.org/kvm/20210603050537.19605-1-jiangshanlai@gmail.com/
Remember to test it with TDP disabled.
The problem had existed long before the commit 41074d07c78b ("KVM: MMU:
Fix inherited permissions for emulated guest pte updates"), and it
is hard to find which is the culprit. So there is no fixes tag here.
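As a stand-alone toy model (not KVM code; the pgd permission "uwx" is an
assumption for the example) of the rule the fix enforces: the access used
for the shadow page of a table must be the AND of the entries above that
table, not the access accumulated for the final leaf:

#include <stdio.h>

#define U 1u
#define W 2u
#define X 4u

/* AND of the entry permissions from the root down to 'level', inclusive */
static unsigned int pt_access(const unsigned int *walk, int level)
{
	unsigned int acc = U | W | X;
	for (int i = 0; i <= level; i++)
		acc &= walk[i];
	return acc;
}

int main(void)
{
	/* pgd, pud, pmd entry permissions for ptr2's and ptr4's walks above */
	unsigned int via_pud1[] = { U | W | X, U | W, U | W };
	unsigned int via_pud2[] = { U | W | X, U,     U | W };

	/* The shared pmd table must be shadowed using the access of the walk
	 * up to the pud entry (index 1): "uw-" vs "u--" differ, so the two
	 * walks need two distinct shadow pages even though the pmd gfn is
	 * the same. */
	printf("pmd access via pud1: %u (uw-)\n", pt_access(via_pud1, 1));
	printf("pmd access via pud2: %u (u--)\n", pt_access(via_pud2, 1));
	return 0;
}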
Signed-off-by: Lai Jiangshan <laijs(a)linux.alibaba.com>
Message-Id: <20210603052455.21023-1-jiangshanlai(a)gmail.com>
Cc: stable(a)vger.kernel.org
Fixes: cea0f0e7ea54 ("[PATCH] KVM: MMU: Shadow page table caching")
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/Documentation/virt/kvm/mmu.rst b/Documentation/virt/kvm/mmu.rst
index 5bfe28b0728e..20d85daed395 100644
--- a/Documentation/virt/kvm/mmu.rst
+++ b/Documentation/virt/kvm/mmu.rst
@@ -171,8 +171,8 @@ Shadow pages contain the following information:
shadow pages) so role.quadrant takes values in the range 0..3. Each
quadrant maps 1GB virtual address space.
role.access:
- Inherited guest access permissions in the form uwx. Note execute
- permission is positive, not negative.
+ Inherited guest access permissions from the parent ptes in the form uwx.
+ Note execute permission is positive, not negative.
role.invalid:
The page is invalid and should not be used. It is a root page that is
currently pinned (by a cpu hardware register pointing to it); once it is
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 70b7e44e3035..823a5919f9fa 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -90,8 +90,8 @@ struct guest_walker {
gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
pt_element_t __user *ptep_user[PT_MAX_FULL_LEVELS];
bool pte_writable[PT_MAX_FULL_LEVELS];
- unsigned pt_access;
- unsigned pte_access;
+ unsigned int pt_access[PT_MAX_FULL_LEVELS];
+ unsigned int pte_access;
gfn_t gfn;
struct x86_exception fault;
};
@@ -418,13 +418,15 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
}
walker->ptes[walker->level - 1] = pte;
+
+ /* Convert to ACC_*_MASK flags for struct guest_walker. */
+ walker->pt_access[walker->level - 1] = FNAME(gpte_access)(pt_access ^ walk_nx_mask);
} while (!is_last_gpte(mmu, walker->level, pte));
pte_pkey = FNAME(gpte_pkeys)(vcpu, pte);
accessed_dirty = have_ad ? pte_access & PT_GUEST_ACCESSED_MASK : 0;
/* Convert to ACC_*_MASK flags for struct guest_walker. */
- walker->pt_access = FNAME(gpte_access)(pt_access ^ walk_nx_mask);
walker->pte_access = FNAME(gpte_access)(pte_access ^ walk_nx_mask);
errcode = permission_fault(vcpu, mmu, walker->pte_access, pte_pkey, access);
if (unlikely(errcode))
@@ -463,7 +465,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
}
pgprintk("%s: pte %llx pte_access %x pt_access %x\n",
- __func__, (u64)pte, walker->pte_access, walker->pt_access);
+ __func__, (u64)pte, walker->pte_access,
+ walker->pt_access[walker->level - 1]);
return 1;
error:
@@ -643,7 +646,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
bool huge_page_disallowed = exec && nx_huge_page_workaround_enabled;
struct kvm_mmu_page *sp = NULL;
struct kvm_shadow_walk_iterator it;
- unsigned direct_access, access = gw->pt_access;
+ unsigned int direct_access, access;
int top_level, level, req_level, ret;
gfn_t base_gfn = gw->gfn;
@@ -675,6 +678,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
sp = NULL;
if (!is_shadow_present_pte(*it.sptep)) {
table_gfn = gw->table_gfn[it.level - 2];
+ access = gw->pt_access[it.level - 2];
sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
false, access);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b1bd5cba3306691c771d558e94baa73e8b0b96b7 Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <laijs(a)linux.alibaba.com>
Date: Thu, 3 Jun 2021 13:24:55 +0800
Subject: [PATCH] KVM: X86: MMU: Use the correct inherited permissions to get
shadow page
When computing the access permissions of a shadow page, use the effective
permissions of the walk up to that point, i.e. the logical AND of its parents'
permissions. Two guest PxE entries that point at the same table gfn need to
be shadowed with different shadow pages if their parents' permissions are
different. KVM currently uses the effective permissions of the last
non-leaf entry for all non-leaf entries. Because all non-leaf SPTEs have
full ("uwx") permissions, and the effective permissions are recorded only
in role.access and merged into the leaves, this can lead to incorrect
reuse of a shadow page and eventually to a missing guest protection page
fault.
For example, here is a shared pagetable:
pgd[]  pud[]          pmd[]             virtual address pointers
                     /->pmd1(u--)->pte1(uw-)->page1 <- ptr1 (u--)
       /->pud1(uw-)--->pmd2(uw-)->pte2(uw-)->page2 <- ptr2 (uw-)
pgd-|              (shared pmd[] as above)
       \->pud2(u--)--->pmd1(u--)->pte1(uw-)->page1 <- ptr3 (u--)
                     \->pmd2(uw-)->pte2(uw-)->page2 <- ptr4 (u--)
pud1 and pud2 point to the same pmd table, so:
- ptr1 and ptr3 point to the same page.
- ptr2 and ptr4 point to the same page.
(pud1 and pud2 here are pud entries, while pmd1 and pmd2 here are pmd entries)
- First, the guest reads from ptr1 and KVM prepares a shadow
page table with role.access=u--, from ptr1's pud1 and ptr1's pmd1.
"u--" comes from the effective permissions of pgd, pud1 and
pmd1, which are stored in pt->access. "u--" is used also to get
the pagetable for pud1, instead of "uw-".
- Then the guest writes to ptr2 and KVM reuses pud1 which is present.
The hypervisor sets up a shadow page for ptr2 with pt->access "uw-"
even though the pud1 pmd (because of the incorrect argument to
kvm_mmu_get_page in the previous step) has role.access="u--".
- Then the guest reads from ptr3. The hypervisor reuses pud1's
shadow pmd for pud2, because both use "u--" for their permissions.
Thus, the shadow pmd already includes entries for both pmd1 and pmd2.
- At last, the guest writes to ptr4. This causes no vmexit or pagefault,
because pud1's shadow page structures included an "uw-" page even though
its role.access was "u--".
Any kind of shared pagetable might have a similar problem in a virtual
machine without TDP enabled, if the permissions inherited from different
ancestors differ.
In order to fix the problem, we change pt->access to be an array, and
any access in it will not include permissions ANDed from child ptes.
The test code is: https://lore.kernel.org/kvm/20210603050537.19605-1-jiangshanlai@gmail.com/
Remember to test it with TDP disabled.
The problem had existed long before the commit 41074d07c78b ("KVM: MMU:
Fix inherited permissions for emulated guest pte updates"), and it
is hard to find which is the culprit. So there is no fixes tag here.
Signed-off-by: Lai Jiangshan <laijs(a)linux.alibaba.com>
Message-Id: <20210603052455.21023-1-jiangshanlai(a)gmail.com>
Cc: stable(a)vger.kernel.org
Fixes: cea0f0e7ea54 ("[PATCH] KVM: MMU: Shadow page table caching")
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/Documentation/virt/kvm/mmu.rst b/Documentation/virt/kvm/mmu.rst
index 5bfe28b0728e..20d85daed395 100644
--- a/Documentation/virt/kvm/mmu.rst
+++ b/Documentation/virt/kvm/mmu.rst
@@ -171,8 +171,8 @@ Shadow pages contain the following information:
shadow pages) so role.quadrant takes values in the range 0..3. Each
quadrant maps 1GB virtual address space.
role.access:
- Inherited guest access permissions in the form uwx. Note execute
- permission is positive, not negative.
+ Inherited guest access permissions from the parent ptes in the form uwx.
+ Note execute permission is positive, not negative.
role.invalid:
The page is invalid and should not be used. It is a root page that is
currently pinned (by a cpu hardware register pointing to it); once it is
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 70b7e44e3035..823a5919f9fa 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -90,8 +90,8 @@ struct guest_walker {
gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
pt_element_t __user *ptep_user[PT_MAX_FULL_LEVELS];
bool pte_writable[PT_MAX_FULL_LEVELS];
- unsigned pt_access;
- unsigned pte_access;
+ unsigned int pt_access[PT_MAX_FULL_LEVELS];
+ unsigned int pte_access;
gfn_t gfn;
struct x86_exception fault;
};
@@ -418,13 +418,15 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
}
walker->ptes[walker->level - 1] = pte;
+
+ /* Convert to ACC_*_MASK flags for struct guest_walker. */
+ walker->pt_access[walker->level - 1] = FNAME(gpte_access)(pt_access ^ walk_nx_mask);
} while (!is_last_gpte(mmu, walker->level, pte));
pte_pkey = FNAME(gpte_pkeys)(vcpu, pte);
accessed_dirty = have_ad ? pte_access & PT_GUEST_ACCESSED_MASK : 0;
/* Convert to ACC_*_MASK flags for struct guest_walker. */
- walker->pt_access = FNAME(gpte_access)(pt_access ^ walk_nx_mask);
walker->pte_access = FNAME(gpte_access)(pte_access ^ walk_nx_mask);
errcode = permission_fault(vcpu, mmu, walker->pte_access, pte_pkey, access);
if (unlikely(errcode))
@@ -463,7 +465,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
}
pgprintk("%s: pte %llx pte_access %x pt_access %x\n",
- __func__, (u64)pte, walker->pte_access, walker->pt_access);
+ __func__, (u64)pte, walker->pte_access,
+ walker->pt_access[walker->level - 1]);
return 1;
error:
@@ -643,7 +646,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
bool huge_page_disallowed = exec && nx_huge_page_workaround_enabled;
struct kvm_mmu_page *sp = NULL;
struct kvm_shadow_walk_iterator it;
- unsigned direct_access, access = gw->pt_access;
+ unsigned int direct_access, access;
int top_level, level, req_level, ret;
gfn_t base_gfn = gw->gfn;
@@ -675,6 +678,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
sp = NULL;
if (!is_shadow_present_pte(*it.sptep)) {
table_gfn = gw->table_gfn[it.level - 2];
+ access = gw->pt_access[it.level - 2];
sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
false, access);
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b1bd5cba3306691c771d558e94baa73e8b0b96b7 Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <laijs(a)linux.alibaba.com>
Date: Thu, 3 Jun 2021 13:24:55 +0800
Subject: [PATCH] KVM: X86: MMU: Use the correct inherited permissions to get
shadow page
When computing the access permissions of a shadow page, use the effective
permissions of the walk up to that point, i.e. the logical AND of its parents'
permissions. Two guest PxE entries that point at the same table gfn need to
be shadowed with different shadow pages if their parents' permissions are
different. KVM currently uses the effective permissions of the last
non-leaf entry for all non-leaf entries. Because all non-leaf SPTEs have
full ("uwx") permissions, and the effective permissions are recorded only
in role.access and merged into the leaves, this can lead to incorrect
reuse of a shadow page and eventually to a missing guest protection page
fault.
For example, here is a shared pagetable:
  pgd[]    pud[]          pmd[]                    virtual address pointers
                     /->pmd1(u--)->pte1(uw-)->page1 <- ptr1 (u--)
        /->pud1(uw-)--->pmd2(uw-)->pte2(uw-)->page2 <- ptr2 (uw-)
   pgd-|   (shared pmd[] as above)
        \->pud2(u--)--->pmd1(u--)->pte1(uw-)->page1 <- ptr3 (u--)
                     \->pmd2(uw-)->pte2(uw-)->page2 <- ptr4 (u--)
pud1 and pud2 point to the same pmd table, so:
- ptr1 and ptr3 point to the same page.
- ptr2 and ptr4 point to the same page.
(pud1 and pud2 here are pud entries, while pmd1 and pmd2 here are pmd entries)
- First, the guest reads from ptr1 and KVM prepares a shadow
page table with role.access=u--, from ptr1's pud1 and ptr1's pmd1.
"u--" comes from the effective permissions of pgd, pud1 and
pmd1, which are stored in pt->access. "u--" is also used to get
the pagetable for pud1, instead of "uw-".
- Then the guest writes to ptr2 and KVM reuses pud1, which is present.
The hypervisor sets up a shadow page for ptr2 with pt->access = "uw-",
even though the pud1 pmd (because of the incorrect argument to
kvm_mmu_get_page in the previous step) has role.access="u--".
- Then the guest reads from ptr3. The hypervisor reuses pud1's
shadow pmd for pud2, because both use "u--" for their permissions.
Thus, the shadow pmd already includes entries for both pmd1 and pmd2.
- At last, the guest writes to ptr4. This causes no vmexit or pagefault,
because pud1's shadow page structures included an "uw-" page even though
its role.access was "u--".
Any kind of shared pagetable might have a similar problem in a virtual
machine without TDP enabled, if the permissions inherited through
different ancestors differ.
In order to fix the problem, we change pt->access to be an array, and
any access in it will not include permissions ANDed from child ptes.
The test code is: https://lore.kernel.org/kvm/20210603050537.19605-1-jiangshanlai@gmail.com/
Remember to test it with TDP disabled.
The problem had existed long before commit 41074d07c78b ("KVM: MMU:
Fix inherited permissions for emulated guest pte updates"), and it is
hard to tell which commit is the culprit, so there is no fixes tag here.
Signed-off-by: Lai Jiangshan <laijs(a)linux.alibaba.com>
Message-Id: <20210603052455.21023-1-jiangshanlai(a)gmail.com>
Cc: stable(a)vger.kernel.org
Fixes: cea0f0e7ea54 ("[PATCH] KVM: MMU: Shadow page table caching")
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
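As a rough, standalone illustration of the fix (made-up demo code, not the kernel implementation; walk() and walk_access[] are invented names), the effective access recorded for each level of the walk is the AND of all guest permissions seen so far, kept per level rather than overwritten by the last non-leaf entry:

#include <stdio.h>

#define UACC 0x4	/* user  */
#define WACC 0x2	/* write */
#define XACC 0x1	/* exec  */
#define LEVELS 3	/* pgd, pud, pmd for this example */

static void walk(const unsigned int *gpte, unsigned int *walk_access)
{
	unsigned int acc = UACC | WACC | XACC;
	int l;

	for (l = 0; l < LEVELS; l++) {
		acc &= gpte[l];		/* AND in this level's guest permissions */
		walk_access[l] = acc;	/* record per level, not just once */
	}
}

int main(void)
{
	/* ptr2's walk from the example above: pgd(uwx), pud1(uw-), pmd2(uw-) */
	unsigned int via_pud1[LEVELS] = { UACC | WACC | XACC, UACC | WACC, UACC | WACC };
	/* ptr4's walk from the example above: pgd(uwx), pud2(u--), pmd2(uw-) */
	unsigned int via_pud2[LEVELS] = { UACC | WACC | XACC, UACC, UACC | WACC };
	unsigned int a1[LEVELS], a2[LEVELS];

	walk(via_pud1, a1);
	walk(via_pud2, a2);

	/* Different effective permissions at the pmd level (0x6 vs 0x4), so the
	 * shared pmd table must be shadowed by two different shadow pages. */
	printf("pmd access via pud1: %#x, via pud2: %#x\n", a1[2], a2[2]);
	return 0;
}

Built with any C compiler, this prints 0x6 for the pud1 path and 0x4 for the pud2 path, which is exactly the distinction role.access has to capture.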
diff --git a/Documentation/virt/kvm/mmu.rst b/Documentation/virt/kvm/mmu.rst
index 5bfe28b0728e..20d85daed395 100644
--- a/Documentation/virt/kvm/mmu.rst
+++ b/Documentation/virt/kvm/mmu.rst
@@ -171,8 +171,8 @@ Shadow pages contain the following information:
shadow pages) so role.quadrant takes values in the range 0..3. Each
quadrant maps 1GB virtual address space.
role.access:
- Inherited guest access permissions in the form uwx. Note execute
- permission is positive, not negative.
+ Inherited guest access permissions from the parent ptes in the form uwx.
+ Note execute permission is positive, not negative.
role.invalid:
The page is invalid and should not be used. It is a root page that is
currently pinned (by a cpu hardware register pointing to it); once it is
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 70b7e44e3035..823a5919f9fa 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -90,8 +90,8 @@ struct guest_walker {
gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
pt_element_t __user *ptep_user[PT_MAX_FULL_LEVELS];
bool pte_writable[PT_MAX_FULL_LEVELS];
- unsigned pt_access;
- unsigned pte_access;
+ unsigned int pt_access[PT_MAX_FULL_LEVELS];
+ unsigned int pte_access;
gfn_t gfn;
struct x86_exception fault;
};
@@ -418,13 +418,15 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
}
walker->ptes[walker->level - 1] = pte;
+
+ /* Convert to ACC_*_MASK flags for struct guest_walker. */
+ walker->pt_access[walker->level - 1] = FNAME(gpte_access)(pt_access ^ walk_nx_mask);
} while (!is_last_gpte(mmu, walker->level, pte));
pte_pkey = FNAME(gpte_pkeys)(vcpu, pte);
accessed_dirty = have_ad ? pte_access & PT_GUEST_ACCESSED_MASK : 0;
/* Convert to ACC_*_MASK flags for struct guest_walker. */
- walker->pt_access = FNAME(gpte_access)(pt_access ^ walk_nx_mask);
walker->pte_access = FNAME(gpte_access)(pte_access ^ walk_nx_mask);
errcode = permission_fault(vcpu, mmu, walker->pte_access, pte_pkey, access);
if (unlikely(errcode))
@@ -463,7 +465,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
}
pgprintk("%s: pte %llx pte_access %x pt_access %x\n",
- __func__, (u64)pte, walker->pte_access, walker->pt_access);
+ __func__, (u64)pte, walker->pte_access,
+ walker->pt_access[walker->level - 1]);
return 1;
error:
@@ -643,7 +646,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
bool huge_page_disallowed = exec && nx_huge_page_workaround_enabled;
struct kvm_mmu_page *sp = NULL;
struct kvm_shadow_walk_iterator it;
- unsigned direct_access, access = gw->pt_access;
+ unsigned int direct_access, access;
int top_level, level, req_level, ret;
gfn_t base_gfn = gw->gfn;
@@ -675,6 +678,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
sp = NULL;
if (!is_shadow_present_pte(*it.sptep)) {
table_gfn = gw->table_gfn[it.level - 2];
+ access = gw->pt_access[it.level - 2];
sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
false, access);
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b1bd5cba3306691c771d558e94baa73e8b0b96b7 Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <laijs(a)linux.alibaba.com>
Date: Thu, 3 Jun 2021 13:24:55 +0800
Subject: [PATCH] KVM: X86: MMU: Use the correct inherited permissions to get
shadow page
When computing the access permissions of a shadow page, use the effective
permissions of the walk up to that point, i.e. the logical AND of its parents'
permissions. Two guest PxE entries that point at the same table gfn need to
be shadowed with different shadow pages if their parents' permissions are
different. KVM currently uses the effective permissions of the last
non-leaf entry for all non-leaf entries. Because all non-leaf SPTEs have
full ("uwx") permissions, and the effective permissions are recorded only
in role.access and merged into the leaves, this can lead to incorrect
reuse of a shadow page and eventually to a missing guest protection page
fault.
For example, here is a shared pagetable:
  pgd[]    pud[]          pmd[]                    virtual address pointers
                     /->pmd1(u--)->pte1(uw-)->page1 <- ptr1 (u--)
        /->pud1(uw-)--->pmd2(uw-)->pte2(uw-)->page2 <- ptr2 (uw-)
   pgd-|   (shared pmd[] as above)
        \->pud2(u--)--->pmd1(u--)->pte1(uw-)->page1 <- ptr3 (u--)
                     \->pmd2(uw-)->pte2(uw-)->page2 <- ptr4 (u--)
pud1 and pud2 point to the same pmd table, so:
- ptr1 and ptr3 point to the same page.
- ptr2 and ptr4 point to the same page.
(pud1 and pud2 here are pud entries, while pmd1 and pmd2 here are pmd entries)
- First, the guest reads from ptr1 and KVM prepares a shadow
page table with role.access=u--, from ptr1's pud1 and ptr1's pmd1.
"u--" comes from the effective permissions of pgd, pud1 and
pmd1, which are stored in pt->access. "u--" is also used to get
the pagetable for pud1, instead of "uw-".
- Then the guest writes to ptr2 and KVM reuses pud1, which is present.
The hypervisor sets up a shadow page for ptr2 with pt->access = "uw-",
even though the pud1 pmd (because of the incorrect argument to
kvm_mmu_get_page in the previous step) has role.access="u--".
- Then the guest reads from ptr3. The hypervisor reuses pud1's
shadow pmd for pud2, because both use "u--" for their permissions.
Thus, the shadow pmd already includes entries for both pmd1 and pmd2.
- At last, the guest writes to ptr4. This causes no vmexit or pagefault,
because pud1's shadow page structures included an "uw-" page even though
its role.access was "u--".
Any kind of shared pagetable might have a similar problem in a virtual
machine without TDP enabled, if the permissions inherited through
different ancestors differ.
In order to fix the problem, we change pt->access to be an array, and
any access in it will not include permissions ANDed from child ptes.
The test code is: https://lore.kernel.org/kvm/20210603050537.19605-1-jiangshanlai@gmail.com/
Remember to test it with TDP disabled.
The problem had existed long before commit 41074d07c78b ("KVM: MMU:
Fix inherited permissions for emulated guest pte updates"), and it is
hard to tell which commit is the culprit, so there is no fixes tag here.
Signed-off-by: Lai Jiangshan <laijs(a)linux.alibaba.com>
Message-Id: <20210603052455.21023-1-jiangshanlai(a)gmail.com>
Cc: stable(a)vger.kernel.org
Fixes: cea0f0e7ea54 ("[PATCH] KVM: MMU: Shadow page table caching")
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/Documentation/virt/kvm/mmu.rst b/Documentation/virt/kvm/mmu.rst
index 5bfe28b0728e..20d85daed395 100644
--- a/Documentation/virt/kvm/mmu.rst
+++ b/Documentation/virt/kvm/mmu.rst
@@ -171,8 +171,8 @@ Shadow pages contain the following information:
shadow pages) so role.quadrant takes values in the range 0..3. Each
quadrant maps 1GB virtual address space.
role.access:
- Inherited guest access permissions in the form uwx. Note execute
- permission is positive, not negative.
+ Inherited guest access permissions from the parent ptes in the form uwx.
+ Note execute permission is positive, not negative.
role.invalid:
The page is invalid and should not be used. It is a root page that is
currently pinned (by a cpu hardware register pointing to it); once it is
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 70b7e44e3035..823a5919f9fa 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -90,8 +90,8 @@ struct guest_walker {
gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
pt_element_t __user *ptep_user[PT_MAX_FULL_LEVELS];
bool pte_writable[PT_MAX_FULL_LEVELS];
- unsigned pt_access;
- unsigned pte_access;
+ unsigned int pt_access[PT_MAX_FULL_LEVELS];
+ unsigned int pte_access;
gfn_t gfn;
struct x86_exception fault;
};
@@ -418,13 +418,15 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
}
walker->ptes[walker->level - 1] = pte;
+
+ /* Convert to ACC_*_MASK flags for struct guest_walker. */
+ walker->pt_access[walker->level - 1] = FNAME(gpte_access)(pt_access ^ walk_nx_mask);
} while (!is_last_gpte(mmu, walker->level, pte));
pte_pkey = FNAME(gpte_pkeys)(vcpu, pte);
accessed_dirty = have_ad ? pte_access & PT_GUEST_ACCESSED_MASK : 0;
/* Convert to ACC_*_MASK flags for struct guest_walker. */
- walker->pt_access = FNAME(gpte_access)(pt_access ^ walk_nx_mask);
walker->pte_access = FNAME(gpte_access)(pte_access ^ walk_nx_mask);
errcode = permission_fault(vcpu, mmu, walker->pte_access, pte_pkey, access);
if (unlikely(errcode))
@@ -463,7 +465,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
}
pgprintk("%s: pte %llx pte_access %x pt_access %x\n",
- __func__, (u64)pte, walker->pte_access, walker->pt_access);
+ __func__, (u64)pte, walker->pte_access,
+ walker->pt_access[walker->level - 1]);
return 1;
error:
@@ -643,7 +646,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
bool huge_page_disallowed = exec && nx_huge_page_workaround_enabled;
struct kvm_mmu_page *sp = NULL;
struct kvm_shadow_walk_iterator it;
- unsigned direct_access, access = gw->pt_access;
+ unsigned int direct_access, access;
int top_level, level, req_level, ret;
gfn_t base_gfn = gw->gfn;
@@ -675,6 +678,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
sp = NULL;
if (!is_shadow_present_pte(*it.sptep)) {
table_gfn = gw->table_gfn[it.level - 2];
+ access = gw->pt_access[it.level - 2];
sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
false, access);
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b1bd5cba3306691c771d558e94baa73e8b0b96b7 Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <laijs(a)linux.alibaba.com>
Date: Thu, 3 Jun 2021 13:24:55 +0800
Subject: [PATCH] KVM: X86: MMU: Use the correct inherited permissions to get
shadow page
When computing the access permissions of a shadow page, use the effective
permissions of the walk up to that point, i.e. the logical AND of its parents'
permissions. Two guest PxE entries that point at the same table gfn need to
be shadowed with different shadow pages if their parents' permissions are
different. KVM currently uses the effective permissions of the last
non-leaf entry for all non-leaf entries. Because all non-leaf SPTEs have
full ("uwx") permissions, and the effective permissions are recorded only
in role.access and merged into the leaves, this can lead to incorrect
reuse of a shadow page and eventually to a missing guest protection page
fault.
For example, here is a shared pagetable:
  pgd[]    pud[]          pmd[]                    virtual address pointers
                     /->pmd1(u--)->pte1(uw-)->page1 <- ptr1 (u--)
        /->pud1(uw-)--->pmd2(uw-)->pte2(uw-)->page2 <- ptr2 (uw-)
   pgd-|   (shared pmd[] as above)
        \->pud2(u--)--->pmd1(u--)->pte1(uw-)->page1 <- ptr3 (u--)
                     \->pmd2(uw-)->pte2(uw-)->page2 <- ptr4 (u--)
pud1 and pud2 point to the same pmd table, so:
- ptr1 and ptr3 point to the same page.
- ptr2 and ptr4 point to the same page.
(pud1 and pud2 here are pud entries, while pmd1 and pmd2 here are pmd entries)
- First, the guest reads from ptr1 and KVM prepares a shadow
page table with role.access=u--, from ptr1's pud1 and ptr1's pmd1.
"u--" comes from the effective permissions of pgd, pud1 and
pmd1, which are stored in pt->access. "u--" is also used to get
the pagetable for pud1, instead of "uw-".
- Then the guest writes to ptr2 and KVM reuses pud1, which is present.
The hypervisor sets up a shadow page for ptr2 with pt->access = "uw-",
even though the pud1 pmd (because of the incorrect argument to
kvm_mmu_get_page in the previous step) has role.access="u--".
- Then the guest reads from ptr3. The hypervisor reuses pud1's
shadow pmd for pud2, because both use "u--" for their permissions.
Thus, the shadow pmd already includes entries for both pmd1 and pmd2.
- At last, the guest writes to ptr4. This causes no vmexit or pagefault,
because pud1's shadow page structures included an "uw-" page even though
its role.access was "u--".
Any kind of shared pagetable might have a similar problem in a virtual
machine without TDP enabled, if the permissions inherited through
different ancestors differ.
In order to fix the problem, we change pt->access to be an array, and
any access in it will not include permissions ANDed from child ptes.
The test code is: https://lore.kernel.org/kvm/20210603050537.19605-1-jiangshanlai@gmail.com/
Remember to test it with TDP disabled.
The problem had existed long before commit 41074d07c78b ("KVM: MMU:
Fix inherited permissions for emulated guest pte updates"), and it is
hard to tell which commit is the culprit, so there is no fixes tag here.
Signed-off-by: Lai Jiangshan <laijs(a)linux.alibaba.com>
Message-Id: <20210603052455.21023-1-jiangshanlai(a)gmail.com>
Cc: stable(a)vger.kernel.org
Fixes: cea0f0e7ea54 ("[PATCH] KVM: MMU: Shadow page table caching")
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/Documentation/virt/kvm/mmu.rst b/Documentation/virt/kvm/mmu.rst
index 5bfe28b0728e..20d85daed395 100644
--- a/Documentation/virt/kvm/mmu.rst
+++ b/Documentation/virt/kvm/mmu.rst
@@ -171,8 +171,8 @@ Shadow pages contain the following information:
shadow pages) so role.quadrant takes values in the range 0..3. Each
quadrant maps 1GB virtual address space.
role.access:
- Inherited guest access permissions in the form uwx. Note execute
- permission is positive, not negative.
+ Inherited guest access permissions from the parent ptes in the form uwx.
+ Note execute permission is positive, not negative.
role.invalid:
The page is invalid and should not be used. It is a root page that is
currently pinned (by a cpu hardware register pointing to it); once it is
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 70b7e44e3035..823a5919f9fa 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -90,8 +90,8 @@ struct guest_walker {
gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
pt_element_t __user *ptep_user[PT_MAX_FULL_LEVELS];
bool pte_writable[PT_MAX_FULL_LEVELS];
- unsigned pt_access;
- unsigned pte_access;
+ unsigned int pt_access[PT_MAX_FULL_LEVELS];
+ unsigned int pte_access;
gfn_t gfn;
struct x86_exception fault;
};
@@ -418,13 +418,15 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
}
walker->ptes[walker->level - 1] = pte;
+
+ /* Convert to ACC_*_MASK flags for struct guest_walker. */
+ walker->pt_access[walker->level - 1] = FNAME(gpte_access)(pt_access ^ walk_nx_mask);
} while (!is_last_gpte(mmu, walker->level, pte));
pte_pkey = FNAME(gpte_pkeys)(vcpu, pte);
accessed_dirty = have_ad ? pte_access & PT_GUEST_ACCESSED_MASK : 0;
/* Convert to ACC_*_MASK flags for struct guest_walker. */
- walker->pt_access = FNAME(gpte_access)(pt_access ^ walk_nx_mask);
walker->pte_access = FNAME(gpte_access)(pte_access ^ walk_nx_mask);
errcode = permission_fault(vcpu, mmu, walker->pte_access, pte_pkey, access);
if (unlikely(errcode))
@@ -463,7 +465,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
}
pgprintk("%s: pte %llx pte_access %x pt_access %x\n",
- __func__, (u64)pte, walker->pte_access, walker->pt_access);
+ __func__, (u64)pte, walker->pte_access,
+ walker->pt_access[walker->level - 1]);
return 1;
error:
@@ -643,7 +646,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
bool huge_page_disallowed = exec && nx_huge_page_workaround_enabled;
struct kvm_mmu_page *sp = NULL;
struct kvm_shadow_walk_iterator it;
- unsigned direct_access, access = gw->pt_access;
+ unsigned int direct_access, access;
int top_level, level, req_level, ret;
gfn_t base_gfn = gw->gfn;
@@ -675,6 +678,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
sp = NULL;
if (!is_shadow_present_pte(*it.sptep)) {
table_gfn = gw->table_gfn[it.level - 2];
+ access = gw->pt_access[it.level - 2];
sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
false, access);
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b53e84eed08b88fd3ff59e5c2a7f1a69d4004e32 Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <laijs(a)linux.alibaba.com>
Date: Tue, 1 Jun 2021 01:22:56 +0800
Subject: [PATCH] KVM: x86: Unload MMU on guest TLB flush if TDP disabled to
force MMU sync
When using shadow paging, unload the guest MMU when emulating a guest TLB
flush to ensure all roots are synchronized. From the guest's perspective,
flushing the TLB ensures any and all modifications to its PTEs will be
recognized by the CPU.
Note, unloading the MMU is overkill, but is done to mirror KVM's existing
handling of INVPCID(all) and ensure the bug is squashed. Future cleanup
can be done to more precisely synchronize roots when servicing a guest
TLB flush.
If TDP is enabled, synchronizing the MMU is unnecessary even if nested
TDP is in play, as a "legacy" TLB flush from L1 does not invalidate L1's
TDP mappings. For EPT, an explicit INVEPT is required to invalidate
guest-physical mappings; for NPT, guest mappings are always tagged with
an ASID and thus can only be invalidated via the VMCB's ASID control.
This bug has existed since the introduction of KVM_VCPU_FLUSH_TLB.
It was only recently exposed after Linux guests stopped flushing the
local CPU's TLB prior to flushing remote TLBs (see commit 4ce94eabac16,
"x86/mm/tlb: Flush remote and local TLBs concurrently"), but is also
visible in Windows 10 guests.
Tested-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Fixes: f38a7b75267f ("KVM: X86: support paravirtualized help for TLB shootdowns")
Signed-off-by: Lai Jiangshan <laijs(a)linux.alibaba.com>
[sean: massaged comment and changelog]
Message-Id: <20210531172256.2908-1-jiangshanlai(a)gmail.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
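As a rough, self-contained illustration of the reasoning above (a toy model with invented names, not KVM code): under shadow paging KVM keeps a cached copy of the guest's translations, and flushing the hardware TLB alone does not refresh that cache, so the emulated guest flush has to drop ("unload") it instead:

#include <stdio.h>
#include <stdbool.h>

struct toy_vcpu {
	int guest_pte;		/* translation the guest currently has installed */
	int shadow_pte;		/* KVM's cached (shadow) copy of that translation */
	bool shadow_valid;
};

/* Emulated guest TLB flush: with shadow paging, drop the shadow so the
 * next "MMU load" rebuilds it; with TDP there is nothing to resync. */
static void guest_tlb_flush(struct toy_vcpu *v, bool tdp_enabled)
{
	if (!tdp_enabled)
		v->shadow_valid = false;	/* analogue of kvm_mmu_unload() */
}

static int translate(struct toy_vcpu *v, bool tdp_enabled)
{
	if (tdp_enabled)
		return v->guest_pte;		/* guest page tables used directly */
	if (!v->shadow_valid) {
		v->shadow_pte = v->guest_pte;	/* resynced on the next load */
		v->shadow_valid = true;
	}
	return v->shadow_pte;
}

int main(void)
{
	struct toy_vcpu v = { .guest_pte = 1, .shadow_pte = 1, .shadow_valid = true };

	v.guest_pte = 2;		/* guest rewrites a PTE ...  */
	guest_tlb_flush(&v, false);	/* ... then flushes its TLB  */
	printf("shadow paging now translates to %d\n", translate(&v, false));
	return 0;
}

Without the unload step the stale shadow_pte would keep being used after the guest's flush, which is the missed synchronization the patch addresses.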
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e2144eedaf79..9dd23bdfc6cc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3072,6 +3072,19 @@ static void kvm_vcpu_flush_tlb_all(struct kvm_vcpu *vcpu)
static void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
{
++vcpu->stat.tlb_flush;
+
+ if (!tdp_enabled) {
+ /*
+ * A TLB flush on behalf of the guest is equivalent to
+ * INVPCID(all), toggling CR4.PGE, etc., which requires
+ * a forced sync of the shadow page tables. Unload the
+ * entire MMU here and the subsequent load will sync the
+ * shadow page tables, and also flush the TLB.
+ */
+ kvm_mmu_unload(vcpu);
+ return;
+ }
+
static_call(kvm_x86_tlb_flush_guest)(vcpu);
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b53e84eed08b88fd3ff59e5c2a7f1a69d4004e32 Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <laijs(a)linux.alibaba.com>
Date: Tue, 1 Jun 2021 01:22:56 +0800
Subject: [PATCH] KVM: x86: Unload MMU on guest TLB flush if TDP disabled to
force MMU sync
When using shadow paging, unload the guest MMU when emulating a guest TLB
flush to ensure all roots are synchronized. From the guest's perspective,
flushing the TLB ensures any and all modifications to its PTEs will be
recognized by the CPU.
Note, unloading the MMU is overkill, but is done to mirror KVM's existing
handling of INVPCID(all) and ensure the bug is squashed. Future cleanup
can be done to more precisely synchronize roots when servicing a guest
TLB flush.
If TDP is enabled, synchronizing the MMU is unnecessary even if nested
TDP is in play, as a "legacy" TLB flush from L1 does not invalidate L1's
TDP mappings. For EPT, an explicit INVEPT is required to invalidate
guest-physical mappings; for NPT, guest mappings are always tagged with
an ASID and thus can only be invalidated via the VMCB's ASID control.
This bug has existed since the introduction of KVM_VCPU_FLUSH_TLB.
It was only recently exposed after Linux guests stopped flushing the
local CPU's TLB prior to flushing remote TLBs (see commit 4ce94eabac16,
"x86/mm/tlb: Flush remote and local TLBs concurrently"), but is also
visible in Windows 10 guests.
Tested-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Fixes: f38a7b75267f ("KVM: X86: support paravirtualized help for TLB shootdowns")
Signed-off-by: Lai Jiangshan <laijs(a)linux.alibaba.com>
[sean: massaged comment and changelog]
Message-Id: <20210531172256.2908-1-jiangshanlai(a)gmail.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e2144eedaf79..9dd23bdfc6cc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3072,6 +3072,19 @@ static void kvm_vcpu_flush_tlb_all(struct kvm_vcpu *vcpu)
static void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
{
++vcpu->stat.tlb_flush;
+
+ if (!tdp_enabled) {
+ /*
+ * A TLB flush on behalf of the guest is equivalent to
+ * INVPCID(all), toggling CR4.PGE, etc., which requires
+ * a forced sync of the shadow page tables. Unload the
+ * entire MMU here and the subsequent load will sync the
+ * shadow page tables, and also flush the TLB.
+ */
+ kvm_mmu_unload(vcpu);
+ return;
+ }
+
static_call(kvm_x86_tlb_flush_guest)(vcpu);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b53e84eed08b88fd3ff59e5c2a7f1a69d4004e32 Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <laijs(a)linux.alibaba.com>
Date: Tue, 1 Jun 2021 01:22:56 +0800
Subject: [PATCH] KVM: x86: Unload MMU on guest TLB flush if TDP disabled to
force MMU sync
When using shadow paging, unload the guest MMU when emulating a guest TLB
flush to ensure all roots are synchronized. From the guest's perspective,
flushing the TLB ensures any and all modifications to its PTEs will be
recognized by the CPU.
Note, unloading the MMU is overkill, but is done to mirror KVM's existing
handling of INVPCID(all) and ensure the bug is squashed. Future cleanup
can be done to more precisely synchronize roots when servicing a guest
TLB flush.
If TDP is enabled, synchronizing the MMU is unnecessary even if nested
TDP is in play, as a "legacy" TLB flush from L1 does not invalidate L1's
TDP mappings. For EPT, an explicit INVEPT is required to invalidate
guest-physical mappings; for NPT, guest mappings are always tagged with
an ASID and thus can only be invalidated via the VMCB's ASID control.
This bug has existed since the introduction of KVM_VCPU_FLUSH_TLB.
It was only recently exposed after Linux guests stopped flushing the
local CPU's TLB prior to flushing remote TLBs (see commit 4ce94eabac16,
"x86/mm/tlb: Flush remote and local TLBs concurrently"), but is also
visible in Windows 10 guests.
Tested-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Fixes: f38a7b75267f ("KVM: X86: support paravirtualized help for TLB shootdowns")
Signed-off-by: Lai Jiangshan <laijs(a)linux.alibaba.com>
[sean: massaged comment and changelog]
Message-Id: <20210531172256.2908-1-jiangshanlai(a)gmail.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e2144eedaf79..9dd23bdfc6cc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3072,6 +3072,19 @@ static void kvm_vcpu_flush_tlb_all(struct kvm_vcpu *vcpu)
static void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
{
++vcpu->stat.tlb_flush;
+
+ if (!tdp_enabled) {
+ /*
+ * A TLB flush on behalf of the guest is equivalent to
+ * INVPCID(all), toggling CR4.PGE, etc., which requires
+ * a forced sync of the shadow page tables. Unload the
+ * entire MMU here and the subsequent load will sync the
+ * shadow page tables, and also flush the TLB.
+ */
+ kvm_mmu_unload(vcpu);
+ return;
+ }
+
static_call(kvm_x86_tlb_flush_guest)(vcpu);
}
This is a note to let you know that I've just added the patch titled
nvmem: core: add a missing of_node_put
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From 63879e2964bceee2aa5bbe8b99ea58bba28bb64f Mon Sep 17 00:00:00 2001
From: Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
Date: Fri, 11 Jun 2021 11:23:21 +0100
Subject: nvmem: core: add a missing of_node_put
'for_each_child_of_node' performs an of_node_get on each iteration, so a
return from the middle of the loop requires an of_node_put.
Fixes: e888d445ac33 ("nvmem: resolve cells from DT at registration time")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Link: https://lore.kernel.org/r/20210611102321.11509-1-srinivas.kandagatla@linaro…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
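For reference, the refcounting rule being fixed, as a hedged sketch (process_child is a made-up callback, not an nvmem function): for_each_child_of_node() takes a reference on each child it hands to the loop body, so every early return must drop that reference explicitly:

#include <linux/of.h>

static int scan_children(struct device_node *parent,
			 int (*process_child)(struct device_node *child))
{
	struct device_node *child;
	int err;

	for_each_child_of_node(parent, child) {
		err = process_child(child);	/* placeholder for real work */
		if (err) {
			/* the iterator holds a reference on 'child' for this
			 * iteration; an early return must put it back */
			of_node_put(child);
			return err;
		}
	}

	return 0;	/* normal termination: no extra reference is held */
}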
---
drivers/nvmem/core.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/nvmem/core.c b/drivers/nvmem/core.c
index 4d1c4f83b22f..20799e622b5b 100644
--- a/drivers/nvmem/core.c
+++ b/drivers/nvmem/core.c
@@ -690,15 +690,17 @@ static int nvmem_add_cells_from_of(struct nvmem_device *nvmem)
continue;
if (len < 2 * sizeof(u32)) {
dev_err(dev, "nvmem: invalid reg on %pOF\n", child);
+ of_node_put(child);
return -EINVAL;
}
cell = kzalloc(sizeof(*cell), GFP_KERNEL);
- if (!cell)
+ if (!cell) {
+ of_node_put(child);
return -ENOMEM;
+ }
cell->nvmem = nvmem;
- cell->np = of_node_get(child);
cell->offset = be32_to_cpup(addr++);
cell->bytes = be32_to_cpup(addr);
cell->name = kasprintf(GFP_KERNEL, "%pOFn", child);
@@ -719,11 +721,12 @@ static int nvmem_add_cells_from_of(struct nvmem_device *nvmem)
cell->name, nvmem->stride);
/* Cells already added will be freed later. */
kfree_const(cell->name);
- of_node_put(cell->np);
kfree(cell);
+ of_node_put(child);
return -EINVAL;
}
+ cell->np = of_node_get(child);
nvmem_cell_add(cell);
}
--
2.32.0
The fsl_mxc_udc driver was originally developed as part of a DENX
project, so I'm adding Marek to CC to have them check internally what
their preferences and requirements might be.
Thanks
Guennadi
On Fri, 11 Jun 2021, Leo Li wrote:
> > -----Original Message-----
> > From: Arnd Bergmann <arnd(a)arndb.de>
> > Sent: Friday, June 11, 2021 10:43 AM
> > To: Fabio Estevam <festevam(a)gmail.com>
> > Cc: gregkh <gregkh(a)linuxfoundation.org>; Felipe Balbi <balbi(a)kernel.org>;
> > Joel Stanley <joel(a)jms.id.au>; Leo Li <leoyang.li(a)nxp.com>; kbuild test
> > robot <lkp(a)intel.com>; Peter Chen <peter.chen(a)nxp.com>; Ran Wang
> > <ran.wang_1(a)nxp.com>; Stephen Rothwell <sfr(a)canb.auug.org.au>;
> > Shawn Guo <shawnguo(a)kernel.org>; # 3.4.x <stable(a)vger.kernel.org>;
> > Guennadi Liakhovetski <g.liakhovetski(a)gmx.de>
> > Subject: Re: patch "Revert "usb: gadget: fsl: Re-enable driver for ARM SoCs""
> > added to usb-linus
> >
> > On Fri, Jun 11, 2021 at 4:51 PM Fabio Estevam <festevam(a)gmail.com> wrote:
> > > On Fri, Jun 11, 2021 at 5:30 AM Arnd Bergmann <arnd(a)arndb.de> wrote:
> > >
> > > > Adding Fabio and Guennadi to Cc.
> > > >
> > > > I can see that the missing symbols were in a driver that got removed
> > > > in commit a390bef7db1f ("usb: gadget: fsl_mxc_udc: Remove the driver").
> > > >
> > > > If CONFIG_ARCH_MXC is disabled, these are stubbed out in the header
> > file.
> > > > These were added a long time ago by Guennadi Liakhovetski
> > > > 54e4026b64a9
> > > > ("USB: gadget: Add i.MX3x support to the fsl_usb2_udc driver"). I
> > > > also see that this patch added a few #ifdef CONFIG_ARCH_MXC checks
> > > > to the driver that still remain today. This is clearly broken as it
> > > > must be possible to use the same driver module on both SOC_LS1021A
> > > > and i.MX using a runtime check.
> > > >
> > > > I also don't see any i.MX variant actually using this driver, but
> > > > instead see the dts files declaring fsl,imx27-usb devices, which
> > > > bind to the drivers/usb/chipidea/ci_hdrc_imx.c driver. Is this one
> > > > of those cases where we have two separate drivers for the same
> > > > hardware, or is this for a different device?
> > >
> > > Exactly. The USB IP on several i.MX devices comes from ChipIdea.
> > >
> > > Prior to using devicetree, we had the fsl_mxc_udc driver to handle the
> > > gadget side.
> > >
> > > Since i.MX has been converted to a DT-only platform, we no longer need
> > > fsl_mxc_udc, as drivers/usb/chipidea is used nowadays.
> >
> > Ok, good, so the simplest solution I suppose is to remove the remaining bits
> > that Guennadi added when he wrote the removed driver in 54e4026b64a9,
> > and then re-apply Joel's patch.
>
> I can provide a patch for this.
>
> >
> > Alternatively, it might be possible change the chipidea driver to work on
> > ls1021a and ls1012a, assuming that they use the same hardware block as i.MX.
>
> It is also used on many legacy FSL PowerPC SoCs. I agree with the direction, but it does require some effort to make sure it works on all these legacy platforms. I think Ran Wang had tried to do that, but it was not completed due to bandwidth issues.
>
> >
> > Either way, it would be good to test the changes on at least one of these two
> > platforms.
> >
> > Arnd
>
I'm announcing the release of the 4.14.236 kernel.
All users of the 4.14 kernel series must upgrade.
The updated 4.14.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.14.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/x86/kvm/svm.c | 8
drivers/firmware/efi/cper.c | 4
drivers/firmware/efi/memattr.c | 5
drivers/hid/i2c-hid/i2c-hid-core.c | 4
drivers/hid/usbhid/hid-pidff.c | 1
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 1
drivers/net/usb/cdc_ncm.c | 12
drivers/vfio/pci/Kconfig | 1
drivers/vfio/pci/vfio_pci_config.c | 2
drivers/vfio/platform/vfio_platform_common.c | 2
drivers/xen/xen-pciback/vpci.c | 14 -
fs/btrfs/file-item.c | 10
fs/btrfs/tree-log.c | 13
fs/ext4/extents.c | 43 +--
fs/ocfs2/file.c | 55 +++-
include/linux/bpf_verifier.h | 5
include/linux/usb/usbnet.h | 2
include/net/caif/caif_dev.h | 2
include/net/caif/cfcnfg.h | 2
include/net/caif/cfserl.h | 1
init/main.c | 2
kernel/bpf/verifier.c | 369 ++++++++++++++++-----------
kernel/sched/fair.c | 7
mm/hugetlb.c | 14 -
net/bluetooth/hci_core.c | 7
net/bluetooth/hci_sock.c | 4
net/caif/caif_dev.c | 13
net/caif/caif_usb.c | 14 -
net/caif/cfcnfg.c | 16 -
net/caif/cfserl.c | 5
net/ieee802154/nl-mac.c | 4
net/ieee802154/nl-phy.c | 4
net/netfilter/ipvs/ip_vs_ctl.c | 2
net/netfilter/nfnetlink_cthelper.c | 8
net/nfc/llcp_sock.c | 2
sound/core/timer.c | 3
tools/testing/selftests/bpf/test_align.c | 26 -
tools/testing/selftests/bpf/test_verifier.c | 114 ++++----
39 files changed, 500 insertions(+), 303 deletions(-)
Alexei Starovoitov (4):
bpf: do not allow root to mangle valid pointers
bpf/verifier: disallow pointer subtraction
selftests/bpf: fix test_align
selftests/bpf: make 'dubious pointer arithmetic' test useful
Arnd Bergmann (1):
HID: i2c-hid: fix format string mismatch
Cheng Jian (1):
sched/fair: Optimize select_idle_cpu
Daniel Borkmann (12):
bpf: Move off_reg into sanitize_ptr_alu
bpf: Ensure off_reg has no mixed signed bounds for all types
bpf: Rework ptr_limit into alu_limit and add common error path
bpf: Improve verifier error messages for users
bpf: Refactor and streamline bounds check into helper
bpf: Move sanitize_val_alu out of op switch
bpf: Tighten speculative pointer arithmetic mask
bpf: Update selftests to reflect new error states
bpf: Fix leakage of uninitialized bpf stack under speculation
bpf: Wrap aux data inside bpf_sanitize_info container
bpf: Fix mask direction swap upon off reg sign change
bpf: No need to simulate speculative domain for immediates
Grant Grundler (1):
net: usb: cdc_ncm: don't spew notifications
Greg Kroah-Hartman (1):
Linux 4.14.236
Heiner Kallweit (1):
efi: Allow EFI_MEMORY_XP and EFI_MEMORY_RO both to be cleared
Jan Beulich (1):
xen-pciback: redo VF placement in the virtual topology
Josef Bacik (2):
btrfs: fix error handling in btrfs_del_csums
btrfs: fixup error handling in fixup_inode_link_counts
Julian Anastasov (1):
ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service
Junxiao Bi (1):
ocfs2: fix data corruption by fallocate
Krzysztof Kozlowski (1):
nfc: fix NULL ptr dereference in llcp_sock_getname() after failed connect
Lin Ma (2):
Bluetooth: fix the erroneous flush_work() order
Bluetooth: use correct lock to prevent UAF of hdev object
Mark Rutland (1):
pid: take a reference when initializing `cad_pid`
Max Gurtovoy (1):
vfio/platform: fix module_put call in error flow
Michael Chan (1):
bnxt_en: Remove the setting of dev_port.
Mina Almasry (1):
mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY
Pablo Neira Ayuso (1):
netfilter: nfnetlink_cthelper: hit EBUSY on updates if size mismatches
Pavel Skripkin (4):
net: caif: added cfserl_release function
net: caif: add proper error handling
net: caif: fix memory leak in caif_device_notify
net: caif: fix memory leak in cfusbl_device_notify
Piotr Krysiuk (1):
bpf, selftests: Fix up some test_verifier cases for unprivileged
Randy Dunlap (1):
vfio/pci: zap_vma_ptes() needs MMU
Rasmus Villemoes (1):
efi: cper: fix snprintf() use in cper_dimm_err_location()
Sean Christopherson (1):
KVM: SVM: Truncate GPR value for DR and CR accesses in !64-bit mode
Takashi Iwai (1):
ALSA: timer: Fix master timer notification
Wei Yongjun (1):
ieee802154: fix error return code in ieee802154_llsec_getparams()
Ye Bin (1):
ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
Zhen Lei (3):
vfio/pci: Fix error return code in vfio_ecap_init()
HID: pidff: fix error return code in hid_pidff_init()
ieee802154: fix error return code in ieee802154_add_iface()
I'm announcing the release of the 4.19.194 kernel.
All users of the 4.19 kernel series must upgrade.
The updated 4.19.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.19.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/arm64/kvm/sys_regs.c | 42 ++--
arch/x86/include/asm/apic.h | 1
arch/x86/kernel/apic/apic.c | 1
arch/x86/kernel/apic/vector.c | 20 +
arch/x86/kvm/svm.c | 8
drivers/acpi/bus.c | 42 +---
drivers/firmware/efi/cper.c | 4
drivers/firmware/efi/memattr.c | 5
drivers/hid/hid-multitouch.c | 10
drivers/hid/i2c-hid/i2c-hid-core.c | 4
drivers/hid/usbhid/hid-pidff.c | 1
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 1
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 3
drivers/net/usb/cdc_ncm.c | 12 +
drivers/usb/dwc2/core_intr.c | 4
drivers/vfio/pci/Kconfig | 1
drivers/vfio/pci/vfio_pci_config.c | 2
drivers/vfio/platform/vfio_platform_common.c | 2
drivers/xen/xen-pciback/vpci.c | 14 -
fs/btrfs/extent-tree.c | 12 -
fs/btrfs/file-item.c | 10
fs/btrfs/inode.c | 12 +
fs/btrfs/tree-log.c | 13 -
fs/ext4/extents.c | 43 ++--
fs/ocfs2/file.c | 55 ++++-
include/linux/perf_event.h | 5
include/linux/usb/usbnet.h | 2
include/net/caif/caif_dev.h | 2
include/net/caif/cfcnfg.h | 2
include/net/caif/cfserl.h | 1
include/uapi/linux/bpf.h | 14 +
init/main.c | 2
kernel/bpf/syscall.c | 7
kernel/bpf/verifier.c | 3
kernel/events/core.c | 62 +++---
kernel/sched/fair.c | 7
mm/hugetlb.c | 14 +
net/bluetooth/hci_core.c | 7
net/bluetooth/hci_sock.c | 4
net/caif/caif_dev.c | 13 -
net/caif/caif_usb.c | 14 +
net/caif/cfcnfg.c | 16 +
net/caif/cfserl.c | 5
net/ieee802154/nl-mac.c | 4
net/ieee802154/nl-phy.c | 4
net/netfilter/ipvs/ip_vs_ctl.c | 2
net/netfilter/nfnetlink_cthelper.c | 8
net/nfc/llcp_sock.c | 2
net/tipc/bearer.c | 94 ++++++---
net/wireless/core.h | 2
net/wireless/nl80211.c | 7
net/wireless/util.c | 39 +++
samples/vfio-mdev/mdpy-fb.c | 13 -
sound/core/timer.c | 3
sound/pci/hda/patch_realtek.c | 1
sound/usb/mixer_quirks.c | 2
tools/include/uapi/linux/bpf.h | 14 +
tools/lib/bpf/bpf.c | 8
tools/lib/bpf/bpf.h | 2
tools/testing/selftests/bpf/test_align.c | 4
tools/testing/selftests/bpf/test_verifier.c | 224 +++++++++++++++-------
62 files changed, 663 insertions(+), 274 deletions(-)
Ahelenia Ziemiańska (1):
HID: multitouch: require Finger field to mark Win8 reports as MT
Anand Jain (1):
btrfs: fix unmountable seed device after fstrim
Anant Thazhemadam (1):
nl80211: validate key indexes for cfg80211_registered_device
Arnd Bergmann (1):
HID: i2c-hid: fix format string mismatch
Björn Töpel (2):
selftests/bpf: add "any alignment" annotation for some tests
selftests/bpf: Avoid running unprivileged tests with alignment requirements
Carlos M (1):
ALSA: hda: Fix for mute key LED for HP Pavilion 15-CK0xx
Cheng Jian (1):
sched/fair: Optimize select_idle_cpu
Daniel Borkmann (2):
bpf: fix test suite to enable all unpriv program types
bpf: test make sure to run unpriv test cases in test_verifier
David S. Miller (4):
bpf: Add BPF_F_ANY_ALIGNMENT.
bpf: Adjust F_NEEDS_EFFICIENT_UNALIGNED_ACCESS handling in test_verifier.c
bpf: Make more use of 'any' alignment in test_verifier.c
bpf: Apply F_NEEDS_EFFICIENT_UNALIGNED_ACCESS to more ACCEPT test cases.
Erik Schmauss (1):
ACPI: probe ECDT before loading AML tables regardless of module-level code flag
Grant Grundler (1):
net: usb: cdc_ncm: don't spew notifications
Greg Kroah-Hartman (1):
Linux 4.19.194
Heiner Kallweit (1):
efi: Allow EFI_MEMORY_XP and EFI_MEMORY_RO both to be cleared
Hoang Le (2):
tipc: add extack messages for bearer/media failure
tipc: fix unique bearer names sanity check
Ian Rogers (1):
perf/cgroups: Don't rotate events for cgroups unnecessarily
Jan Beulich (1):
xen-pciback: redo VF placement in the virtual topology
Joe Stringer (1):
selftests/bpf: Generalize dummy program types
Josef Bacik (4):
btrfs: mark ordered extent and inode with error if we fail to finish
btrfs: fix error handling in btrfs_del_csums
btrfs: return errors from btrfs_del_csums in cleanup_ref_head
btrfs: fixup error handling in fixup_inode_link_counts
Julian Anastasov (1):
ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service
Junxiao Bi (1):
ocfs2: fix data corruption by fallocate
Krzysztof Kozlowski (1):
nfc: fix NULL ptr dereference in llcp_sock_getname() after failed connect
Lin Ma (2):
Bluetooth: fix the erroneous flush_work() order
Bluetooth: use correct lock to prevent UAF of hdev object
Magnus Karlsson (1):
ixgbevf: add correct exception tracing for XDP
Marc Zyngier (1):
KVM: arm64: Fix debug register indexing
Mark Rutland (1):
pid: take a reference when initializing `cad_pid`
Max Gurtovoy (1):
vfio/platform: fix module_put call in error flow
Michael Chan (1):
bnxt_en: Remove the setting of dev_port.
Mina Almasry (1):
mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY
Pablo Neira Ayuso (1):
netfilter: nfnetlink_cthelper: hit EBUSY on updates if size mismatches
Pavel Skripkin (4):
net: caif: added cfserl_release function
net: caif: add proper error handling
net: caif: fix memory leak in caif_device_notify
net: caif: fix memory leak in cfusbl_device_notify
Phil Elwell (1):
usb: dwc2: Fix build in periphal-only mode
Pierre-Louis Bossart (1):
ALSA: usb: update old-style static const declaration
Rafael J. Wysocki (1):
ACPI: EC: Look for ECDT EC after calling acpi_load_tables()
Randy Dunlap (1):
vfio/pci: zap_vma_ptes() needs MMU
Rasmus Villemoes (1):
efi: cper: fix snprintf() use in cper_dimm_err_location()
Sean Christopherson (1):
KVM: SVM: Truncate GPR value for DR and CR accesses in !64-bit mode
Song Liu (1):
perf/core: Fix corner case in perf_rotate_context()
Takashi Iwai (1):
ALSA: timer: Fix master timer notification
Thomas Gleixner (1):
x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
Wei Yongjun (2):
samples: vfio-mdev: fix error handing in mdpy_fb_probe()
ieee802154: fix error return code in ieee802154_llsec_getparams()
Ye Bin (1):
ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
Zhen Lei (3):
vfio/pci: Fix error return code in vfio_ecap_init()
HID: pidff: fix error return code in hid_pidff_init()
ieee802154: fix error return code in ieee802154_add_iface()
With global filtering, we only allow an event to be scheduled if its
filter settings exactly match those of any existing events, so it is
pointless to reapply the filter in that case. Much worse, though, is
that in doing so we trample the event type of counter 0 if it's
already active, and never touch the appropriate PMEVTYPERn, so the new
event is likely not counting the right thing either. Don't do that.
CC: stable(a)vger.kernel.org
Signed-off-by: Robin Murphy <robin.murphy(a)arm.com>
---
drivers/perf/arm_smmuv3_pmu.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
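The crux of the fix, sketched as a standalone helper (illustrative only, not the driver's actual function; it reuses the identifiers visible in the diff below): with a global filter a new event is acceptable either when the PMU is empty, in which case find_first_bit() returns num_counters and the event gets to define the filter, or when its filter settings match an event that is already counting:

static bool smmu_pmu_filter_ok(struct smmu_pmu *smmu_pmu,
			       struct perf_event *event)
{
	unsigned int first = find_first_bit(smmu_pmu->used_counters,
					    smmu_pmu->num_counters);

	if (!smmu_pmu->global_filter)
		return true;			/* per-counter filtering */
	if (first == smmu_pmu->num_counters)
		return true;			/* empty PMU: this event sets the filter */
	/* otherwise it must match whatever is already scheduled */
	return smmu_pmu_check_global_filter(smmu_pmu->events[first], event);
}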
diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
index ff6fab4bae30..863d9f702aa1 100644
--- a/drivers/perf/arm_smmuv3_pmu.c
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -277,7 +277,7 @@ static int smmu_pmu_apply_event_filter(struct smmu_pmu *smmu_pmu,
struct perf_event *event, int idx)
{
u32 span, sid;
- unsigned int num_ctrs = smmu_pmu->num_counters;
+ unsigned int cur_idx, num_ctrs = smmu_pmu->num_counters;
bool filter_en = !!get_filter_enable(event);
span = filter_en ? get_filter_span(event) :
@@ -285,17 +285,19 @@ static int smmu_pmu_apply_event_filter(struct smmu_pmu *smmu_pmu,
sid = filter_en ? get_filter_stream_id(event) :
SMMU_PMCG_DEFAULT_FILTER_SID;
- /* Support individual filter settings */
- if (!smmu_pmu->global_filter) {
+ cur_idx = find_first_bit(smmu_pmu->used_counters, num_ctrs);
+ /*
+ * Per-counter filtering, or scheduling the first globally-filtered
+ * event into an empty PMU so idx == 0 and it works out equivalent.
+ */
+ if (!smmu_pmu->global_filter || cur_idx == num_ctrs) {
smmu_pmu_set_event_filter(event, idx, span, sid);
return 0;
}
- /* Requested settings same as current global settings*/
- idx = find_first_bit(smmu_pmu->used_counters, num_ctrs);
- if (idx == num_ctrs ||
- smmu_pmu_check_global_filter(smmu_pmu->events[idx], event)) {
- smmu_pmu_set_event_filter(event, 0, span, sid);
+ /* Otherwise, must match whatever's currently scheduled */
+ if (smmu_pmu_check_global_filter(smmu_pmu->events[cur_idx], event)) {
+ smmu_pmu_set_evtyper(smmu_pmu, idx, get_event(event));
return 0;
}
--
2.25.1
Send the SEV_CMD_DECOMMISSION command to the PSP firmware if ASID binding
fails. If a failure happens after a successful LAUNCH_START command, a
decommission command should be executed; otherwise the guest context is
never freed inside the AMD SP. Once the firmware no longer has memory to
allocate more SEV guest contexts, the LAUNCH_START command will begin to
fail with the SEV_RET_RESOURCE_LIMIT error.
The existing code calls decommission inside sev_unbind_asid, but that path
is not reached if a failure happens before guest activation succeeds: if
sev_bind_asid fails, decommission is never called. PSP firmware has a
limit on the number of guests, so if ASID binding fails many times, the
firmware will run out of resources to create another guest context.
Cc: stable(a)vger.kernel.org
Fixes: 59414c989220 ("KVM: SVM: Add support for KVM_SEV_LAUNCH_START command")
Reported-by: Peter Gonda <pgonda(a)google.com>
Signed-off-by: Alper Gun <alpergun(a)google.com>
---
arch/x86/kvm/svm/sev.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
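For orientation, here is the SEV command ordering described above together with the error path the patch adds, as a simplified sketch (it reuses names from the diff below and is not the complete function):

/*
 * SEV_CMD_LAUNCH_START   - firmware allocates a guest context, returns a handle
 * activation             - sev_bind_asid() binds that context to an ASID
 * ... guest lifetime ...
 * deactivation + SEV_CMD_DECOMMISSION - normal teardown in sev_unbind_asid()
 *
 * If activation fails after a successful LAUNCH_START, the context still
 * exists in the AMD SP and must be decommissioned on the error path:
 */
ret = sev_bind_asid(kvm, start.handle, error);
if (ret) {
	sev_decommission(start.handle);	/* free the firmware-side context */
	goto e_free_session;
}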
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index e0ce5da97fc2..8d36f0c73071 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -199,9 +199,19 @@ static void sev_asid_free(struct kvm_sev_info *sev)
sev->misc_cg = NULL;
}
-static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
+static void sev_decommission(unsigned int handle)
{
struct sev_data_decommission decommission;
+
+ if (!handle)
+ return;
+
+ decommission.handle = handle;
+ sev_guest_decommission(&decommission, NULL);
+}
+
+static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
+{
struct sev_data_deactivate deactivate;
if (!handle)
@@ -214,9 +224,7 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
sev_guest_deactivate(&deactivate, NULL);
up_read(&sev_deactivate_lock);
- /* decommission handle */
- decommission.handle = handle;
- sev_guest_decommission(&decommission, NULL);
+ sev_decommission(handle);
}
static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
@@ -341,8 +349,10 @@ static int sev_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
/* Bind ASID to this guest */
ret = sev_bind_asid(kvm, start.handle, error);
- if (ret)
+ if (ret) {
+ sev_decommission(start.handle);
goto e_free_session;
+ }
/* return handle to userspace */
params.handle = start.handle;
--
2.32.0.272.g935e593368-goog
This is a note to let you know that I've just added the patch titled
Revert "usb: gadget: fsl: Re-enable driver for ARM SoCs"
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From abd062886cd103196b4f26cf735c3a3619dec76b Mon Sep 17 00:00:00 2001
From: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Date: Fri, 11 Jun 2021 09:18:47 +0200
Subject: Revert "usb: gadget: fsl: Re-enable driver for ARM SoCs"
This reverts commit e0e8b6abe8c862229ba00cdd806e8598cdef00bb.
Turns out this breaks the build. We had numerous reports of problems
from linux-next and 0-day about this not working properly, so revert it
for now until it can be figured out.
The build errors are:
arm-linux-gnueabi-ld: fsl_udc_core.c:(.text+0x29d4): undefined reference to `fsl_udc_clk_finalize'
arm-linux-gnueabi-ld: fsl_udc_core.c:(.text+0x2ba8): undefined reference to `fsl_udc_clk_release'
fsl_udc_core.c:(.text+0x2848): undefined reference to `fsl_udc_clk_init'
fsl_udc_core.c:(.text+0xe88): undefined reference to `fsl_udc_clk_release'
Reported-by: Stephen Rothwell <sfr(a)canb.auug.org.au>
Reported-by: kernel test robot <lkp(a)intel.com>
Fixes: e0e8b6abe8c8 ("usb: gadget: fsl: Re-enable driver for ARM SoCs")
Cc: stable <stable(a)vger.kernel.org>
Cc: Joel Stanley <joel(a)jms.id.au>
Cc: Leo Li <leoyang.li(a)nxp.com>
Cc: Peter Chen <peter.chen(a)nxp.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Felipe Balbi <balbi(a)kernel.org>
Cc: Shawn Guo <shawnguo(a)kernel.org>
Cc: Ran Wang <ran.wang_1(a)nxp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
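The link failures line up with the pattern discussed in the thread above: the fsl_udc_clk_* hooks are only stubbed out when CONFIG_ARCH_MXC is disabled, and their real implementations lived in the now-removed fsl_mxc_udc glue. A hedged sketch of that header pattern (illustrative, not a verbatim copy of fsl_usb2_udc.h):

struct platform_device;

#ifdef CONFIG_ARCH_MXC
/* real implementations must come from an i.MX-specific object file */
int fsl_udc_clk_init(struct platform_device *pdev);
void fsl_udc_clk_finalize(struct platform_device *pdev);
void fsl_udc_clk_release(void);
#else
/* non-MXC builds fall back to empty inline stubs, so nothing to link */
static inline int fsl_udc_clk_init(struct platform_device *pdev)
{
	return 0;
}
static inline void fsl_udc_clk_finalize(struct platform_device *pdev)
{
}
static inline void fsl_udc_clk_release(void)
{
}
#endif

With the i.MX glue gone, an ARM configuration that enables both the gadget driver and CONFIG_ARCH_MXC references the extern declarations with nothing left to provide them, hence the revert.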
---
drivers/usb/gadget/udc/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/udc/Kconfig b/drivers/usb/gadget/udc/Kconfig
index 7348acbdc560..8c614bb86c66 100644
--- a/drivers/usb/gadget/udc/Kconfig
+++ b/drivers/usb/gadget/udc/Kconfig
@@ -90,7 +90,7 @@ config USB_BCM63XX_UDC
config USB_FSL_USB2
tristate "Freescale Highspeed USB DR Peripheral Controller"
- depends on FSL_SOC || ARCH_LAYERSCAPE || SOC_LS1021A || COMPILE_TEST
+ depends on FSL_SOC
help
Some of Freescale PowerPC and i.MX processors have a High Speed
Dual-Role(DR) USB controller, which supports device mode.
--
2.32.0