Hi,
I wasn't sure whether you had received my email from last week.
Can you direct me to the person who handles your company's marketing and
promo items?
Do you have any upcoming events, tradeshows or promotional needs?
We manufacture ALL custom LOGO and branded products.
Our most asked-about product is the custom printed USB flash drive.
We can print your logo on them and load your digital images, videos and
files.
Here is what we include:
-Any size memory you need: 64MB up to 128GB
-We will print your logo on both sides, just ask!
-Very Low Order Minimums
-Need them quickly? Not a problem, we offer Rush Service
We can make a custom-shaped USB drive that looks like your logo or product!
Email over a copy of your logo and we will create a design mock up for you
at no cost!
Our higher memory sizes are a really good option right now; ask about the
“Double Your Memory” upgrade promotion!
Let us know what you need and we will get you a quick quote.
We always offer great rates for schools and nonprofits as well.
Regards,
Lilly Koller
Logo USB Account Manager
This is a note to let you know that I've just added the patch titled
stm class: Fix a module refcount leak in policy creation error path
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From c18614a1a11276837bdd44403d84d207c9951538 Mon Sep 17 00:00:00 2001
From: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Date: Wed, 19 Dec 2018 17:19:20 +0200
Subject: stm class: Fix a module refcount leak in policy creation error path
Commit c7fd62bc69d0 ("stm class: Introduce framing protocol drivers")
introduced a bug in the error path of policy creation: it would do a
module_put() on the wrong module if one tried to create a policy for
an stm device which already has a policy, using a different protocol.
IOW,
| mkdir /config/stp-policy/dummy_stm.0:p_basic.test
| mkdir /config/stp-policy/dummy_stm.0:p_sys-t.test # puts "p_basic"
| mkdir /config/stp-policy/dummy_stm.0:p_sys-t.test # "p_basic" -> -1
throws:
| general protection fault: 0000 [#1] SMP PTI
| CPU: 3 PID: 2887 Comm: mkdir
| RIP: 0010:module_put.part.31+0xe/0x90
| Call Trace:
| module_put+0x13/0x20
| stm_put_protocol+0x11/0x20 [stm_core]
| stp_policy_make+0xf1/0x210 [stm_core]
| ? __kmalloc+0x183/0x220
| ? configfs_mkdir+0x10d/0x4c0
| configfs_mkdir+0x169/0x4c0
| vfs_mkdir+0x108/0x1c0
| do_mkdirat+0xe8/0x110
| __x64_sys_mkdir+0x1b/0x20
| do_syscall_64+0x5a/0x140
| entry_SYSCALL_64_after_hwframe+0x44/0xa9
Correct this sad mistake by calling 'put' on the correct reference,
which happens to match another error path in the same function, so we
consolidate the two at the same time.
Signed-off-by: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Fixes: c7fd62bc69d0 ("stm class: Introduce framing protocol drivers")
Reported-by: Ammy Yi <ammy.yi(a)intel.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/hwtracing/stm/policy.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/hwtracing/stm/policy.c b/drivers/hwtracing/stm/policy.c
index 0910ec807187..4b9e44b227d8 100644
--- a/drivers/hwtracing/stm/policy.c
+++ b/drivers/hwtracing/stm/policy.c
@@ -440,10 +440,8 @@ stp_policy_make(struct config_group *group, const char *name)
stm->policy = kzalloc(sizeof(*stm->policy), GFP_KERNEL);
if (!stm->policy) {
- mutex_unlock(&stm->policy_mutex);
- stm_put_protocol(pdrv);
- stm_put_device(stm);
- return ERR_PTR(-ENOMEM);
+ ret = ERR_PTR(-ENOMEM);
+ goto unlock_policy;
}
config_group_init_type_name(&stm->policy->group, name,
@@ -458,7 +456,11 @@ stp_policy_make(struct config_group *group, const char *name)
mutex_unlock(&stm->policy_mutex);
if (IS_ERR(ret)) {
- stm_put_protocol(stm->pdrv);
+ /*
+ * pdrv and stm->pdrv at this point can be quite different,
+ * and only one of them needs to be 'put'
+ */
+ stm_put_protocol(pdrv);
stm_put_device(stm);
}
--
2.20.1
This is a note to let you know that I've just added the patch titled
serial: uartps: Fix interrupt mask issue to handle the RX interrupts
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From 260683137ab5276113fc322fdbbc578024185fee Mon Sep 17 00:00:00 2001
From: Nava kishore Manne <nava.manne(a)xilinx.com>
Date: Tue, 18 Dec 2018 13:18:42 +0100
Subject: serial: uartps: Fix interrupt mask issue to handle the RX interrupts
properly
This patch corrects the RX interrupt mask value to handle the
RX interrupts properly.
Fixes: c8dbdc842d30 ("serial: xuartps: Rewrite the interrupt handling logic")
Signed-off-by: Nava kishore Manne <nava.manne(a)xilinx.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Michal Simek <michal.simek(a)xilinx.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/xilinx_uartps.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/tty/serial/xilinx_uartps.c b/drivers/tty/serial/xilinx_uartps.c
index c6d38617d622..094f2958cb2b 100644
--- a/drivers/tty/serial/xilinx_uartps.c
+++ b/drivers/tty/serial/xilinx_uartps.c
@@ -123,7 +123,7 @@ MODULE_PARM_DESC(rx_timeout, "Rx timeout, 1-255");
#define CDNS_UART_IXR_RXTRIG 0x00000001 /* RX FIFO trigger interrupt */
#define CDNS_UART_IXR_RXFULL 0x00000004 /* RX FIFO full interrupt. */
#define CDNS_UART_IXR_RXEMPTY 0x00000002 /* RX FIFO empty interrupt. */
-#define CDNS_UART_IXR_MASK 0x00001FFF /* Valid bit mask */
+#define CDNS_UART_IXR_RXMASK 0x000021e7 /* Valid RX bit mask */
/*
* Do not enable parity error interrupt for the following
@@ -364,7 +364,7 @@ static irqreturn_t cdns_uart_isr(int irq, void *dev_id)
cdns_uart_handle_tx(dev_id);
isrstatus &= ~CDNS_UART_IXR_TXEMPTY;
}
- if (isrstatus & CDNS_UART_IXR_MASK)
+ if (isrstatus & CDNS_UART_IXR_RXMASK)
cdns_uart_handle_rx(dev_id, isrstatus);
spin_unlock(&port->lock);
--
2.20.1
In setup_arch_memory we reserve the memory area wherein the kernel
is located. The current implementation may reserve more memory than
is actually required when CONFIG_LINUX_LINK_BASE is not equal to
CONFIG_LINUX_RAM_BASE. This happens because we calculate the start of
the reserved region relative to CONFIG_LINUX_RAM_BASE but the end of
the region relative to CONFIG_LINUX_LINK_BASE.
For example, on the HSDK board we waste 256 MiB of physical memory:
------------------->8------------------------------
Memory: 770416K/1048576K available (5496K kernel code,
240K rwdata, 1064K rodata, 2200K init, 275K bss,
278160K reserved, 0K cma-reserved)
------------------->8------------------------------
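(A back-of-the-envelope check, assuming the usual HSDK memory map of
CONFIG_LINUX_RAM_BASE=0x80000000 and CONFIG_LINUX_LINK_BASE=0x90000000:
the old code reserved [0x80000000, __pa(_end)) instead of
[0x90000000, __pa(_end)), an extra 0x10000000 bytes = 256 MiB, which
accounts for the bulk of the 278160K shown as reserved in the log above.)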
Fix that.
Cc: stable(a)vger.kernel.org
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev(a)synopsys.com>
---
arch/arc/mm/init.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arc/mm/init.c b/arch/arc/mm/init.c
index f8fe5668b30f..a56e6a8ed259 100644
--- a/arch/arc/mm/init.c
+++ b/arch/arc/mm/init.c
@@ -137,7 +137,8 @@ void __init setup_arch_memory(void)
*/
memblock_add_node(low_mem_start, low_mem_sz, 0);
- memblock_reserve(low_mem_start, __pa(_end) - low_mem_start);
+ memblock_reserve(CONFIG_LINUX_LINK_BASE,
+ __pa(_end) - CONFIG_LINUX_LINK_BASE);
#ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start)
--
2.14.5
Hi, Sasha,
We should backport commit a9aec5881b9d4aca184b29d33484a6a5 ("lib/iomap_copy.c: add __ioread32_copy()") for linux-4.4, linux-3.18 and linux-3.16.
Huacai
------------------ Original ------------------
From: "Sasha Levin"<sashal(a)kernel.org>;
Date: Wed, Dec 19, 2018 09:47 PM
To: "Sasha Levin"<sashal(a)kernel.org>; "Huacai Chen"<chenhc(a)lemote.com>; "James E . J . Bottomley"<jejb(a)linux.vnet.ibm.com>;
Cc: "Martin K . Petersen"<martin.petersen(a)oracle.com>; "stable"<stable(a)vger.kernel.org>; "stable"<stable(a)vger.kernel.org>;
Subject: Re: [PATCH V2] scsi: lpfc: Switch memcpy_fromio() to __ioread32_copy()
Hi,
[This is an automated email]
This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all
The bot has tested the following trees: v4.19.10, v4.14.89, v4.9.146, v4.4.168, v3.18.130,
v4.19.10: Build OK!
v4.14.89: Build OK!
v4.9.146: Build OK!
v4.4.168: Build failed! Errors:
drivers/scsi/lpfc/lpfc_compat.h:93:2: error: implicit declaration of function ‘__ioread32_copy’; did you mean ‘__iowrite32_copy’? [-Werror=implicit-function-declaration]
v3.18.130: Build failed! Errors:
drivers/scsi/lpfc/lpfc_compat.h:93:2: error: implicit declaration of function ‘__ioread32_copy’; did you mean ‘__iowrite32_copy’? [-Werror=implicit-function-declaration]
How should we proceed with this patch?
--
Thanks,
Sasha
Hi,
I want to ask for the changes in cd7f3a249dbe (rtc: snvs: Add timeouts
to avoid kernel lockups) to be backported to the stable releases.
The reason is that this patch fixes a real bug that can cause the
kernel to lock up. I can reproduce this lockup reliably with an i.MX6UL,
PREEMPT_RT_FULL enabled and v4.14.89.
Thanks,
Frieder
This is a note to let you know that I've just added the patch titled
stm class: Fix a module refcount leak in policy creation error path
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From c18614a1a11276837bdd44403d84d207c9951538 Mon Sep 17 00:00:00 2001
From: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Date: Wed, 19 Dec 2018 17:19:20 +0200
Subject: stm class: Fix a module refcount leak in policy creation error path
Commit c7fd62bc69d0 ("stm class: Introduce framing protocol drivers")
introduced a bug in the error path of policy creation: it would do a
module_put() on the wrong module if one tried to create a policy for
an stm device which already has a policy, using a different protocol.
IOW,
| mkdir /config/stp-policy/dummy_stm.0:p_basic.test
| mkdir /config/stp-policy/dummy_stm.0:p_sys-t.test # puts "p_basic"
| mkdir /config/stp-policy/dummy_stm.0:p_sys-t.test # "p_basic" -> -1
throws:
| general protection fault: 0000 [#1] SMP PTI
| CPU: 3 PID: 2887 Comm: mkdir
| RIP: 0010:module_put.part.31+0xe/0x90
| Call Trace:
| module_put+0x13/0x20
| stm_put_protocol+0x11/0x20 [stm_core]
| stp_policy_make+0xf1/0x210 [stm_core]
| ? __kmalloc+0x183/0x220
| ? configfs_mkdir+0x10d/0x4c0
| configfs_mkdir+0x169/0x4c0
| vfs_mkdir+0x108/0x1c0
| do_mkdirat+0xe8/0x110
| __x64_sys_mkdir+0x1b/0x20
| do_syscall_64+0x5a/0x140
| entry_SYSCALL_64_after_hwframe+0x44/0xa9
Correct this sad mistake by calling 'put' on the correct reference,
which happens to match another error path in the same function, so we
consolidate the two at the same time.
Signed-off-by: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Fixes: c7fd62bc69d0 ("stm class: Introduce framing protocol drivers")
Reported-by: Ammy Yi <ammy.yi(a)intel.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/hwtracing/stm/policy.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/hwtracing/stm/policy.c b/drivers/hwtracing/stm/policy.c
index 0910ec807187..4b9e44b227d8 100644
--- a/drivers/hwtracing/stm/policy.c
+++ b/drivers/hwtracing/stm/policy.c
@@ -440,10 +440,8 @@ stp_policy_make(struct config_group *group, const char *name)
stm->policy = kzalloc(sizeof(*stm->policy), GFP_KERNEL);
if (!stm->policy) {
- mutex_unlock(&stm->policy_mutex);
- stm_put_protocol(pdrv);
- stm_put_device(stm);
- return ERR_PTR(-ENOMEM);
+ ret = ERR_PTR(-ENOMEM);
+ goto unlock_policy;
}
config_group_init_type_name(&stm->policy->group, name,
@@ -458,7 +456,11 @@ stp_policy_make(struct config_group *group, const char *name)
mutex_unlock(&stm->policy_mutex);
if (IS_ERR(ret)) {
- stm_put_protocol(stm->pdrv);
+ /*
+ * pdrv and stm->pdrv at this point can be quite different,
+ * and only one of them needs to be 'put'
+ */
+ stm_put_protocol(pdrv);
stm_put_device(stm);
}
--
2.20.1
The 'nr_pages' attribute of the 'msc' subdevices parses a comma-separated
list of window sizes passed from userspace. However, there is a bug in
the string parsing logic: it doesn't count the comma character as
consumed when it advances past each number, so the remaining length
drifts one byte high per separator. This leads to an out-of-bounds
access given a sufficiently long list. For example:
> # echo 8,8,8,8 > /sys/bus/intel_th/devices/0-msc0/nr_pages
> ==================================================================
> BUG: KASAN: slab-out-of-bounds in memchr+0x1e/0x40
> Read of size 1 at addr ffff8803ffcebcd1 by task sh/825
>
> CPU: 3 PID: 825 Comm: npktest.sh Tainted: G W 4.20.0-rc1+
> Call Trace:
> dump_stack+0x7c/0xc0
> print_address_description+0x6c/0x23c
> ? memchr+0x1e/0x40
> kasan_report.cold.5+0x241/0x308
> memchr+0x1e/0x40
> nr_pages_store+0x203/0xd00 [intel_th_msu]
Fix this by accounting for the comma character.
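For illustration, a minimal userspace sketch of the parsing loop (the
real nr_pages_store() loop does more; the buffer and variable names here
are made up):
-------------------------->8------------------------
#include <stdio.h>
#include <string.h>

int main(void)
{
	const char *buf = "8,8,8,8";	/* as in the reproducer above */
	const char *p = buf;
	size_t len = strlen(buf);

	do {
		const char *end = memchr(p, ',', len);

		if (!end)
			break;
		/*
		 * The buggy version did "len -= end - p;", counting only
		 * the digits and leaving the comma in 'len', so 'len'
		 * ends up one byte too large per separator and memchr()
		 * eventually reads past the end of the buffer.  The fix
		 * consumes the comma as well:
		 */
		len -= end - p + 1;
		p = end + 1;
	} while (len);

	printf("remaining len = %zu\n", len);
	return 0;
}
-------------------------->8------------------------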
Signed-off-by: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Fixes: ba82664c134ef ("intel_th: Add Memory Storage Unit driver")
Cc: stable(a)vger.kernel.org # v4.4+
---
drivers/hwtracing/intel_th/msu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/hwtracing/intel_th/msu.c b/drivers/hwtracing/intel_th/msu.c
index d293e55553bd..ba7aaf421f36 100644
--- a/drivers/hwtracing/intel_th/msu.c
+++ b/drivers/hwtracing/intel_th/msu.c
@@ -1423,7 +1423,8 @@ nr_pages_store(struct device *dev, struct device_attribute *attr,
if (!end)
break;
- len -= end - p;
+ /* consume the number and the following comma, hence +1 */
+ len -= end - p + 1;
p = end + 1;
} while (len);
--
2.19.2
This is the start of the stable review cycle for the 4.19.11 release.
There are 44 patches in this series, all of which will be posted as
responses to this one. If anyone has any issues with these being
applied, please let me know.
Responses should be made by Thu Dec 20 16:39:02 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.11-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.11-rc1
Masahiro Yamada <yamada.masahiro(a)socionext.com>
x86/build: Fix compiler support check for CONFIG_RETPOLINE
Damien Le Moal <damien.lemoal(a)wdc.com>
dm zoned: Fix target BIO completion handling
Junwei Zhang <Jerry.Zhang(a)amd.com>
drm/amdgpu: update SMC firmware image for polaris10 variants
Alex Deucher <alexander.deucher(a)amd.com>
drm/amdgpu: update smu firmware images for VI variants (v2)
Alex Deucher <alexander.deucher(a)amd.com>
drm/amdgpu: add some additional vega10 pci ids
Alex Deucher <alexander.deucher(a)amd.com>
drm/amdkfd: add new vega10 pci ids
Kenneth Feng <kenneth.feng(a)amd.com>
drm/amdgpu/powerplay: Apply avfs cks-off voltages on VI
Chris Wilson <chris(a)chris-wilson.co.uk>
drm/i915/execlists: Apply a full mb before execution for Braswell
Tina Zhang <tina.zhang(a)intel.com>
drm/i915/gvt: Fix tiled memory decoding bug on BDW
Brian Norris <briannorris(a)chromium.org>
Revert "drm/rockchip: Allow driver to be shutdown on reboot/kexec"
Ben Skeggs <bskeggs(a)redhat.com>
drm/nouveau/kms/nv50-: also flush fb writes when rewinding push buffer
Lyude Paul <lyude(a)redhat.com>
drm/nouveau/kms: Fix memory leak in nv50_mstm_del()
Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
powerpc: Look for "stdout-path" when setting up legacy consoles
Radu Rendec <radu.rendec(a)gmail.com>
powerpc/msi: Fix NULL pointer access in teardown code
Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
media: vb2: don't call __vb2_queue_cancel if vb2_start_streaming failed
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
tracing: Fix memory leak of instance function hash filters
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
tracing: Fix memory leak in set_trigger_filter()
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
tracing: Fix memory leak in create_filter()
Mike Snitzer <snitzer(a)redhat.com>
dm: call blk_queue_split() to impose device limits on bios
Mike Snitzer <snitzer(a)redhat.com>
dm cache metadata: verify cache has blocks in blocks_are_clean_separate_dirty()
Mike Snitzer <snitzer(a)redhat.com>
dm thin: send event about thin-pool state change _after_ making it
Stefan Wahren <stefan.wahren(a)i2se.com>
ARM: dts: bcm2837: Fix polarity of wifi reset GPIOs
Lubomir Rintel <lkundrak(a)v3.sk>
ARM: mmp/mmp2: fix cpu_is_mmp2() on mmp2-dt
Chad Austin <chadaustin(a)fb.com>
fuse: continue to send FUSE_RELEASEDIR when FUSE_OPEN returns ENOSYS
Alek Du <alek.du(a)intel.com>
mmc: sdhci: fix the timeout check window for clock and reset
Faiz Abbas <faiz_abbas(a)ti.com>
mmc: sdhci-omap: Fix DCRC error handling during tuning
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
mmc: core: use mrq->sbc when sending CMD23 for RPMB
Aaro Koskinen <aaro.koskinen(a)iki.fi>
MMC: OMAP: fix broken MMC on OMAP15XX/OMAP5910/OMAP310
Amir Goldstein <amir73il(a)gmail.com>
ovl: fix missing override creds in link of a metacopy upper
Amir Goldstein <amir73il(a)gmail.com>
ovl: fix decode of dir file handle with multi lower layers
Keith Busch <keith.busch(a)intel.com>
block/bio: Do not zero user pages
Robin Murphy <robin.murphy(a)arm.com>
arm64: dma-mapping: Fix FORCE_CONTIGUOUS buffer clearing
Andrea Arcangeli <aarcange(a)redhat.com>
userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered
Piotr Jaroszynski <pjaroszynski(a)nvidia.com>
fs/iomap.c: get/put the page in iomap_page_create/release()
Thierry Reding <treding(a)nvidia.com>
scripts/spdxcheck.py: always open files in binary mode
Jeff Moyer <jmoyer(a)redhat.com>
aio: fix spectre gadget in lookup_ioctx
Chen-Yu Tsai <wens(a)csie.org>
pinctrl: sunxi: a83t: Fix IRQ offset typo for PH11
Arnd Bergmann <arnd(a)arndb.de>
drm/msm: fix address space warning
Arnd Bergmann <arnd(a)arndb.de>
ARM: dts: qcom-apq8064-arrow-sd-600eval fix graph_endpoint warning
Arnd Bergmann <arnd(a)arndb.de>
i2c: aspeed: fix build warning
Arnd Bergmann <arnd(a)arndb.de>
slimbus: ngd: mark PM functions as __maybe_unused
Lubomir Rintel <lkundrak(a)v3.sk>
staging: olpc_dcon: add a missing dependency
Arnd Bergmann <arnd(a)arndb.de>
scsi: raid_attrs: fix unused variable warning
Vincent Guittot <vincent.guittot(a)linaro.org>
sched/pelt: Fix warning and clean up IRQ PELT config
-------------
Diffstat:
Makefile | 4 +-
arch/arm/boot/dts/bcm2837-rpi-3-b-plus.dts | 2 +-
arch/arm/boot/dts/bcm2837-rpi-3-b.dts | 2 +-
.../arm/boot/dts/qcom-apq8064-arrow-sd-600eval.dts | 5 +
arch/arm/mach-mmp/cputype.h | 6 +-
arch/arm64/mm/dma-mapping.c | 2 +-
arch/powerpc/kernel/legacy_serial.c | 6 +-
arch/powerpc/kernel/msi.c | 7 +-
arch/x86/Makefile | 10 +-
block/bio.c | 3 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c | 36 +++++-
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 6 +
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 6 +
drivers/gpu/drm/amd/powerplay/inc/smu7_ppsmc.h | 2 +
.../drm/amd/powerplay/smumgr/polaris10_smumgr.c | 6 +
drivers/gpu/drm/amd/powerplay/smumgr/smumgr.c | 3 +
drivers/gpu/drm/i915/gvt/fb_decoder.c | 2 +-
drivers/gpu/drm/i915/intel_lrc.c | 7 +-
drivers/gpu/drm/msm/disp/dpu1/dpu_dbg.c | 8 +-
drivers/gpu/drm/nouveau/dispnv50/disp.c | 30 +++--
drivers/gpu/drm/rockchip/rockchip_drm_drv.c | 6 -
drivers/i2c/busses/i2c-aspeed.c | 4 +-
drivers/md/dm-cache-metadata.c | 4 +
drivers/md/dm-thin.c | 68 ++++++------
drivers/md/dm-zoned-target.c | 122 +++++++--------------
drivers/md/dm.c | 2 +
drivers/media/common/videobuf2/videobuf2-core.c | 4 +-
drivers/mmc/core/block.c | 15 ++-
drivers/mmc/host/omap.c | 11 +-
drivers/mmc/host/sdhci-omap.c | 12 +-
drivers/mmc/host/sdhci.c | 18 ++-
drivers/pinctrl/sunxi/pinctrl-sun8i-a83t.c | 2 +-
drivers/scsi/raid_class.c | 4 +-
drivers/slimbus/qcom-ngd-ctrl.c | 6 +-
drivers/staging/olpc_dcon/Kconfig | 1 +
fs/aio.c | 2 +
fs/fuse/dir.c | 2 +-
fs/fuse/file.c | 21 ++--
fs/fuse/fuse_i.h | 2 +-
fs/iomap.c | 7 ++
fs/overlayfs/dir.c | 14 ++-
fs/overlayfs/export.c | 6 +-
fs/userfaultfd.c | 3 +-
init/Kconfig | 5 +
kernel/sched/core.c | 7 +-
kernel/sched/fair.c | 2 +-
kernel/sched/pelt.c | 2 +-
kernel/sched/pelt.h | 2 +-
kernel/sched/sched.h | 5 +-
kernel/trace/ftrace.c | 1 +
kernel/trace/trace_events_filter.c | 5 +-
kernel/trace/trace_events_trigger.c | 6 +-
scripts/spdxcheck.py | 6 +-
53 files changed, 311 insertions(+), 219 deletions(-)
This is a note to let you know that I've just added the patch titled
usb: r8a66597: Fix a possible concurrency use-after-free bug in
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From c85400f886e3d41e69966470879f635a2b50084c Mon Sep 17 00:00:00 2001
From: Jia-Ju Bai <baijiaju1990(a)gmail.com>
Date: Tue, 18 Dec 2018 20:04:25 +0800
Subject: usb: r8a66597: Fix a possible concurrency use-after-free bug in
r8a66597_endpoint_disable()
The functions r8a66597_endpoint_disable() and r8a66597_urb_enqueue() may
execute concurrently.
Both functions access the possibly shared variable "hep->hcpriv".
This shared variable is freed by r8a66597_endpoint_disable() via the
call path:
r8a66597_endpoint_disable
kfree(hep->hcpriv) (line 1995 in Linux-4.19)
This variable is read by r8a66597_urb_enqueue() via the call path:
r8a66597_urb_enqueue
spin_lock_irqsave(&r8a66597->lock)
init_pipe_info
enable_r8a66597_pipe
pipe = hep->hcpriv (line 802 in Linux-4.19)
The read operation is protected by a spinlock, but the free operation
is not protected by this spinlock, thus a concurrent use-after-free
may occur.
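An illustrative interleaving (assuming the endpoint is disabled while a
URB is being enqueued on it):
  CPU0: r8a66597_endpoint_disable()   CPU1: r8a66597_urb_enqueue()
                                      spin_lock_irqsave(&r8a66597->lock)
        kfree(hep->hcpriv)
                                      pipe = hep->hcpriv  /* use after free */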
To fix this bug, the spin-lock and spin-unlock function calls in
r8a66597_endpoint_disable() are moved to protect the free operation.
Signed-off-by: Jia-Ju Bai <baijiaju1990(a)gmail.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/host/r8a66597-hcd.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/r8a66597-hcd.c b/drivers/usb/host/r8a66597-hcd.c
index 984892dd72f5..42668aeca57c 100644
--- a/drivers/usb/host/r8a66597-hcd.c
+++ b/drivers/usb/host/r8a66597-hcd.c
@@ -1979,6 +1979,8 @@ static int r8a66597_urb_dequeue(struct usb_hcd *hcd, struct urb *urb,
static void r8a66597_endpoint_disable(struct usb_hcd *hcd,
struct usb_host_endpoint *hep)
+__acquires(r8a66597->lock)
+__releases(r8a66597->lock)
{
struct r8a66597 *r8a66597 = hcd_to_r8a66597(hcd);
struct r8a66597_pipe *pipe = (struct r8a66597_pipe *)hep->hcpriv;
@@ -1991,13 +1993,14 @@ static void r8a66597_endpoint_disable(struct usb_hcd *hcd,
return;
pipenum = pipe->info.pipenum;
+ spin_lock_irqsave(&r8a66597->lock, flags);
if (pipenum == 0) {
kfree(hep->hcpriv);
hep->hcpriv = NULL;
+ spin_unlock_irqrestore(&r8a66597->lock, flags);
return;
}
- spin_lock_irqsave(&r8a66597->lock, flags);
pipe_stop(r8a66597, pipe);
pipe_irq_disable(r8a66597, pipenum);
disable_irq_empty(r8a66597, pipenum);
--
2.20.1
This is a note to let you know that I've just added the patch titled
driver core: Add missing dev->bus->need_parent_lock checks
to my driver-core git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git
in the driver-core-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From e121a833745b4708b660e3fe6776129c2956b041 Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Date: Thu, 13 Dec 2018 19:27:47 +0100
Subject: driver core: Add missing dev->bus->need_parent_lock checks
__device_release_driver() has to check dev->bus->need_parent_lock
before dropping the parent lock and acquiring it again as it may
attempt to drop a lock that hasn't been acquired or lock a device
that shouldn't be locked and create a lock imbalance.
Fixes: 8c97a46af04b (driver core: hold dev's parent lock when needed)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: stable <stable(a)vger.kernel.org>
Reviewed-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/base/dd.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 88713f182086..8ac10af17c00 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -933,11 +933,11 @@ static void __device_release_driver(struct device *dev, struct device *parent)
if (drv) {
while (device_links_busy(dev)) {
device_unlock(dev);
- if (parent)
+ if (parent && dev->bus->need_parent_lock)
device_unlock(parent);
device_links_unbind_consumers(dev);
- if (parent)
+ if (parent && dev->bus->need_parent_lock)
device_lock(parent);
device_lock(dev);
--
2.20.1
Hi Marc,
This is wrong: commit 6022fcc0e87a0eb5e9a72b15ed70dd29ebcb7343
The above is not my original patch and it should not be tagged for stable,
as it introduces the same kind of bug I intended to fix:
array_index_nospec() can now return kvm->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS
and this is not what you want. So, in this case the following line of code
is just fine as it is:
intid = array_index_nospec(intid, kvm->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS);
As the commit log says, my patch fixes:
commit 41b87599c74300027f305d7b34368ec558978ff2
not both:
commit 41b87599c74300027f305d7b34368ec558978ff2
and
commit bea2ef803ade3359026d5d357348842bca9edcf1
If you want to apply the fix on top of bea2ef803ade3359026d5d357348842bca9edcf1
then you should apply this instead:
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index bb1a83345741..e607547c7bb0 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -103,7 +103,7 @@ struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
{
/* SGIs and PPIs */
if (intid <= VGIC_MAX_PRIVATE) {
- intid = array_index_nospec(intid, VGIC_MAX_PRIVATE);
+ intid = array_index_nospec(intid, VGIC_MAX_PRIVATE + 1);
return &vcpu->arch.vgic_cpu.private_irqs[intid];
}
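(For the reasoning: array_index_nospec(idx, sz) returns idx clamped to
the half-open range [0, sz) under speculation, so the bound must be the
array size, i.e. one past the largest valid index. intid can
legitimately be VGIC_MAX_PRIVATE itself here, hence the bound of
VGIC_MAX_PRIVATE + 1.)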
The commit log should remain the same.
Thanks
--
Gustavo
As part of my work for the Civil Infrastructure Platform, I've been
tracking security issues in the kernel and trying to ensure that the
fixes are applied to stable branches as necessary.
The "kernel-sec" repository at
<https://gitlab.com/cip-project/cip-kernel/cip-kernel-sec> contains
information about known issues and scripts to aid in maintaining and
viewing that information. Issues are identified by CVE ID and their
status is recorded for mainline and all live stable branches.
I import most of the information from distribution security trackers,
and from upstream commit references in stable branch commit messages.
Manual editing is needed mostly to correct errors in these sources, or
where the commits fixing an issue in a stable branch don't correspond
exactly to the commits fixing it in mainline.
I recently added a local web application that allows browsing the
status of all branches and issues, complete with links to references
and related commits. There is also a simple reporting script that
lists open issues for each branch.
If you're interested in security support for stable branches, please
take a look at this.
I would welcome merge requests to add to the issue data or to improve
the scripts.
Ben.
--
Ben Hutchings, Software Developer Codethink Ltd
https://www.codethink.co.uk/ Dale House, 35 Dale Street
Manchester, M1 2HF, United Kingdom
This is a note to let you know that I've just added the patch titled
binder: fix use-after-free due to ksys_close() during fdget()
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From 80cd795630d6526ba729a089a435bf74a57af927 Mon Sep 17 00:00:00 2001
From: Todd Kjos <tkjos(a)android.com>
Date: Fri, 14 Dec 2018 15:58:21 -0800
Subject: binder: fix use-after-free due to ksys_close() during fdget()
44d8047f1d8 ("binder: use standard functions to allocate fds")
exposed a pre-existing issue in the binder driver.
fdget() is used in ksys_ioctl() as a performance optimization.
One of the rules associated with fdget() is that ksys_close() must
not be called between the fdget() and the fdput(). There is a case
where this requirement is not met in the binder driver which results
in the reference count dropping to 0 when the device is still in
use. This can result in use-after-free or other issues.
If userspace has passed a file descriptor for the binder driver using
a BINDER_TYPE_FDA object, then ksys_close() is called on it when
handling a binder_ioctl(BC_FREE_BUFFER) command. This violates
the assumptions for using fdget().
The problem is fixed by deferring the close using task_work_add(). A
new variant of __close_fd() was created that returns a struct file
with a reference. The fput() is deferred instead of using ksys_close().
Fixes: 44d8047f1d87a ("binder: use standard functions to allocate fds")
Suggested-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Todd Kjos <tkjos(a)google.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/android/binder.c | 63 ++++++++++++++++++++++++++++++++++++++--
fs/file.c | 29 ++++++++++++++++++
include/linux/fdtable.h | 1 +
3 files changed, 91 insertions(+), 2 deletions(-)
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index d653e8a474fc..210940bd0457 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -72,6 +72,7 @@
#include <linux/spinlock.h>
#include <linux/ratelimit.h>
#include <linux/syscalls.h>
+#include <linux/task_work.h>
#include <uapi/linux/android/binder.h>
@@ -2170,6 +2171,64 @@ static bool binder_validate_fixup(struct binder_buffer *b,
return (fixup_offset >= last_min_offset);
}
+/**
+ * struct binder_task_work_cb - for deferred close
+ *
+ * @twork: callback_head for task work
+ * @file: file to fput
+ *
+ * Structure to pass task work to be handled after
+ * returning from binder_ioctl() via task_work_add().
+ */
+struct binder_task_work_cb {
+ struct callback_head twork;
+ struct file *file;
+};
+
+/**
+ * binder_do_fd_close() - put the file scheduled for deferred close
+ * @twork: callback head for task work
+ *
+ * It is not safe to call ksys_close() during the binder_ioctl()
+ * function if there is a chance that binder's own file descriptor
+ * might be closed. This is to meet the requirements for using
+ * fdget() (see comments for __fget_light()). Therefore use
+ * task_work_add() to schedule the close operation once we have
+ * returned from binder_ioctl(). This function is a callback
+ * for that mechanism and does the actual fput() on the
+ * given file.
+ */
+static void binder_do_fd_close(struct callback_head *twork)
+{
+ struct binder_task_work_cb *twcb = container_of(twork,
+ struct binder_task_work_cb, twork);
+
+ fput(twcb->file);
+ kfree(twcb);
+}
+
+/**
+ * binder_deferred_fd_close() - schedule a close for the given file-descriptor
+ * @fd: file-descriptor to close
+ *
+ * See comments in binder_do_fd_close(). This function is used to schedule
+ * a file-descriptor to be closed after returning from binder_ioctl().
+ */
+static void binder_deferred_fd_close(int fd)
+{
+ struct binder_task_work_cb *twcb;
+
+ twcb = kzalloc(sizeof(*twcb), GFP_KERNEL);
+ if (!twcb)
+ return;
+ init_task_work(&twcb->twork, binder_do_fd_close);
+ __close_fd_get_file(fd, &twcb->file);
+ if (twcb->file)
+ task_work_add(current, &twcb->twork, true);
+ else
+ kfree(twcb);
+}
+
static void binder_transaction_buffer_release(struct binder_proc *proc,
struct binder_buffer *buffer,
binder_size_t *failed_at)
@@ -2309,7 +2368,7 @@ static void binder_transaction_buffer_release(struct binder_proc *proc,
}
fd_array = (u32 *)(parent_buffer + (uintptr_t)fda->parent_offset);
for (fd_index = 0; fd_index < fda->num_fds; fd_index++)
- ksys_close(fd_array[fd_index]);
+ binder_deferred_fd_close(fd_array[fd_index]);
} break;
default:
pr_err("transaction release %d bad object type %x\n",
@@ -3928,7 +3987,7 @@ static int binder_apply_fd_fixups(struct binder_transaction *t)
} else if (ret) {
u32 *fdp = (u32 *)(t->buffer->data + fixup->offset);
- ksys_close(*fdp);
+ binder_deferred_fd_close(*fdp);
}
list_del(&fixup->fixup_entry);
kfree(fixup);
diff --git a/fs/file.c b/fs/file.c
index 7ffd6e9d103d..8d059d8973e9 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -640,6 +640,35 @@ int __close_fd(struct files_struct *files, unsigned fd)
}
EXPORT_SYMBOL(__close_fd); /* for ksys_close() */
+/*
+ * variant of __close_fd that gets a ref on the file for later fput
+ */
+int __close_fd_get_file(unsigned int fd, struct file **res)
+{
+ struct files_struct *files = current->files;
+ struct file *file;
+ struct fdtable *fdt;
+
+ spin_lock(&files->file_lock);
+ fdt = files_fdtable(files);
+ if (fd >= fdt->max_fds)
+ goto out_unlock;
+ file = fdt->fd[fd];
+ if (!file)
+ goto out_unlock;
+ rcu_assign_pointer(fdt->fd[fd], NULL);
+ __put_unused_fd(files, fd);
+ spin_unlock(&files->file_lock);
+ get_file(file);
+ *res = file;
+ return filp_close(file, files);
+
+out_unlock:
+ spin_unlock(&files->file_lock);
+ *res = NULL;
+ return -ENOENT;
+}
+
void do_close_on_exec(struct files_struct *files)
{
unsigned i;
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index 41615f38bcff..f07c55ea0c22 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -121,6 +121,7 @@ extern void __fd_install(struct files_struct *files,
unsigned int fd, struct file *file);
extern int __close_fd(struct files_struct *files,
unsigned int fd);
+extern int __close_fd_get_file(unsigned int fd, struct file **res);
extern struct kmem_cache *files_cachep;
--
2.20.1
On Tue, 18 Dec 2018 at 21:41, Sasha Levin <sashal(a)kernel.org> wrote:
>
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all
>
> The bot has tested the following trees: v4.19.10, v4.14.89, v4.9.146, v4.4.168, v3.18.130,
>
Please disregard this patch for -stable until we decide how we are
going to fix the 32-bit array packing issue.
> v4.19.10: Build OK!
> v4.14.89: Build OK!
> v4.9.146: Failed to apply! Possible dependencies:
> 2f74f09bce4f ("efi: parse ARM processor error")
> 5b53696a30d5 ("ACPI / APEI: Switch to use new generic UUID API")
> bbcc2e7b642e ("ras: acpi/apei: cper: add support for generic data v3 structure")
> c0020756315e ("efi: switch to use new generic UUID API")
>
> v4.4.168: Failed to apply! Possible dependencies:
> 2c23b73c2d02 ("x86/efi: Prepare GOP handling code for reuse as generic code")
> 2f74f09bce4f ("efi: parse ARM processor error")
> 5b53696a30d5 ("ACPI / APEI: Switch to use new generic UUID API")
> ba7e34b1bbd2 ("include/linux/efi.h: redefine type, constant, macro from generic code")
> bbcc2e7b642e ("ras: acpi/apei: cper: add support for generic data v3 structure")
> c0020756315e ("efi: switch to use new generic UUID API")
>
> v3.18.130: Failed to apply! Possible dependencies:
> 1bd0abb0c924 ("arm64/efi: set EFI_ALLOC_ALIGN to 64 KB")
> 23a0d4e8fa6d ("efi: Disable interrupts around EFI calls, not in the epilog/prolog calls")
> 2c23b73c2d02 ("x86/efi: Prepare GOP handling code for reuse as generic code")
> 2f74f09bce4f ("efi: parse ARM processor error")
> 4c62360d7562 ("efi: Handle memory error structures produced based on old versions of standard")
> 4ee20980812b ("arm64: fix data type for physical address")
> 5b53696a30d5 ("ACPI / APEI: Switch to use new generic UUID API")
> 60305db98845 ("arm64/efi: move virtmap init to early initcall")
> 744937b0b12a ("efi: Clean up the efi_call_phys_[prolog|epilog]() save/restore interaction")
> 790a2ee24278 ("Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into core/efi")
> 8a53554e12e9 ("x86/efi: Fix multiple GOP device support")
> 8ce837cee8f5 ("arm64/mm: add create_pgd_mapping() to create private page tables")
> 9679be103108 ("arm64/efi: remove idmap manipulations from UEFI code")
> a352ea3e197b ("arm64/efi: set PE/COFF file alignment to 512 bytes")
> b05b9f5f9dcf ("x86, mirror: x86 enabling - find mirrored memory ranges")
> ba7e34b1bbd2 ("include/linux/efi.h: redefine type, constant, macro from generic code")
> bbcc2e7b642e ("ras: acpi/apei: cper: add support for generic data v3 structure")
> c0020756315e ("efi: switch to use new generic UUID API")
> d1ae8c005792 ("arm64: dmi: Add SMBIOS/DMI support")
> da141706aea5 ("arm64: add better page protections to arm64")
> e1e1fddae74b ("arm64/mm: add explicit struct_mm argument to __create_mapping()")
> ea6bc80d1819 ("arm64/efi: set PE/COFF section alignment to 4 KB")
> f3cdfd239da5 ("arm64/efi: move SetVirtualAddressMap() to UEFI stub")
>
>
> How should we proceed with this patch?
>
> --
> Thanks,
> Sasha
This is a note to let you know that I've just added the patch titled
driver core: Add missing dev->bus->need_parent_lock checks
to my driver-core git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git
in the driver-core-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the driver-core-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From e121a833745b4708b660e3fe6776129c2956b041 Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Date: Thu, 13 Dec 2018 19:27:47 +0100
Subject: driver core: Add missing dev->bus->need_parent_lock checks
__device_release_driver() has to check dev->bus->need_parent_lock
before dropping the parent lock and acquiring it again as it may
attempt to drop a lock that hasn't been acquired or lock a device
that shouldn't be locked and create a lock imbalance.
Fixes: 8c97a46af04b (driver core: hold dev's parent lock when needed)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: stable <stable(a)vger.kernel.org>
Reviewed-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/base/dd.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 88713f182086..8ac10af17c00 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -933,11 +933,11 @@ static void __device_release_driver(struct device *dev, struct device *parent)
if (drv) {
while (device_links_busy(dev)) {
device_unlock(dev);
- if (parent)
+ if (parent && dev->bus->need_parent_lock)
device_unlock(parent);
device_links_unbind_consumers(dev);
- if (parent)
+ if (parent && dev->bus->need_parent_lock)
device_lock(parent);
device_lock(dev);
--
2.20.1
Hi Sasha,
> -----Original Message-----
> From: Sasha Levin [mailto:sashal@kernel.org]
> Sent: Wednesday, December 19, 2018 4:25 AM
> To: Sasha Levin <sashal(a)kernel.org>; Daniel Lezcano <daniel.lezcano(a)linaro.org>; Alexey Brodkin <alexey.brodkin(a)synopsys.com>;
> tglx(a)linutronix.de
> Cc: linux-kernel(a)vger.kernel.org; Daniel Lezcano <daniel.lezcano(a)linaro.org>; Vineet Gupta <vineet.gupta1(a)synopsys.com>;
> Thomas Gleixner <tglx(a)linutronix.de>; stable(a)vger.kernel.org; stable(a)vger.kernel.org
> Subject: Re: [PATCH 12/25] clocksource/drivers/arc_timer: Utilize generic sched_clock
>
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all
>
> The bot has tested the following trees: v4.19.10, v4.14.89, v4.9.146, v4.4.168, v3.18.130,
>
> v4.19.10: Build OK!
> v4.14.89: Failed to apply! Possible dependencies:
> Unable to calculate
Here we just need a slightly updated hunk, due to missing [1], which was only introduced in v4.15:
-------------------------->8------------------------
--- a/drivers/clocksource/Kconfig
+++ b/drivers/clocksource/Kconfig
@@ -299,6 +299,7 @@ config CLKSRC_MPS2
config ARC_TIMERS
bool "Support for 32-bit TIMERn counters in ARC Cores" if COMPILE_TEST
depends on GENERIC_CLOCKEVENTS
+ depends on GENERIC_SCHED_CLOCK
select TIMER_OF
help
These are legacy 32-bit TIMER0 and TIMER1 counters found on all ARC cores
-------------------------->8------------------------
> v4.9.146: Failed to apply! Possible dependencies:
> v4.4.168: Failed to apply! Possible dependencies:
> v3.18.130: Failed to apply! Possible dependencies:
Everything below v4.10 we'll need to drop as ARC timers were only imported in v4.10, see [2].
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
-Alexey
This is a note to let you know that I've just added the patch titled
binder: fix use-after-free due to ksys_close() during fdget()
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From 80cd795630d6526ba729a089a435bf74a57af927 Mon Sep 17 00:00:00 2001
From: Todd Kjos <tkjos(a)android.com>
Date: Fri, 14 Dec 2018 15:58:21 -0800
Subject: binder: fix use-after-free due to ksys_close() during fdget()
44d8047f1d8 ("binder: use standard functions to allocate fds")
exposed a pre-existing issue in the binder driver.
fdget() is used in ksys_ioctl() as a performance optimization.
One of the rules associated with fdget() is that ksys_close() must
not be called between the fdget() and the fdput(). There is a case
where this requirement is not met in the binder driver which results
in the reference count dropping to 0 when the device is still in
use. This can result in use-after-free or other issues.
If userspace has passed a file descriptor for the binder driver using
a BINDER_TYPE_FDA object, then ksys_close() is called on it when
handling a binder_ioctl(BC_FREE_BUFFER) command. This violates
the assumptions for using fdget().
The problem is fixed by deferring the close using task_work_add(). A
new variant of __close_fd() was created that returns a struct file
with a reference. The fput() is deferred instead of using ksys_close().
Fixes: 44d8047f1d87a ("binder: use standard functions to allocate fds")
Suggested-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Todd Kjos <tkjos(a)google.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/android/binder.c | 63 ++++++++++++++++++++++++++++++++++++++--
fs/file.c | 29 ++++++++++++++++++
include/linux/fdtable.h | 1 +
3 files changed, 91 insertions(+), 2 deletions(-)
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index d653e8a474fc..210940bd0457 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -72,6 +72,7 @@
#include <linux/spinlock.h>
#include <linux/ratelimit.h>
#include <linux/syscalls.h>
+#include <linux/task_work.h>
#include <uapi/linux/android/binder.h>
@@ -2170,6 +2171,64 @@ static bool binder_validate_fixup(struct binder_buffer *b,
return (fixup_offset >= last_min_offset);
}
+/**
+ * struct binder_task_work_cb - for deferred close
+ *
+ * @twork: callback_head for task work
+ * @file: file to fput
+ *
+ * Structure to pass task work to be handled after
+ * returning from binder_ioctl() via task_work_add().
+ */
+struct binder_task_work_cb {
+ struct callback_head twork;
+ struct file *file;
+};
+
+/**
+ * binder_do_fd_close() - put the file scheduled for deferred close
+ * @twork: callback head for task work
+ *
+ * It is not safe to call ksys_close() during the binder_ioctl()
+ * function if there is a chance that binder's own file descriptor
+ * might be closed. This is to meet the requirements for using
+ * fdget() (see comments for __fget_light()). Therefore use
+ * task_work_add() to schedule the close operation once we have
+ * returned from binder_ioctl(). This function is a callback
+ * for that mechanism and does the actual fput() on the
+ * given file.
+ */
+static void binder_do_fd_close(struct callback_head *twork)
+{
+ struct binder_task_work_cb *twcb = container_of(twork,
+ struct binder_task_work_cb, twork);
+
+ fput(twcb->file);
+ kfree(twcb);
+}
+
+/**
+ * binder_deferred_fd_close() - schedule a close for the given file-descriptor
+ * @fd: file-descriptor to close
+ *
+ * See comments in binder_do_fd_close(). This function is used to schedule
+ * a file-descriptor to be closed after returning from binder_ioctl().
+ */
+static void binder_deferred_fd_close(int fd)
+{
+ struct binder_task_work_cb *twcb;
+
+ twcb = kzalloc(sizeof(*twcb), GFP_KERNEL);
+ if (!twcb)
+ return;
+ init_task_work(&twcb->twork, binder_do_fd_close);
+ __close_fd_get_file(fd, &twcb->file);
+ if (twcb->file)
+ task_work_add(current, &twcb->twork, true);
+ else
+ kfree(twcb);
+}
+
static void binder_transaction_buffer_release(struct binder_proc *proc,
struct binder_buffer *buffer,
binder_size_t *failed_at)
@@ -2309,7 +2368,7 @@ static void binder_transaction_buffer_release(struct binder_proc *proc,
}
fd_array = (u32 *)(parent_buffer + (uintptr_t)fda->parent_offset);
for (fd_index = 0; fd_index < fda->num_fds; fd_index++)
- ksys_close(fd_array[fd_index]);
+ binder_deferred_fd_close(fd_array[fd_index]);
} break;
default:
pr_err("transaction release %d bad object type %x\n",
@@ -3928,7 +3987,7 @@ static int binder_apply_fd_fixups(struct binder_transaction *t)
} else if (ret) {
u32 *fdp = (u32 *)(t->buffer->data + fixup->offset);
- ksys_close(*fdp);
+ binder_deferred_fd_close(*fdp);
}
list_del(&fixup->fixup_entry);
kfree(fixup);
diff --git a/fs/file.c b/fs/file.c
index 7ffd6e9d103d..8d059d8973e9 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -640,6 +640,35 @@ int __close_fd(struct files_struct *files, unsigned fd)
}
EXPORT_SYMBOL(__close_fd); /* for ksys_close() */
+/*
+ * variant of __close_fd that gets a ref on the file for later fput
+ */
+int __close_fd_get_file(unsigned int fd, struct file **res)
+{
+ struct files_struct *files = current->files;
+ struct file *file;
+ struct fdtable *fdt;
+
+ spin_lock(&files->file_lock);
+ fdt = files_fdtable(files);
+ if (fd >= fdt->max_fds)
+ goto out_unlock;
+ file = fdt->fd[fd];
+ if (!file)
+ goto out_unlock;
+ rcu_assign_pointer(fdt->fd[fd], NULL);
+ __put_unused_fd(files, fd);
+ spin_unlock(&files->file_lock);
+ get_file(file);
+ *res = file;
+ return filp_close(file, files);
+
+out_unlock:
+ spin_unlock(&files->file_lock);
+ *res = NULL;
+ return -ENOENT;
+}
+
void do_close_on_exec(struct files_struct *files)
{
unsigned i;
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index 41615f38bcff..f07c55ea0c22 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -121,6 +121,7 @@ extern void __fd_install(struct files_struct *files,
unsigned int fd, struct file *file);
extern int __close_fd(struct files_struct *files,
unsigned int fd);
+extern int __close_fd_get_file(unsigned int fd, struct file **res);
extern struct kmem_cache *files_cachep;
--
2.20.1
In commit bc73905abf770192 ("[SCSI] lpfc 8.3.16: SLI Additions, updates,
and code cleanup"), lpfc_memcpy_to_slim() switched from memcpy_toio() to
__iowrite32_copy() in order to prevent unaligned 64-bit copies. Recently,
we found that lpfc_memcpy_from_slim() has similar issues, so let it
switch from memcpy_fromio() to __ioread32_copy().
As the maintainer says, it seems that we can hardly see a real "unaligned
64-bit copy", but this patch is still useful, because in our tests we
found that lpfc doesn't support 128-bit access while some optimized
memcpy() implementations use 128-bit access (at least on Loongson).
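(Note: __ioread32_copy(to, from, count) copies count 32-bit words using
32-bit reads, which is why the byte count is converted to a word count
in the hunk below.)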
Cc: stable(a)vger.kernel.org
Signed-off-by: Huacai Chen <chenhc(a)lemote.com>
---
V2: Update commit message.
drivers/scsi/lpfc/lpfc_compat.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc_compat.h b/drivers/scsi/lpfc/lpfc_compat.h
index 43cf46a..0cd1e3c 100644
--- a/drivers/scsi/lpfc/lpfc_compat.h
+++ b/drivers/scsi/lpfc/lpfc_compat.h
@@ -91,8 +91,8 @@ lpfc_memcpy_to_slim( void __iomem *dest, void *src, unsigned int bytes)
static inline void
lpfc_memcpy_from_slim( void *dest, void __iomem *src, unsigned int bytes)
{
- /* actually returns 1 byte past dest */
- memcpy_fromio( dest, src, bytes);
+ /* convert bytes in argument list to word count for copy function */
+ __ioread32_copy(dest, src, bytes / sizeof(uint32_t));
}
#endif /* __BIG_ENDIAN */
--
2.7.0
This is a note to let you know that I've just added the patch titled
serial: uartps: Fix interrupt mask issue to handle the RX interrupts
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the tty-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From 260683137ab5276113fc322fdbbc578024185fee Mon Sep 17 00:00:00 2001
From: Nava kishore Manne <nava.manne(a)xilinx.com>
Date: Tue, 18 Dec 2018 13:18:42 +0100
Subject: serial: uartps: Fix interrupt mask issue to handle the RX interrupts
properly
This patch corrects the RX interrupt mask value to handle the
RX interrupts properly.
Fixes: c8dbdc842d30 ("serial: xuartps: Rewrite the interrupt handling logic")
Signed-off-by: Nava kishore Manne <nava.manne(a)xilinx.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Michal Simek <michal.simek(a)xilinx.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/xilinx_uartps.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/tty/serial/xilinx_uartps.c b/drivers/tty/serial/xilinx_uartps.c
index c6d38617d622..094f2958cb2b 100644
--- a/drivers/tty/serial/xilinx_uartps.c
+++ b/drivers/tty/serial/xilinx_uartps.c
@@ -123,7 +123,7 @@ MODULE_PARM_DESC(rx_timeout, "Rx timeout, 1-255");
#define CDNS_UART_IXR_RXTRIG 0x00000001 /* RX FIFO trigger interrupt */
#define CDNS_UART_IXR_RXFULL 0x00000004 /* RX FIFO full interrupt. */
#define CDNS_UART_IXR_RXEMPTY 0x00000002 /* RX FIFO empty interrupt. */
-#define CDNS_UART_IXR_MASK 0x00001FFF /* Valid bit mask */
+#define CDNS_UART_IXR_RXMASK 0x000021e7 /* Valid RX bit mask */
/*
* Do not enable parity error interrupt for the following
@@ -364,7 +364,7 @@ static irqreturn_t cdns_uart_isr(int irq, void *dev_id)
cdns_uart_handle_tx(dev_id);
isrstatus &= ~CDNS_UART_IXR_TXEMPTY;
}
- if (isrstatus & CDNS_UART_IXR_MASK)
+ if (isrstatus & CDNS_UART_IXR_RXMASK)
cdns_uart_handle_rx(dev_id, isrstatus);
spin_unlock(&port->lock);
--
2.20.1
This is a note to let you know that I've just added the patch titled
usb: r8a66597: Fix a possible concurrency use-after-free bug in
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the usb-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From c85400f886e3d41e69966470879f635a2b50084c Mon Sep 17 00:00:00 2001
From: Jia-Ju Bai <baijiaju1990(a)gmail.com>
Date: Tue, 18 Dec 2018 20:04:25 +0800
Subject: usb: r8a66597: Fix a possible concurrency use-after-free bug in
r8a66597_endpoint_disable()
The functions r8a66597_endpoint_disable() and r8a66597_urb_enqueue() may
execute concurrently.
Both functions access the possibly shared variable "hep->hcpriv".
This shared variable is freed by r8a66597_endpoint_disable() via the
call path:
r8a66597_endpoint_disable
kfree(hep->hcpriv) (line 1995 in Linux-4.19)
This variable is read by r8a66597_urb_enqueue() via the call path:
r8a66597_urb_enqueue
spin_lock_irqsave(&r8a66597->lock)
init_pipe_info
enable_r8a66597_pipe
pipe = hep->hcpriv (line 802 in Linux-4.19)
The read operation is protected by a spinlock, but the free operation
is not protected by this spinlock, thus a concurrent use-after-free
may occur.
To fix this bug, the spin-lock and spin-unlock function calls in
r8a66597_endpoint_disable() are moved to protect the free operation.
Signed-off-by: Jia-Ju Bai <baijiaju1990(a)gmail.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/host/r8a66597-hcd.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/r8a66597-hcd.c b/drivers/usb/host/r8a66597-hcd.c
index 984892dd72f5..42668aeca57c 100644
--- a/drivers/usb/host/r8a66597-hcd.c
+++ b/drivers/usb/host/r8a66597-hcd.c
@@ -1979,6 +1979,8 @@ static int r8a66597_urb_dequeue(struct usb_hcd *hcd, struct urb *urb,
static void r8a66597_endpoint_disable(struct usb_hcd *hcd,
struct usb_host_endpoint *hep)
+__acquires(r8a66597->lock)
+__releases(r8a66597->lock)
{
struct r8a66597 *r8a66597 = hcd_to_r8a66597(hcd);
struct r8a66597_pipe *pipe = (struct r8a66597_pipe *)hep->hcpriv;
@@ -1991,13 +1993,14 @@ static void r8a66597_endpoint_disable(struct usb_hcd *hcd,
return;
pipenum = pipe->info.pipenum;
+ spin_lock_irqsave(&r8a66597->lock, flags);
if (pipenum == 0) {
kfree(hep->hcpriv);
hep->hcpriv = NULL;
+ spin_unlock_irqrestore(&r8a66597->lock, flags);
return;
}
- spin_lock_irqsave(&r8a66597->lock, flags);
pipe_stop(r8a66597, pipe);
pipe_irq_disable(r8a66597, pipenum);
disable_irq_empty(r8a66597, pipenum);
--
2.20.1
On Mon, Oct 15, 2018 at 06:54:31AM -0700, Omer Tripp wrote:
> Hi Greg and all,
>
> Here is my analysis of the complete gadget, and looking forward to your
> corrections/feedback if there are any inaccuracies:
>
> 1. __close_fd() is reachable via the close() syscall with a
>    user-controlled fd.
>
> 2. If said bounds check is mispredicted, then a user-controlled
>    address fdt->fd[fd] is obtained and dereferenced, and the value at
>    that user-controlled address is loaded into the local variable
>    file.
>
> 3. file is then passed as an argument to filp_close, where the cache
>    lines secret + offsetof(f_op) and secret + offsetof(f_mode) are
>    hot and vulnerable to a timing channel attack.
>
> The mitigation proposed by Greg Hackmann blocks this gadget.
Whatever happened to this patch? Did it get reposted? If not, can
someone please do so with this text in the changelog?
thanks,
greg k-h
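For reference, the mitigation under discussion has roughly this shape
(a sketch assuming the array_index_nospec() approach from
<linux/nospec.h>; not the exact posted patch):

    /*
     * Clamp the user-controlled index under speculation before the
     * table lookup, so a mispredicted bounds check cannot leak
     * through the fdt->fd[fd] dereference.
     */
    fd = array_index_nospec(fd, fdt->max_fds);
    file = fdt->fd[fd];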
At least old Xen net backends seem to send frags with no real data
sometimes. In case such a fragment happens to occur with the frag limit
already reached the frontend will BUG currently even if this situation
is easily recoverable.
Modify the BUG_ON() condition accordingly.
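In other words, pull_to == skb_headlen(skb) is the recoverable
zero-length case; only a true underflow is fatal. A sketch of the
reasoning (not the driver code verbatim):

    unsigned int pull = pull_to - skb_headlen(skb); /* 0 for empty frags */

    BUG_ON(pull_to < skb_headlen(skb));     /* only underflow is a bug */
    __pskb_pull_tail(skb, pull);            /* pulling 0 bytes is a no-op */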
Cc: stable(a)vger.kernel.org
Tested-by: Dietmar Hahn <dietmar.hahn(a)ts.fujitsu.com>
Signed-off-by: Juergen Gross <jgross(a)suse.com>
---
drivers/net/xen-netfront.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index f17f602e6171..5b97cc946d70 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -905,7 +905,7 @@ static RING_IDX xennet_fill_frags(struct netfront_queue *queue,
if (skb_shinfo(skb)->nr_frags == MAX_SKB_FRAGS) {
unsigned int pull_to = NETFRONT_SKB_CB(skb)->pull_to;
- BUG_ON(pull_to <= skb_headlen(skb));
+ BUG_ON(pull_to < skb_headlen(skb));
__pskb_pull_tail(skb, pull_to - skb_headlen(skb));
}
if (unlikely(skb_shinfo(skb)->nr_frags >= MAX_SKB_FRAGS)) {
--
2.16.4
commit d57f9da890696af1484f4a47f7f123560197865a upstream.
struct bioctx includes the ref refcount_t to track the number of I/O
fragments used to process a target BIO as well as ensure that the zone
of the BIO is kept in the active state throughout the lifetime of the
BIO. However, since decrementing of this reference count is done in the
target .end_io method, the function bio_endio() must be called multiple
times for read and write target BIOs, which causes problems with the
value of the __bi_remaining struct bio field for chained BIOs (e.g. the
clone BIO passed by dm core is large and gets split into fragments by the
block layer), resulting in incorrect values and inconsistencies with the
BIO_CHAIN flag setting. This in turn triggers the BUG_ON() call:
BUG_ON(atomic_read(&bio->__bi_remaining) <= 0);
in bio_remaining_done() called from bio_endio().
Fix this by ensuring that bio_endio() is called only once for any target
BIO by always using internal clone BIOs for processing any read or
write target BIO. This allows reference counting using the target BIO
context counter to trigger the target BIO completion bio_endio() call
once all data, metadata and other zone work triggered by the BIO
complete.
Overall, this simplifies the code too as the target .end_io becomes
unnecessary and differences between read and write BIO issuing and
completion processing disappear.
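Reduced to its core, the completion scheme after this change looks like
the following sketch (not the exact driver code):

    /* One reference is taken per issued clone ... */
    atomic_inc(&bioctx->ref);
    generic_make_request(clone);

    /* ... and dropping the last reference, from the clone's end_io
     * callback, completes the target BIO exactly once. */
    if (atomic_dec_and_test(&bioctx->ref))
            bio_endio(bio);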
Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target")
Cc: stable(a)vger.kernel.org #4.14
Signed-off-by: Damien Le Moal <damien.lemoal(a)wdc.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
---
drivers/md/dm-zoned-target.c | 122 +++++++++++------------------------
1 file changed, 38 insertions(+), 84 deletions(-)
diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
index ba6b0a90ecfb..532bfce7f072 100644
--- a/drivers/md/dm-zoned-target.c
+++ b/drivers/md/dm-zoned-target.c
@@ -20,7 +20,6 @@ struct dmz_bioctx {
struct dm_zone *zone;
struct bio *bio;
atomic_t ref;
- blk_status_t status;
};
/*
@@ -78,65 +77,66 @@ static inline void dmz_bio_endio(struct bio *bio, blk_status_t status)
{
struct dmz_bioctx *bioctx = dm_per_bio_data(bio, sizeof(struct dmz_bioctx));
- if (bioctx->status == BLK_STS_OK && status != BLK_STS_OK)
- bioctx->status = status;
- bio_endio(bio);
+ if (status != BLK_STS_OK && bio->bi_status == BLK_STS_OK)
+ bio->bi_status = status;
+
+ if (atomic_dec_and_test(&bioctx->ref)) {
+ struct dm_zone *zone = bioctx->zone;
+
+ if (zone) {
+ if (bio->bi_status != BLK_STS_OK &&
+ bio_op(bio) == REQ_OP_WRITE &&
+ dmz_is_seq(zone))
+ set_bit(DMZ_SEQ_WRITE_ERR, &zone->flags);
+ dmz_deactivate_zone(zone);
+ }
+ bio_endio(bio);
+ }
}
/*
- * Partial clone read BIO completion callback. This terminates the
+ * Completion callback for an internally cloned target BIO. This terminates the
* target BIO when there are no more references to its context.
*/
-static void dmz_read_bio_end_io(struct bio *bio)
+static void dmz_clone_endio(struct bio *clone)
{
- struct dmz_bioctx *bioctx = bio->bi_private;
- blk_status_t status = bio->bi_status;
+ struct dmz_bioctx *bioctx = clone->bi_private;
+ blk_status_t status = clone->bi_status;
- bio_put(bio);
+ bio_put(clone);
dmz_bio_endio(bioctx->bio, status);
}
/*
- * Issue a BIO to a zone. The BIO may only partially process the
+ * Issue a clone of a target BIO. The clone may only partially process the
* original target BIO.
*/
-static int dmz_submit_read_bio(struct dmz_target *dmz, struct dm_zone *zone,
- struct bio *bio, sector_t chunk_block,
- unsigned int nr_blocks)
+static int dmz_submit_bio(struct dmz_target *dmz, struct dm_zone *zone,
+ struct bio *bio, sector_t chunk_block,
+ unsigned int nr_blocks)
{
struct dmz_bioctx *bioctx = dm_per_bio_data(bio, sizeof(struct dmz_bioctx));
- sector_t sector;
struct bio *clone;
- /* BIO remap sector */
- sector = dmz_start_sect(dmz->metadata, zone) + dmz_blk2sect(chunk_block);
-
- /* If the read is not partial, there is no need to clone the BIO */
- if (nr_blocks == dmz_bio_blocks(bio)) {
- /* Setup and submit the BIO */
- bio->bi_iter.bi_sector = sector;
- atomic_inc(&bioctx->ref);
- generic_make_request(bio);
- return 0;
- }
-
- /* Partial BIO: we need to clone the BIO */
clone = bio_clone_fast(bio, GFP_NOIO, dmz->bio_set);
if (!clone)
return -ENOMEM;
- /* Setup the clone */
- clone->bi_iter.bi_sector = sector;
+ bio_set_dev(clone, dmz->dev->bdev);
+ clone->bi_iter.bi_sector =
+ dmz_start_sect(dmz->metadata, zone) + dmz_blk2sect(chunk_block);
clone->bi_iter.bi_size = dmz_blk2sect(nr_blocks) << SECTOR_SHIFT;
- clone->bi_end_io = dmz_read_bio_end_io;
+ clone->bi_end_io = dmz_clone_endio;
clone->bi_private = bioctx;
bio_advance(bio, clone->bi_iter.bi_size);
- /* Submit the clone */
atomic_inc(&bioctx->ref);
generic_make_request(clone);
+ if (bio_op(bio) == REQ_OP_WRITE && dmz_is_seq(zone))
+ zone->wp_block += nr_blocks;
+
return 0;
}
@@ -214,7 +214,7 @@ static int dmz_handle_read(struct dmz_target *dmz, struct dm_zone *zone,
if (nr_blocks) {
/* Valid blocks found: read them */
nr_blocks = min_t(unsigned int, nr_blocks, end_block - chunk_block);
- ret = dmz_submit_read_bio(dmz, rzone, bio, chunk_block, nr_blocks);
+ ret = dmz_submit_bio(dmz, rzone, bio, chunk_block, nr_blocks);
if (ret)
return ret;
chunk_block += nr_blocks;
@@ -228,25 +228,6 @@ static int dmz_handle_read(struct dmz_target *dmz, struct dm_zone *zone,
return 0;
}
-/*
- * Issue a write BIO to a zone.
- */
-static void dmz_submit_write_bio(struct dmz_target *dmz, struct dm_zone *zone,
- struct bio *bio, sector_t chunk_block,
- unsigned int nr_blocks)
-{
- struct dmz_bioctx *bioctx = dm_per_bio_data(bio, sizeof(struct dmz_bioctx));
-
- /* Setup and submit the BIO */
- bio_set_dev(bio, dmz->dev->bdev);
- bio->bi_iter.bi_sector = dmz_start_sect(dmz->metadata, zone) + dmz_blk2sect(chunk_block);
- atomic_inc(&bioctx->ref);
- generic_make_request(bio);
-
- if (dmz_is_seq(zone))
- zone->wp_block += nr_blocks;
-}
-
/*
* Write blocks directly in a data zone, at the write pointer.
* If a buffer zone is assigned, invalidate the blocks written
@@ -265,7 +246,9 @@ static int dmz_handle_direct_write(struct dmz_target *dmz,
return -EROFS;
/* Submit write */
- dmz_submit_write_bio(dmz, zone, bio, chunk_block, nr_blocks);
+ ret = dmz_submit_bio(dmz, zone, bio, chunk_block, nr_blocks);
+ if (ret)
+ return ret;
/*
* Validate the blocks in the data zone and invalidate
@@ -301,7 +284,9 @@ static int dmz_handle_buffered_write(struct dmz_target *dmz,
return -EROFS;
/* Submit write */
- dmz_submit_write_bio(dmz, bzone, bio, chunk_block, nr_blocks);
+ ret = dmz_submit_bio(dmz, bzone, bio, chunk_block, nr_blocks);
+ if (ret)
+ return ret;
/*
* Validate the blocks in the buffer zone
@@ -600,7 +585,6 @@ static int dmz_map(struct dm_target *ti, struct bio *bio)
bioctx->zone = NULL;
bioctx->bio = bio;
atomic_set(&bioctx->ref, 1);
- bioctx->status = BLK_STS_OK;
/* Set the BIO pending in the flush list */
if (!nr_sectors && bio_op(bio) == REQ_OP_WRITE) {
@@ -623,35 +607,6 @@ static int dmz_map(struct dm_target *ti, struct bio *bio)
return DM_MAPIO_SUBMITTED;
}
-/*
- * Completed target BIO processing.
- */
-static int dmz_end_io(struct dm_target *ti, struct bio *bio, blk_status_t *error)
-{
- struct dmz_bioctx *bioctx = dm_per_bio_data(bio, sizeof(struct dmz_bioctx));
-
- if (bioctx->status == BLK_STS_OK && *error)
- bioctx->status = *error;
-
- if (!atomic_dec_and_test(&bioctx->ref))
- return DM_ENDIO_INCOMPLETE;
-
- /* Done */
- bio->bi_status = bioctx->status;
-
- if (bioctx->zone) {
- struct dm_zone *zone = bioctx->zone;
-
- if (*error && bio_op(bio) == REQ_OP_WRITE) {
- if (dmz_is_seq(zone))
- set_bit(DMZ_SEQ_WRITE_ERR, &zone->flags);
- }
- dmz_deactivate_zone(zone);
- }
-
- return DM_ENDIO_DONE;
-}
-
/*
* Get zoned device information.
*/
@@ -946,7 +901,6 @@ static struct target_type dmz_type = {
.ctr = dmz_ctr,
.dtr = dmz_dtr,
.map = dmz_map,
- .end_io = dmz_end_io,
.io_hints = dmz_io_hints,
.prepare_ioctl = dmz_prepare_ioctl,
.postsuspend = dmz_suspend,
--
2.19.2
On Wed, 12 Dec 2018, Dave Hansen wrote:
> From: Dave Hansen <dave.hansen(a)linux.intel.com>
>
> Memory protection key behavior should be the same in a child as it was
> in the parent before a fork. But, there is a bug that resets the
> state in the child at fork instead of preserving it.
>
> Our creation of new mm's is a bit convoluted. At fork(), the code
> does:
>
> 1. memcpy() the parent mm to initialize child
> 2. mm_init() to initialize some select stuff
> 3. dup_mmap() to create true copies that memcpy()
> did not do right.
>
> For pkeys, we need to preserve two bits of state across a fork:
> 'execute_only_pkey' and 'pkey_allocation_map'. Those are preserved by
> the memcpy(), which I thought did the right thing. But, mm_init()
> calls init_new_context(), which I thought was *only* for execve()-time
> and overwrites 'execute_only_pkey' and 'pkey_allocation_map' with
> "new" values. But, alas, init_new_context() is used at execve() and
> fork().
>
> The result is that, after a fork(), the child's pkey state ends up
> looking like it does after an execve(), which is totally wrong. pkeys
> that are already allocated can be allocated again, for instance.
>
> To fix this, add code called by dup_mmap() to copy the pkey state from
> parent to child explicitly. Also add a comment above init_new_context()
> to make it more clear to the next poor sod what this code is used for.
>
> Fixes: e8c24d3a23a ("x86/pkeys: Allocation/free syscalls")
> Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
> Cc: Thomas Gleixner <tglx(a)linutronix.de>
> Cc: Ingo Molnar <mingo(a)redhat.com>
> Cc: Borislav Petkov <bp(a)alien8.de>
> Cc: "H. Peter Anvin" <hpa(a)zytor.com>
> Cc: x86(a)kernel.org
> Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
> Cc: Peter Zijlstra <peterz(a)infradead.org>
> Cc: Michael Ellerman <mpe(a)ellerman.id.au>
> Cc: Will Deacon <will.deacon(a)arm.com>
> Cc: Andy Lutomirski <luto(a)kernel.org>
> Cc: Joerg Roedel <jroedel(a)suse.de>
> Cc: stable(a)vger.kernel.org
Reviewed-by: Thomas Gleixner <tglx(a)linutronix.de>
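For illustration, the fix described above boils down to a helper along
these lines, called from the dup_mmap() path so the child regains the
parent state that init_new_context() just clobbered (hypothetical name
and shape, not the exact patch):

    static inline void dup_pkey_state(struct mm_struct *mm,
                                      struct mm_struct *oldmm)
    {
            /* restore the two bits of pkey state the memcpy() had set up */
            mm->context.execute_only_pkey = oldmm->context.execute_only_pkey;
            mm->context.pkey_allocation_map = oldmm->context.pkey_allocation_map;
    }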
[ resending without broken headers ]
this is a backport of commit 7aa54be297655 ("locking/qspinlock, x86:
Provide liveness guarantee") for the v4.9 stable tree.
For the v4.4 tree the ARCH_USE_QUEUED_SPINLOCKS option got disabled on
x86.
For v4.9 it has been decided to do a minimal backport of the final fix
(including all its dependencies).
With this backport I can't reproduce the issue in the latest v4.9-RT
tree. I was able to boot (and use) an arm64 box with these patches so it
is not broken in an obvious way.
Sebastian
hugetlbfs page faults can race with truncate and hole punch operations.
Current code in the page fault path attempts to handle this by 'backing
out' operations if we encounter the race. One obvious omission in the
current code is removing a page newly added to the page cache. This is
pretty straight forward to address, but there is a more subtle and
difficult issue of backing out hugetlb reservations. To handle this
correctly, the 'reservation state' before page allocation needs to be
noted so that it can be properly backed out. There are four distinct
possibilities for reservation state: shared/reserved, shared/no-resv,
private/reserved and private/no-resv. Backing out a reservation may
require memory allocation which could fail so that needs to be taken
into account as well.
Instead of writing the required complicated code for this rare
occurrence, just eliminate the race. i_mmap_rwsem is now held in read
mode for the duration of page fault processing. Hold i_mmap_rwsem
longer in truncation and hole punch code to cover the call to
remove_inode_hugepages.
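In other words, the critical section is widened so that page removal
happens under the same lock that guards unmapping (this simply restates
the diff below):

    i_mmap_lock_write(mapping);
    hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
    remove_inode_hugepages(inode, offset, LLONG_MAX);  /* now inside the lock */
    i_mmap_unlock_write(mapping);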
Cc: <stable(a)vger.kernel.org>
Fixes: ebed4bfc8da8 ("hugetlb: fix absurd HugePages_Rsvd")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
---
fs/hugetlbfs/inode.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 32920a10100e..3244147fc42b 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -505,8 +505,8 @@ static int hugetlb_vmtruncate(struct inode *inode, loff_t offset)
i_mmap_lock_write(mapping);
if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
- i_mmap_unlock_write(mapping);
remove_inode_hugepages(inode, offset, LLONG_MAX);
+ i_mmap_unlock_write(mapping);
return 0;
}
@@ -540,8 +540,8 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
hugetlb_vmdelete_list(&mapping->i_mmap,
hole_start >> PAGE_SHIFT,
hole_end >> PAGE_SHIFT);
- i_mmap_unlock_write(mapping);
remove_inode_hugepages(inode, hole_start, hole_end);
+ i_mmap_unlock_write(mapping);
inode_unlock(inode);
}
--
2.17.2
this is a backport of commit 7aa54be297655 ("locking/qspinlock, x86:
Provide liveness guarantee") for the v4.19 stable tree.
Initially I assumed that this was merged late in v4.19-rc but actually
it is just part of v4.20-rc1.
For v4.19, most things are already in the tree. The GEN_BINARY_RMWcc
macro is still "old" and I skipped the documentation update.
Sebastian
The patch titled
Subject: hugetlbfs: remove unnecessary code after i_mmap_rwsem synchronization
has been removed from the -mm tree. Its filename was
hugetlbfs-remove-unnecessary-code-after-i_mmap_rwsem-synchronization.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlbfs: remove unnecessary code after i_mmap_rwsem synchronization
After expanding i_mmap_rwsem use for better shared pmd and page fault/
truncation synchronization, remove code that is no longer necessary.
Link: http://lkml.kernel.org/r/20181203200850.6460-4-mike.kravetz@oracle.com
Fixes: ebed4bfc8da8 ("hugetlb: fix absurd HugePages_Rsvd")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar(a)linux.vnet.ibm.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: Prakash Sangappa <prakash.sangappa(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/hugetlbfs/inode.c | 46 +++++++++++++----------------------------
mm/hugetlb.c | 21 ++++++++----------
2 files changed, 25 insertions(+), 42 deletions(-)
--- a/fs/hugetlbfs/inode.c~hugetlbfs-remove-unnecessary-code-after-i_mmap_rwsem-synchronization
+++ a/fs/hugetlbfs/inode.c
@@ -383,17 +383,16 @@ hugetlb_vmdelete_list(struct rb_root_cac
* truncation is indicated by end of range being LLONG_MAX
* In this case, we first scan the range and release found pages.
* After releasing pages, hugetlb_unreserve_pages cleans up region/reserv
- * maps and global counts. Page faults can not race with truncation
- * in this routine. hugetlb_no_page() prevents page faults in the
- * truncated range. It checks i_size before allocation, and again after
- * with the page table lock for the page held. The same lock must be
- * acquired to unmap a page.
+ * maps and global counts.
* hole punch is indicated if end is not LLONG_MAX
* In the hole punch case we scan the range and release found pages.
* Only when releasing a page is the associated region/reserv map
* deleted. The region/reserv map for ranges without associated
- * pages are not modified. Page faults can race with hole punch.
- * This is indicated if we find a mapped page.
+ * pages are not modified.
+ *
+ * Callers of this routine must hold the i_mmap_rwsem in write mode to prevent
+ * races with page faults.
+ *
* Note: If the passed end of range value is beyond the end of file, but
* not LLONG_MAX this routine still performs a hole punch operation.
*/
@@ -423,32 +422,14 @@ static void remove_inode_hugepages(struc
for (i = 0; i < pagevec_count(&pvec); ++i) {
struct page *page = pvec.pages[i];
- u32 hash;
index = page->index;
- hash = hugetlb_fault_mutex_hash(h, current->mm,
- &pseudo_vma,
- mapping, index, 0);
- mutex_lock(&hugetlb_fault_mutex_table[hash]);
-
/*
- * If page is mapped, it was faulted in after being
- * unmapped in caller. Unmap (again) now after taking
- * the fault mutex. The mutex will prevent faults
- * until we finish removing the page.
- *
- * This race can only happen in the hole punch case.
- * Getting here in a truncate operation is a bug.
+ * A mapped page is impossible as callers should unmap
+ * all references before calling. And, i_mmap_rwsem
+ * prevents the creation of additional mappings.
*/
- if (unlikely(page_mapped(page))) {
- BUG_ON(truncate_op);
-
- i_mmap_lock_write(mapping);
- hugetlb_vmdelete_list(&mapping->i_mmap,
- index * pages_per_huge_page(h),
- (index + 1) * pages_per_huge_page(h));
- i_mmap_unlock_write(mapping);
- }
+ VM_BUG_ON(page_mapped(page));
lock_page(page);
/*
@@ -470,7 +451,6 @@ static void remove_inode_hugepages(struc
}
unlock_page(page);
- mutex_unlock(&hugetlb_fault_mutex_table[hash]);
}
huge_pagevec_release(&pvec);
cond_resched();
@@ -624,7 +604,11 @@ static long hugetlbfs_fallocate(struct f
/* addr is the offset within the file (zero based) */
addr = index * hpage_size;
- /* mutex taken here, fault path and hole punch */
+ /*
+ * fault mutex taken here, protects against fault path
+ * and hole punch. inode_lock previously taken protects
+ * against truncation.
+ */
hash = hugetlb_fault_mutex_hash(h, mm, &pseudo_vma, mapping,
index, addr);
mutex_lock(&hugetlb_fault_mutex_table[hash]);
--- a/mm/hugetlb.c~hugetlbfs-remove-unnecessary-code-after-i_mmap_rwsem-synchronization
+++ a/mm/hugetlb.c
@@ -3761,16 +3761,16 @@ static vm_fault_t hugetlb_no_page(struct
}
/*
- * Use page lock to guard against racing truncation
- * before we get page_table_lock.
+ * We can not race with truncation due to holding i_mmap_rwsem.
+ * Check once here for faults beyond end of file.
*/
+ size = i_size_read(mapping->host) >> huge_page_shift(h);
+ if (idx >= size)
+ goto out;
+
retry:
page = find_lock_page(mapping, idx);
if (!page) {
- size = i_size_read(mapping->host) >> huge_page_shift(h);
- if (idx >= size)
- goto out;
-
/*
* Check for page in userfault range
*/
@@ -3860,9 +3860,6 @@ retry:
}
ptl = huge_pte_lock(h, mm, ptep);
- size = i_size_read(mapping->host) >> huge_page_shift(h);
- if (idx >= size)
- goto backout;
ret = 0;
if (!huge_pte_none(huge_ptep_get(ptep)))
@@ -3965,8 +3962,10 @@ vm_fault_t hugetlb_fault(struct mm_struc
/*
* Acquire i_mmap_rwsem before calling huge_pte_alloc and hold
- * until finished with ptep. This prevents huge_pmd_unshare from
- * being called elsewhere and making the ptep no longer valid.
+ * until finished with ptep. This serves two purposes:
+ * 1) It prevents huge_pmd_unshare from being called elsewhere
+ * and making the ptep no longer valid.
+ * 2) It synchronizes us with file truncation.
*
* ptep could have already be assigned via huge_pte_offset. That
* is OK, as huge_pte_alloc will return the same value unless
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
The patch titled
Subject: hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race
has been removed from the -mm tree. Its filename was
hugetlbfs-use-i_mmap_rwsem-to-fix-page-fault-truncate-race.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race
hugetlbfs page faults can race with truncate and hole punch operations.
Current code in the page fault path attempts to handle this by 'backing
out' operations if we encounter the race. One obvious omission in the
current code is removing a page newly added to the page cache. This is
pretty straight forward to address, but there is a more subtle and
difficult issue of backing out hugetlb reservations. To handle this
correctly, the 'reservation state' before page allocation needs to be
noted so that it can be properly backed out. There are four distinct
possibilities for reservation state: shared/reserved, shared/no-resv,
private/reserved and private/no-resv. Backing out a reservation may
require memory allocation which could fail so that needs to be taken into
account as well.
Instead of writing the required complicated code for this rare occurrence,
just eliminate the race. i_mmap_rwsem is now held in read mode for the
duration of page fault processing. Hold i_mmap_rwsem longer in truncation
and hole punch code to cover the call to remove_inode_hugepages.
Link: http://lkml.kernel.org/r/20181203200850.6460-3-mike.kravetz@oracle.com
Fixes: ebed4bfc8da8 ("hugetlb: fix absurd HugePages_Rsvd")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar(a)linux.vnet.ibm.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: Prakash Sangappa <prakash.sangappa(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/hugetlbfs/inode.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/fs/hugetlbfs/inode.c~hugetlbfs-use-i_mmap_rwsem-to-fix-page-fault-truncate-race
+++ a/fs/hugetlbfs/inode.c
@@ -505,8 +505,8 @@ static int hugetlb_vmtruncate(struct ino
i_mmap_lock_write(mapping);
if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
- i_mmap_unlock_write(mapping);
remove_inode_hugepages(inode, offset, LLONG_MAX);
+ i_mmap_unlock_write(mapping);
return 0;
}
@@ -540,8 +540,8 @@ static long hugetlbfs_punch_hole(struct
hugetlb_vmdelete_list(&mapping->i_mmap,
hole_start >> PAGE_SHIFT,
hole_end >> PAGE_SHIFT);
- i_mmap_unlock_write(mapping);
remove_inode_hugepages(inode, hole_start, hole_end);
+ i_mmap_unlock_write(mapping);
inode_unlock(inode);
}
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
hugetlbfs-remove-unnecessary-code-after-i_mmap_rwsem-synchronization.patch
The patch titled
Subject: hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
has been removed from the -mm tree. Its filename was
hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
Patch series "hugetlbfs: use i_mmap_rwsem for better synchronization".
These patches are a follow up to the RFC,
http://lkml.kernel.org/r/20181024045053.1467-1-mike.kravetz@oracle.com
Comments made by Naoya were addressed.
There are two primary issues addressed here:
1) For shared pmds, huge pte pointers returned by huge_pte_alloc can
become invalid via a call to huge_pmd_unshare by another thread.
2) hugetlbfs page faults can race with truncation causing invalid
global reserve counts and state.
Both issues are addressed by expanding the use of i_mmap_rwsem.
These issues have existed for a long time. They can be recreated with a
test program that causes page fault/truncation races. For simple
mappings, this results in a negative HugePages_Rsvd count. If racing with
mappings that contain shared pmds, we can hit "BUG at
fs/hugetlbfs/inode.c:444!" or Oops! as the result of an invalid memory
reference.
I broke up the larger RFC into separate patches addressing each issue.
Hopefully, this is easier to understand/review.
This patch (of 3):
While looking at BUGs associated with invalid huge page map counts, it was
observed that a huge pte pointer could become 'invalid' and
point to another task's page table. Consider the following:
A task takes a page fault on a shared hugetlbfs file and calls
huge_pte_alloc to get a ptep. Suppose the returned ptep points to a
shared pmd.
Now, another task truncates the hugetlbfs file. As part of truncation, it
unmaps everyone who has the file mapped. If the range being truncated is
covered by a shared pmd, huge_pmd_unshare will be called. For all but the
last user of the shared pmd, huge_pmd_unshare will clear the pud pointing
to the pmd. If the task in the middle of the page fault is not the last
user, the ptep returned by huge_pte_alloc now points to another task's
page table or worse. This leads to bad things such as incorrect page
map/reference counts or invalid memory references.
To fix, expand the use of i_mmap_rwsem as follows:
- i_mmap_rwsem is held in read mode whenever huge_pmd_share is called.
huge_pmd_share is only called via huge_pte_alloc, so callers of
huge_pte_alloc take i_mmap_rwsem before calling. In addition, callers
of huge_pte_alloc continue to hold the semaphore until finished with the
ptep.
- i_mmap_rwsem is held in write mode whenever huge_pmd_unshare is called.
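Put together, the fault path after this change follows this locking
order (a sketch; see the mm/hugetlb.c hunks below for the real code):

    i_mmap_lock_read(mapping);
    ptep = huge_pte_alloc(mm, haddr, huge_page_size(h));
    /* ptep stays valid here: huge_pmd_unshare() must take
     * i_mmap_rwsem in write mode, so it cannot clear the pud
     * under us while we hold the read side. */
    /* ... handle the fault ... */
    i_mmap_unlock_read(mapping);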
Link: http://lkml.kernel.org/r/20181203200850.6460-2-mike.kravetz@oracle.com
Fixes: 39dde65c9940 ("shared page table for hugetlb page")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar(a)linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Prakash Sangappa <prakash.sangappa(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 70 ++++++++++++++++++++++++++++++++----------
mm/memory-failure.c | 14 +++++++-
mm/migrate.c | 13 +++++++
mm/rmap.c | 3 +
mm/userfaultfd.c | 11 +++++-
5 files changed, 91 insertions(+), 20 deletions(-)
--- a/mm/hugetlb.c~hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization
+++ a/mm/hugetlb.c
@@ -3240,6 +3240,7 @@ int copy_hugetlb_page_range(struct mm_st
int cow;
struct hstate *h = hstate_vma(vma);
unsigned long sz = huge_page_size(h);
+ struct address_space *mapping = vma->vm_file->f_mapping;
unsigned long mmun_start; /* For mmu_notifiers */
unsigned long mmun_end; /* For mmu_notifiers */
int ret = 0;
@@ -3253,11 +3254,23 @@ int copy_hugetlb_page_range(struct mm_st
for (addr = vma->vm_start; addr < vma->vm_end; addr += sz) {
spinlock_t *src_ptl, *dst_ptl;
+
src_pte = huge_pte_offset(src, addr, sz);
if (!src_pte)
continue;
+
+ /*
+ * i_mmap_rwsem must be held to call huge_pte_alloc.
+ * Continue to hold until finished with dst_pte, otherwise
+ * it could go away if part of a shared pmd.
+ *
+ * Technically, i_mmap_rwsem is only needed in the non-cow
+ * case as cow mappings are not shared.
+ */
+ i_mmap_lock_read(mapping);
dst_pte = huge_pte_alloc(dst, addr, sz);
if (!dst_pte) {
+ i_mmap_unlock_read(mapping);
ret = -ENOMEM;
break;
}
@@ -3272,8 +3285,10 @@ int copy_hugetlb_page_range(struct mm_st
* after taking the lock below.
*/
dst_entry = huge_ptep_get(dst_pte);
- if ((dst_pte == src_pte) || !huge_pte_none(dst_entry))
+ if ((dst_pte == src_pte) || !huge_pte_none(dst_entry)) {
+ i_mmap_unlock_read(mapping);
continue;
+ }
dst_ptl = huge_pte_lock(h, dst, dst_pte);
src_ptl = huge_pte_lockptr(h, src, src_pte);
@@ -3322,6 +3337,8 @@ int copy_hugetlb_page_range(struct mm_st
}
spin_unlock(src_ptl);
spin_unlock(dst_ptl);
+
+ i_mmap_unlock_read(mapping);
}
if (cow)
@@ -3773,14 +3790,18 @@ retry:
};
/*
- * hugetlb_fault_mutex must be dropped before
- * handling userfault. Reacquire after handling
- * fault to make calling code simpler.
+ * hugetlb_fault_mutex and i_mmap_rwsem must be
+ * dropped before handling userfault. Reacquire
+ * after handling fault to make calling code simpler.
*/
hash = hugetlb_fault_mutex_hash(h, mm, vma, mapping,
idx, haddr);
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+ i_mmap_unlock_read(mapping);
+
ret = handle_userfault(&vmf, VM_UFFD_MISSING);
+
+ i_mmap_lock_read(mapping);
mutex_lock(&hugetlb_fault_mutex_table[hash]);
goto out;
}
@@ -3928,6 +3949,11 @@ vm_fault_t hugetlb_fault(struct mm_struc
ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
if (ptep) {
+ /*
+ * Since we hold no locks, ptep could be stale. That is
+ * OK as we are only making decisions based on content and
+ * not actually modifying content here.
+ */
entry = huge_ptep_get(ptep);
if (unlikely(is_hugetlb_entry_migration(entry))) {
migration_entry_wait_huge(vma, mm, ptep);
@@ -3935,20 +3961,31 @@ vm_fault_t hugetlb_fault(struct mm_struc
} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
return VM_FAULT_HWPOISON_LARGE |
VM_FAULT_SET_HINDEX(hstate_index(h));
- } else {
- ptep = huge_pte_alloc(mm, haddr, huge_page_size(h));
- if (!ptep)
- return VM_FAULT_OOM;
}
+ /*
+ * Acquire i_mmap_rwsem before calling huge_pte_alloc and hold
+ * until finished with ptep. This prevents huge_pmd_unshare from
+ * being called elsewhere and making the ptep no longer valid.
+ *
+ * ptep could have already be assigned via huge_pte_offset. That
+ * is OK, as huge_pte_alloc will return the same value unless
+ * something changed.
+ */
mapping = vma->vm_file->f_mapping;
- idx = vma_hugecache_offset(h, vma, haddr);
+ i_mmap_lock_read(mapping);
+ ptep = huge_pte_alloc(mm, haddr, huge_page_size(h));
+ if (!ptep) {
+ i_mmap_unlock_read(mapping);
+ return VM_FAULT_OOM;
+ }
/*
* Serialize hugepage allocation and instantiation, so that we don't
* get spurious allocation failures if two CPUs race to instantiate
* the same page in the page cache.
*/
+ idx = vma_hugecache_offset(h, vma, haddr);
hash = hugetlb_fault_mutex_hash(h, mm, vma, mapping, idx, haddr);
mutex_lock(&hugetlb_fault_mutex_table[hash]);
@@ -4036,6 +4073,7 @@ out_ptl:
}
out_mutex:
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+ i_mmap_unlock_read(mapping);
/*
* Generally it's safe to hold refcount during waiting page lock. But
* here we just wait to defer the next page fault to avoid busy loop and
@@ -4640,10 +4678,12 @@ void adjust_range_if_pmd_sharing_possibl
* Search for a shareable pmd page for hugetlb. In any case calls pmd_alloc()
* and returns the corresponding pte. While this is not necessary for the
* !shared pmd case because we can allocate the pmd later as well, it makes the
- * code much cleaner. pmd allocation is essential for the shared case because
- * pud has to be populated inside the same i_mmap_rwsem section - otherwise
- * racing tasks could either miss the sharing (see huge_pte_offset) or select a
- * bad pmd for sharing.
+ * code much cleaner.
+ *
+ * This routine must be called with i_mmap_rwsem held in at least read mode.
+ * For hugetlbfs, this prevents removal of any page table entries associated
+ * with the address space. This is important as we are setting up sharing
+ * based on existing page table entries (mappings).
*/
pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
{
@@ -4660,7 +4700,6 @@ pte_t *huge_pmd_share(struct mm_struct *
if (!vma_shareable(vma, addr))
return (pte_t *)pmd_alloc(mm, pud, addr);
- i_mmap_lock_write(mapping);
vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
if (svma == vma)
continue;
@@ -4690,7 +4729,6 @@ pte_t *huge_pmd_share(struct mm_struct *
spin_unlock(ptl);
out:
pte = (pte_t *)pmd_alloc(mm, pud, addr);
- i_mmap_unlock_write(mapping);
return pte;
}
@@ -4701,7 +4739,7 @@ out:
* indicated by page_count > 1, unmap is achieved by clearing pud and
* decrementing the ref count. If count == 1, the pte page is not shared.
*
- * called with page table lock held.
+ * Called with page table lock held and i_mmap_rwsem held in write mode.
*
* returns: 1 successfully unmapped a shared pte page
* 0 the underlying pte page is not shared, or it is the last user
--- a/mm/memory-failure.c~hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization
+++ a/mm/memory-failure.c
@@ -1028,7 +1028,19 @@ static bool hwpoison_user_mappings(struc
if (kill)
collect_procs(hpage, &tokill, flags & MF_ACTION_REQUIRED);
- unmap_success = try_to_unmap(hpage, ttu);
+ if (!PageHuge(hpage)) {
+ unmap_success = try_to_unmap(hpage, ttu);
+ } else {
+ /*
+ * For hugetlb pages, try_to_unmap could potentially call
+ * huge_pmd_unshare. Because of this, take semaphore in
+ * write mode here and set TTU_RMAP_LOCKED to indicate we
+ * have taken the lock at this higher level.
+ */
+ i_mmap_lock_write(mapping);
+ unmap_success = try_to_unmap(hpage, ttu|TTU_RMAP_LOCKED);
+ i_mmap_unlock_write(mapping);
+ }
if (!unmap_success)
pr_err("Memory failure: %#lx: failed to unmap page (mapcount=%d)\n",
pfn, page_mapcount(hpage));
--- a/mm/migrate.c~hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization
+++ a/mm/migrate.c
@@ -1297,8 +1297,19 @@ static int unmap_and_move_huge_page(new_
goto put_anon;
if (page_mapped(hpage)) {
+ struct address_space *mapping = page_mapping(hpage);
+
+ /*
+ * try_to_unmap could potentially call huge_pmd_unshare.
+ * Because of this, take semaphore in write mode here and
+ * set TTU_RMAP_LOCKED to let lower levels know we have
+ * taken the lock.
+ */
+ i_mmap_lock_write(mapping);
try_to_unmap(hpage,
- TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
+ TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS|
+ TTU_RMAP_LOCKED);
+ i_mmap_unlock_write(mapping);
page_was_mapped = 1;
}
--- a/mm/rmap.c~hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization
+++ a/mm/rmap.c
@@ -1374,6 +1374,9 @@ static bool try_to_unmap_one(struct page
/*
* If sharing is possible, start and end will be adjusted
* accordingly.
+ *
+ * If called for a huge page, caller must hold i_mmap_rwsem
+ * in write mode as it is possible to call huge_pmd_unshare.
*/
adjust_range_if_pmd_sharing_possible(vma, &start, &end);
}
--- a/mm/userfaultfd.c~hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization
+++ a/mm/userfaultfd.c
@@ -267,10 +267,14 @@ retry:
VM_BUG_ON(dst_addr & ~huge_page_mask(h));
/*
- * Serialize via hugetlb_fault_mutex
+ * Serialize via i_mmap_rwsem and hugetlb_fault_mutex.
+ * i_mmap_rwsem ensures the dst_pte remains valid even
+ * in the case of shared pmds. fault mutex prevents
+ * races with other faulting threads.
*/
- idx = linear_page_index(dst_vma, dst_addr);
mapping = dst_vma->vm_file->f_mapping;
+ i_mmap_lock_read(mapping);
+ idx = linear_page_index(dst_vma, dst_addr);
hash = hugetlb_fault_mutex_hash(h, dst_mm, dst_vma, mapping,
idx, dst_addr);
mutex_lock(&hugetlb_fault_mutex_table[hash]);
@@ -279,6 +283,7 @@ retry:
dst_pte = huge_pte_alloc(dst_mm, dst_addr, huge_page_size(h));
if (!dst_pte) {
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+ i_mmap_unlock_read(mapping);
goto out_unlock;
}
@@ -286,6 +291,7 @@ retry:
dst_pteval = huge_ptep_get(dst_pte);
if (!huge_pte_none(dst_pteval)) {
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+ i_mmap_unlock_read(mapping);
goto out_unlock;
}
@@ -293,6 +299,7 @@ retry:
dst_addr, src_addr, &page);
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+ i_mmap_unlock_read(mapping);
vm_alloc_shared = vm_shared;
cond_resched();
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
hugetlbfs-use-i_mmap_rwsem-to-fix-page-fault-truncate-race.patch
hugetlbfs-remove-unnecessary-code-after-i_mmap_rwsem-synchronization.patch
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
In the case that a bprintk event has a dereferenced pointer that is
stored as a string, and there's more values to process (more args), the
arg was not updated to point to the next arg after processing the
dereferenced pointer, and it screwed up what was to be displayed.
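The bug is a classic iterator slip: each consumed format argument must
advance the arg list, or every later value prints shifted by one. The
one-line fix below restores that invariant, i.e.:

    if (arg->type == TEP_PRINT_BSTRING) {
            trace_seq_puts(s, arg->string.string);
            arg = arg->next;        /* the previously missing advance */
            break;
    }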
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Cc: Jiri Olsa <jolsa(a)redhat.com>
Cc: Namhyung Kim <namhyung(a)kernel.org>
Cc: linux-trace-devel(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Fixes: 37db96bb49629 ("tools lib traceevent: Handle new pointer processing of bprint strings")
Link: http://lkml.kernel.org/r/20181210134522.3f71e2ca@gandalf.local.home
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
---
tools/lib/traceevent/event-parse.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c
index a5ed291b8a9f..69a96e39f0ab 100644
--- a/tools/lib/traceevent/event-parse.c
+++ b/tools/lib/traceevent/event-parse.c
@@ -4973,6 +4973,7 @@ static void pretty_print(struct trace_seq *s, void *data, int size, struct tep_e
if (arg->type == TEP_PRINT_BSTRING) {
trace_seq_puts(s, arg->string.string);
+ arg = arg->next;
break;
}
--
2.19.2
From: Michael J. Ruhl <michael.j.ruhl(a)intel.com>
commit 28a9a9e83ceae2cee25b9af9ad20d53aaa9ab951 upstream
Packet queue state is overused to determine SDMA descriptor
availability and packet queue request state.
cpu 0 ret = user_sdma_send_pkts(req, pcount);
cpu 0 if (atomic_read(&pq->n_reqs))
cpu 1 IRQ user_sdma_txreq_cb calls pq_update() (state to _INACTIVE)
cpu 0 xchg(&pq->state, SDMA_PKT_Q_ACTIVE);
At this point pq->n_reqs == 0 and pq->state is incorrectly
SDMA_PKT_Q_ACTIVE. The close path will hang waiting for the state
to return to _INACTIVE.
This can also change the state from _DEFERRED to _ACTIVE. However,
this is a mostly benign race.
Remove the racy code path.
Use n_reqs to determine if a packet queue is active or not.
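The queue lifecycle then reduces to the plain refcount pattern (a
sketch of the result, not the exact driver code):

    atomic_inc(&pq->n_reqs);                /* request submitted */
    /* ... request runs and completes ... */
    if (atomic_dec_and_test(&pq->n_reqs))   /* last request done */
            wake_up(&pq->wait);

    /* close path: wait for the count, not a separate state flag */
    wait_event_interruptible(pq->wait, !atomic_read(&pq->n_reqs));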
Cc: <stable(a)vger.kernel.org> # 4.19.x
Reviewed-by: Mitko Haralanov <mitko.haralanov(a)intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn(a)intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl(a)intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro(a)intel.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
---
drivers/infiniband/hw/hfi1/user_sdma.c | 24 ++++++++++--------------
drivers/infiniband/hw/hfi1/user_sdma.h | 9 +++++----
2 files changed, 15 insertions(+), 18 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/user_sdma.c b/drivers/infiniband/hw/hfi1/user_sdma.c
index 39134dd..51831bf 100644
--- a/drivers/infiniband/hw/hfi1/user_sdma.c
+++ b/drivers/infiniband/hw/hfi1/user_sdma.c
@@ -187,7 +187,6 @@ int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt,
pq->ctxt = uctxt->ctxt;
pq->subctxt = fd->subctxt;
pq->n_max_reqs = hfi1_sdma_comp_ring_size;
- pq->state = SDMA_PKT_Q_INACTIVE;
atomic_set(&pq->n_reqs, 0);
init_waitqueue_head(&pq->wait);
atomic_set(&pq->n_locked, 0);
@@ -276,7 +275,7 @@ int hfi1_user_sdma_free_queues(struct hfi1_filedata *fd,
/* Wait until all requests have been freed. */
wait_event_interruptible(
pq->wait,
- (READ_ONCE(pq->state) == SDMA_PKT_Q_INACTIVE));
+ !atomic_read(&pq->n_reqs));
kfree(pq->reqs);
kfree(pq->req_in_use);
kmem_cache_destroy(pq->txreq_cache);
@@ -312,6 +311,13 @@ static u8 dlid_to_selector(u16 dlid)
return mapping[hash];
}
+/**
+ * hfi1_user_sdma_process_request() - Process and start a user sdma request
+ * @fd: valid file descriptor
+ * @iovec: array of io vectors to process
+ * @dim: overall iovec array size
+ * @count: number of io vector array entries processed
+ */
int hfi1_user_sdma_process_request(struct hfi1_filedata *fd,
struct iovec *iovec, unsigned long dim,
unsigned long *count)
@@ -560,21 +566,13 @@ int hfi1_user_sdma_process_request(struct hfi1_filedata *fd,
req->ahg_idx = sdma_ahg_alloc(req->sde);
set_comp_state(pq, cq, info.comp_idx, QUEUED, 0);
+ pq->state = SDMA_PKT_Q_ACTIVE;
/* Send the first N packets in the request to buy us some time */
ret = user_sdma_send_pkts(req, pcount);
if (unlikely(ret < 0 && ret != -EBUSY))
goto free_req;
/*
- * It is possible that the SDMA engine would have processed all the
- * submitted packets by the time we get here. Therefore, only set
- * packet queue state to ACTIVE if there are still uncompleted
- * requests.
- */
- if (atomic_read(&pq->n_reqs))
- xchg(&pq->state, SDMA_PKT_Q_ACTIVE);
-
- /*
* This is a somewhat blocking send implementation.
* The driver will block the caller until all packets of the
* request have been submitted to the SDMA engine. However, it
@@ -1409,10 +1407,8 @@ static void user_sdma_txreq_cb(struct sdma_txreq *txreq, int status)
static inline void pq_update(struct hfi1_user_sdma_pkt_q *pq)
{
- if (atomic_dec_and_test(&pq->n_reqs)) {
- xchg(&pq->state, SDMA_PKT_Q_INACTIVE);
+ if (atomic_dec_and_test(&pq->n_reqs))
wake_up(&pq->wait);
- }
}
static void user_sdma_free_request(struct user_sdma_request *req, bool unpin)
diff --git a/drivers/infiniband/hw/hfi1/user_sdma.h b/drivers/infiniband/hw/hfi1/user_sdma.h
index 0ae0645..91c343f 100644
--- a/drivers/infiniband/hw/hfi1/user_sdma.h
+++ b/drivers/infiniband/hw/hfi1/user_sdma.h
@@ -105,9 +105,10 @@ static inline int ahg_header_set(u32 *arr, int idx, size_t array_size,
#define TXREQ_FLAGS_REQ_ACK BIT(0) /* Set the ACK bit in the header */
#define TXREQ_FLAGS_REQ_DISABLE_SH BIT(1) /* Disable header suppression */
-#define SDMA_PKT_Q_INACTIVE BIT(0)
-#define SDMA_PKT_Q_ACTIVE BIT(1)
-#define SDMA_PKT_Q_DEFERRED BIT(2)
+enum pkt_q_sdma_state {
+ SDMA_PKT_Q_ACTIVE,
+ SDMA_PKT_Q_DEFERRED,
+};
/*
* Maximum retry attempts to submit a TX request
@@ -133,7 +134,7 @@ struct hfi1_user_sdma_pkt_q {
struct user_sdma_request *reqs;
unsigned long *req_in_use;
struct iowait busy;
- unsigned state;
+ enum pkt_q_sdma_state state;
wait_queue_head_t wait;
unsigned long unpinned;
struct mmu_rb_handler *handler;
From: Michael J. Ruhl <michael.j.ruhl(a)intel.com>
commit 28a9a9e83ceae2cee25b9af9ad20d53aaa9ab951 upstream
Packet queue state is overused to determine SDMA descriptor
availability and packet queue request state.
cpu 0 ret = user_sdma_send_pkts(req, pcount);
cpu 0 if (atomic_read(&pq->n_reqs))
cpu 1 IRQ user_sdma_txreq_cb calls pq_update() (state to _INACTIVE)
cpu 0 xchg(&pq->state, SDMA_PKT_Q_ACTIVE);
At this point pq->n_reqs == 0 and pq->state is incorrectly
SDMA_PKT_Q_ACTIVE. The close path will hang waiting for the state
to return to _INACTIVE.
This can also change the state from _DEFERRED to _ACTIVE. However,
this is a mostly benign race.
Remove the racy code path.
Use n_reqs to determine if a packet queue is active or not.
Cc: <stable(a)vger.kernel.org> # 4.9.0
Reviewed-by: Mitko Haralanov <mitko.haralanov(a)intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn(a)intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl(a)intel.com>
---
drivers/infiniband/hw/hfi1/user_sdma.c | 28 ++++++++++------------------
drivers/infiniband/hw/hfi1/user_sdma.h | 7 ++++++-
2 files changed, 16 insertions(+), 19 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/user_sdma.c b/drivers/infiniband/hw/hfi1/user_sdma.c
index 619475c..4c11116 100644
--- a/drivers/infiniband/hw/hfi1/user_sdma.c
+++ b/drivers/infiniband/hw/hfi1/user_sdma.c
@@ -151,10 +151,6 @@
#define SDMA_REQ_HAVE_AHG 1
#define SDMA_REQ_HAS_ERROR 2
-#define SDMA_PKT_Q_INACTIVE BIT(0)
-#define SDMA_PKT_Q_ACTIVE BIT(1)
-#define SDMA_PKT_Q_DEFERRED BIT(2)
-
/*
* Maximum retry attempts to submit a TX request
* before putting the process to sleep.
@@ -408,7 +404,6 @@ int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt, struct file *fp)
pq->ctxt = uctxt->ctxt;
pq->subctxt = fd->subctxt;
pq->n_max_reqs = hfi1_sdma_comp_ring_size;
- pq->state = SDMA_PKT_Q_INACTIVE;
atomic_set(&pq->n_reqs, 0);
init_waitqueue_head(&pq->wait);
atomic_set(&pq->n_locked, 0);
@@ -491,7 +486,7 @@ int hfi1_user_sdma_free_queues(struct hfi1_filedata *fd)
/* Wait until all requests have been freed. */
wait_event_interruptible(
pq->wait,
- (ACCESS_ONCE(pq->state) == SDMA_PKT_Q_INACTIVE));
+ !atomic_read(&pq->n_reqs));
kfree(pq->reqs);
kfree(pq->req_in_use);
kmem_cache_destroy(pq->txreq_cache);
@@ -527,6 +522,13 @@ static u8 dlid_to_selector(u16 dlid)
return mapping[hash];
}
+/**
+ * hfi1_user_sdma_process_request() - Process and start a user sdma request
+ * @fp: valid file pointer
+ * @iovec: array of io vectors to process
+ * @dim: overall iovec array size
+ * @count: number of io vector array entries processed
+ */
int hfi1_user_sdma_process_request(struct file *fp, struct iovec *iovec,
unsigned long dim, unsigned long *count)
{
@@ -768,21 +770,13 @@ int hfi1_user_sdma_process_request(struct file *fp, struct iovec *iovec,
}
set_comp_state(pq, cq, info.comp_idx, QUEUED, 0);
+ pq->state = SDMA_PKT_Q_ACTIVE;
/* Send the first N packets in the request to buy us some time */
ret = user_sdma_send_pkts(req, pcount);
if (unlikely(ret < 0 && ret != -EBUSY))
goto free_req;
/*
- * It is possible that the SDMA engine would have processed all the
- * submitted packets by the time we get here. Therefore, only set
- * packet queue state to ACTIVE if there are still uncompleted
- * requests.
- */
- if (atomic_read(&pq->n_reqs))
- xchg(&pq->state, SDMA_PKT_Q_ACTIVE);
-
- /*
* This is a somewhat blocking send implementation.
* The driver will block the caller until all packets of the
* request have been submitted to the SDMA engine. However, it
@@ -1526,10 +1520,8 @@ static void user_sdma_txreq_cb(struct sdma_txreq *txreq, int status)
static inline void pq_update(struct hfi1_user_sdma_pkt_q *pq)
{
- if (atomic_dec_and_test(&pq->n_reqs)) {
- xchg(&pq->state, SDMA_PKT_Q_INACTIVE);
+ if (atomic_dec_and_test(&pq->n_reqs))
wake_up(&pq->wait);
- }
}
static void user_sdma_free_request(struct user_sdma_request *req, bool unpin)
diff --git a/drivers/infiniband/hw/hfi1/user_sdma.h b/drivers/infiniband/hw/hfi1/user_sdma.h
index 3900171..09dd843 100644
--- a/drivers/infiniband/hw/hfi1/user_sdma.h
+++ b/drivers/infiniband/hw/hfi1/user_sdma.h
@@ -53,6 +53,11 @@
extern uint extended_psn;
+enum pkt_q_sdma_state {
+ SDMA_PKT_Q_ACTIVE,
+ SDMA_PKT_Q_DEFERRED,
+};
+
struct hfi1_user_sdma_pkt_q {
struct list_head list;
unsigned ctxt;
@@ -65,7 +70,7 @@ struct hfi1_user_sdma_pkt_q {
struct user_sdma_request *reqs;
unsigned long *req_in_use;
struct iowait busy;
- unsigned state;
+ enum pkt_q_sdma_state state;
wait_queue_head_t wait;
unsigned long unpinned;
struct mmu_rb_handler *handler;
From: Michael J. Ruhl <michael.j.ruhl(a)intel.com>
commit 28a9a9e83ceae2cee25b9af9ad20d53aaa9ab951 upstream
Packet queue state is overused to determine SDMA descriptor
availability and packet queue request state.
cpu 0 ret = user_sdma_send_pkts(req, pcount);
cpu 0 if (atomic_read(&pq->n_reqs))
cpu 1 IRQ user_sdma_txreq_cb calls pq_update() (state to _INACTIVE)
cpu 0 xchg(&pq->state, SDMA_PKT_Q_ACTIVE);
At this point pq->n_reqs == 0 and pq->state is incorrectly
SDMA_PKT_Q_ACTIVE. The close path will hang waiting for the state
to return to _INACTIVE.
This can also change the state from _DEFERRED to _ACTIVE. However,
this is a mostly benign race.
Remove the racy code path.
Use n_reqs to determine if a packet queue is active or not.
Cc: <stable(a)vger.kernel.org> # 4.14.0
Reviewed-by: Mitko Haralanov <mitko.haralanov(a)intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn(a)intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl(a)intel.com>
---
drivers/infiniband/hw/hfi1/user_sdma.c | 24 ++++++++++--------------
drivers/infiniband/hw/hfi1/user_sdma.h | 9 +++++----
2 files changed, 15 insertions(+), 18 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/user_sdma.c b/drivers/infiniband/hw/hfi1/user_sdma.c
index c14ec04..cbe5ab2 100644
--- a/drivers/infiniband/hw/hfi1/user_sdma.c
+++ b/drivers/infiniband/hw/hfi1/user_sdma.c
@@ -187,7 +187,6 @@ int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt,
pq->ctxt = uctxt->ctxt;
pq->subctxt = fd->subctxt;
pq->n_max_reqs = hfi1_sdma_comp_ring_size;
- pq->state = SDMA_PKT_Q_INACTIVE;
atomic_set(&pq->n_reqs, 0);
init_waitqueue_head(&pq->wait);
atomic_set(&pq->n_locked, 0);
@@ -276,7 +275,7 @@ int hfi1_user_sdma_free_queues(struct hfi1_filedata *fd,
/* Wait until all requests have been freed. */
wait_event_interruptible(
pq->wait,
- (ACCESS_ONCE(pq->state) == SDMA_PKT_Q_INACTIVE));
+ !atomic_read(&pq->n_reqs));
kfree(pq->reqs);
kfree(pq->req_in_use);
kmem_cache_destroy(pq->txreq_cache);
@@ -312,6 +311,13 @@ static u8 dlid_to_selector(u16 dlid)
return mapping[hash];
}
+/**
+ * hfi1_user_sdma_process_request() - Process and start a user sdma request
+ * @fd: valid file descriptor
+ * @iovec: array of io vectors to process
+ * @dim: overall iovec array size
+ * @count: number of io vector array entries processed
+ */
int hfi1_user_sdma_process_request(struct hfi1_filedata *fd,
struct iovec *iovec, unsigned long dim,
unsigned long *count)
@@ -560,21 +566,13 @@ int hfi1_user_sdma_process_request(struct hfi1_filedata *fd,
req->ahg_idx = sdma_ahg_alloc(req->sde);
set_comp_state(pq, cq, info.comp_idx, QUEUED, 0);
+ pq->state = SDMA_PKT_Q_ACTIVE;
/* Send the first N packets in the request to buy us some time */
ret = user_sdma_send_pkts(req, pcount);
if (unlikely(ret < 0 && ret != -EBUSY))
goto free_req;
/*
- * It is possible that the SDMA engine would have processed all the
- * submitted packets by the time we get here. Therefore, only set
- * packet queue state to ACTIVE if there are still uncompleted
- * requests.
- */
- if (atomic_read(&pq->n_reqs))
- xchg(&pq->state, SDMA_PKT_Q_ACTIVE);
-
- /*
* This is a somewhat blocking send implementation.
* The driver will block the caller until all packets of the
* request have been submitted to the SDMA engine. However, it
@@ -1391,10 +1389,8 @@ static void user_sdma_txreq_cb(struct sdma_txreq *txreq, int status)
static inline void pq_update(struct hfi1_user_sdma_pkt_q *pq)
{
- if (atomic_dec_and_test(&pq->n_reqs)) {
- xchg(&pq->state, SDMA_PKT_Q_INACTIVE);
+ if (atomic_dec_and_test(&pq->n_reqs))
wake_up(&pq->wait);
- }
}
static void user_sdma_free_request(struct user_sdma_request *req, bool unpin)
diff --git a/drivers/infiniband/hw/hfi1/user_sdma.h b/drivers/infiniband/hw/hfi1/user_sdma.h
index 5af5233..2b5326d 100644
--- a/drivers/infiniband/hw/hfi1/user_sdma.h
+++ b/drivers/infiniband/hw/hfi1/user_sdma.h
@@ -94,9 +94,10 @@
#define TXREQ_FLAGS_REQ_ACK BIT(0) /* Set the ACK bit in the header */
#define TXREQ_FLAGS_REQ_DISABLE_SH BIT(1) /* Disable header suppression */
-#define SDMA_PKT_Q_INACTIVE BIT(0)
-#define SDMA_PKT_Q_ACTIVE BIT(1)
-#define SDMA_PKT_Q_DEFERRED BIT(2)
+enum pkt_q_sdma_state {
+ SDMA_PKT_Q_ACTIVE,
+ SDMA_PKT_Q_DEFERRED,
+};
/*
* Maximum retry attempts to submit a TX request
@@ -124,7 +125,7 @@ struct hfi1_user_sdma_pkt_q {
struct user_sdma_request *reqs;
unsigned long *req_in_use;
struct iowait busy;
- unsigned state;
+ enum pkt_q_sdma_state state;
wait_queue_head_t wait;
unsigned long unpinned;
struct mmu_rb_handler *handler;
Hi Greg,
This was not marked for stable but it seems it should be. And the
second patch fixes the first one.
Please apply to your queue of 4.14-stable.
--
Regards
Sudip
Hello,
Please pick up this patch for Linux 4.4 and 4.9.
This fixes CVE-2017-0605 (Rejected?). Tested in Debian ;)
Thanks.
[ Upstream commit e09e28671cda63e6308b31798b997639120e2a21 ]
From: Amey Telawane <ameyt(a)codeaurora.org>
Date: Wed, 3 May 2017 15:41:14 +0530
Subject: [PATCH] tracing: Use strlcpy() instead of strcpy() in
__trace_find_cmdline()
strcpy() is inherently not safe, and strlcpy() should be used instead.
__trace_find_cmdline() uses strcpy() because the comms saved must have a
terminating nul character, but it doesn't hurt to add the extra protection
of using strlcpy() instead of strcpy().
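A minimal before/after sketch of the hardening:

    char comm[TASK_COMM_LEN];

    /* before: trusts that the saved comm fits and is NUL-terminated */
    strcpy(comm, get_saved_cmdlines(map));

    /* after: bounded copy, always NUL-terminates within TASK_COMM_LEN */
    strlcpy(comm, get_saved_cmdlines(map), TASK_COMM_LEN);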
Link: http://lkml.kernel.org/r/1493806274-13936-1-git-send-email-amit.pundir@lina…
Signed-off-by: Amey Telawane <ameyt(a)codeaurora.org>
[AmitP: Cherry-picked this commit from CodeAurora kernel/msm-3.10
https://source.codeaurora.org/quic/la/kernel/msm-3.10/commit/?id=2161ae9a70…]
Signed-off-by: Amit Pundir <amit.pundir(a)linaro.org>
[ Updated change log and removed the "- 1" from len parameter ]
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1862,7 +1862,7 @@ static void __trace_find_cmdline(int pid
map = savedcmd->map_pid_to_cmdline[pid];
if (map != NO_CMDLINE_MAP)
- strcpy(comm, get_saved_cmdlines(map));
+ strlcpy(comm, get_saved_cmdlines(map), TASK_COMM_LEN);
else
strcpy(comm, "<...>");
}
Since commit d26c25a9d19b ("arm64: KVM: Tighten guest core register
access from userspace"), KVM_{GET,SET}_ONE_REG rejects register IDs
that do not correspond to a single underlying architectural register.
KVM_GET_REG_LIST was not changed to match however: instead, it
simply yields a list of 32-bit register IDs that together cover the
whole kvm_regs struct. This means that if userspace tries to use
the resulting list of IDs directly to drive calls to KVM_*_ONE_REG,
some of those calls will now fail.
This was not the intention. Instead, iterating KVM_*_ONE_REG over
the list of IDs returned by KVM_GET_REG_LIST should be guaranteed
to work.
This patch fixes the problem by splitting validate_core_offset()
into a backend core_reg_size_from_offset() which does all of the
work except for checking that the size field in the register ID
matches, and kvm_arm_copy_reg_indices() and num_core_regs() are
converted to use this to enumerate the valid offsets.
kvm_arm_copy_reg_indices() now also sets the register ID size field
appropriately based on the value returned, so the register ID
supplied to userspace is fully qualified for use with the register
access ioctls.
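From userspace, the guaranteed-to-work loop this restores looks roughly
like the following (illustrative snippet, error handling elided; n would
come from a first KVM_GET_REG_LIST call that fails with E2BIG and
reports the required count):

    #include <linux/kvm.h>
    #include <sys/ioctl.h>
    #include <stdlib.h>

    struct kvm_reg_list *list;
    __u64 i;

    list = malloc(sizeof(*list) + n * sizeof(__u64));
    list->n = n;
    ioctl(vcpu_fd, KVM_GET_REG_LIST, list);

    for (i = 0; i < list->n; i++) {
            __uint128_t buf = 0;
            struct kvm_one_reg reg = {
                    .id   = list->reg[i],
                    .addr = (__u64)&buf,
            };

            /* every ID from the list is now accepted here */
            ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
    }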
Cc: stable(a)vger.kernel.org
Fixes: d26c25a9d19b ("arm64: KVM: Tighten guest core register access from userspace")
Signed-off-by: Dave Martin <Dave.Martin(a)arm.com>
---
arch/arm64/kvm/guest.c | 61 ++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 54 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index dd436a5..cbe423b 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -57,9 +57,8 @@ static u64 core_reg_offset_from_id(u64 id)
return id & ~(KVM_REG_ARCH_MASK | KVM_REG_SIZE_MASK | KVM_REG_ARM_CORE);
}
-static int validate_core_offset(const struct kvm_one_reg *reg)
+static int core_reg_size_from_offset(u64 off)
{
- u64 off = core_reg_offset_from_id(reg->id);
int size;
switch (off) {
@@ -89,8 +88,21 @@ static int validate_core_offset(const struct kvm_one_reg *reg)
return -EINVAL;
}
- if (KVM_REG_SIZE(reg->id) == size &&
- IS_ALIGNED(off, size / sizeof(__u32)))
+ if (IS_ALIGNED(off, size / sizeof(__u32)))
+ return size;
+
+ return -EINVAL;
+}
+
+static int validate_core_offset(const struct kvm_one_reg *reg)
+{
+ u64 off = core_reg_offset_from_id(reg->id);
+ int size = core_reg_size_from_offset(off);
+
+ if (size < 0)
+ return -EINVAL;
+
+ if (KVM_REG_SIZE(reg->id) == size)
return 0;
return -EINVAL;
@@ -195,7 +207,19 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
static unsigned long num_core_regs(void)
{
- return sizeof(struct kvm_regs) / sizeof(__u32);
+ unsigned int i;
+ int n = 0;
+
+ for (i = 0; i < sizeof(struct kvm_regs) / sizeof(__u32); i++) {
+ int size = core_reg_size_from_offset(i);
+
+ if (size < 0)
+ continue;
+
+ n++;
+ }
+
+ return n;
}
/**
@@ -270,11 +294,34 @@ unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu)
int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
{
unsigned int i;
- const u64 core_reg = KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE;
int ret;
for (i = 0; i < sizeof(struct kvm_regs) / sizeof(__u32); i++) {
- if (put_user(core_reg | i, uindices))
+ u64 reg = KVM_REG_ARM64 | KVM_REG_ARM_CORE | i;
+ int size = core_reg_size_from_offset(i);
+
+ if (size < 0)
+ continue;
+
+ switch (size) {
+ case sizeof(__u32):
+ reg |= KVM_REG_SIZE_U32;
+ break;
+
+ case sizeof(__u64):
+ reg |= KVM_REG_SIZE_U64;
+ break;
+
+ case sizeof(__uint128_t):
+ reg |= KVM_REG_SIZE_U128;
+ break;
+
+ default:
+ WARN_ON(1);
+ continue;
+ }
+
+ if (put_user(reg, uindices))
return -EFAULT;
uindices++;
}
--
2.1.4
Hi Greg,
I did some randconfig testing on linux-4.19 arm/arm64/x86. So far I needed
27 patches, most of which are also still needed in mainline Linux. I
had submitted some before, and others were not submitted previously
for some reason. I'll try to get those fixed in mainline and then
make sure we get them into 4.19 as well.
This series for now contains four patches that did make it into mainline:
2e6ae11dd0d1 ("slimbus: ngd: mark PM functions as __maybe_unused")
33f49571d750 ("staging: olpc_dcon: add a missing dependency")
0eeec01488da ("scsi: raid_attrs: fix unused variable warning")
11d4afd4ff66 ("sched/pelt: Fix warning and clean up IRQ PELT config")
Feel free to either cherry-pick those from mainline or apply the
patch from this series, whichever works best for you.
The other three patches are for warnings in code that got removed in
mainline kernels:
3e9efc3299dd ("i2c: aspeed: Handle master/slave combined irq events properly")
972910948fb6 ("ARM: dts: qcom: Remove Arrow SD600 eval board")
effec874792f ("drm/msm/dpu: Remove dpu_dbg")
My feeling was that it's safer to just address the warning by fixing
the code correctly in each of these cases, but if you disagree,
applying the mainline change should work equally well, so decide
for yourself.
Arnd
Arnd Bergmann (5):
scsi: raid_attrs: fix unused variable warning
slimbus: ngd: mark PM functions as __maybe_unused
[stable-4.19] i2c: aspeed: fix build warning
[stable-4.19] ARM: dts: qcom-apq8064-arrow-sd-600eval fix
graph_endpoint warning
[stable-4.19] drm/msm: fix address space warning
Lubomir Rintel (1):
staging: olpc_dcon: add a missing dependency
Vincent Guittot (1):
sched/pelt: Fix warning and clean up IRQ PELT config
arch/arm/boot/dts/qcom-apq8064-arrow-sd-600eval.dts | 5 +++++
drivers/gpu/drm/msm/disp/dpu1/dpu_dbg.c | 8 ++++----
drivers/i2c/busses/i2c-aspeed.c | 4 +++-
drivers/scsi/raid_class.c | 4 +---
drivers/slimbus/qcom-ngd-ctrl.c | 6 ++----
drivers/staging/olpc_dcon/Kconfig | 1 +
init/Kconfig | 5 +++++
kernel/sched/core.c | 7 +++----
kernel/sched/fair.c | 2 +-
kernel/sched/pelt.c | 2 +-
kernel/sched/pelt.h | 2 +-
kernel/sched/sched.h | 5 ++---
12 files changed, 29 insertions(+), 22 deletions(-)
Cc: Andrew Jeffery <andrew(a)aj.id.au>
Cc: Andy Gross <andy.gross(a)linaro.org>
Cc: bp(a)alien8.de
Cc: Daniel Drake <dsd(a)laptop.org>
Cc: David Brown <david.brown(a)linaro.org>
Cc: dou_liyang(a)163.com
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: "James E.J. Bottomley" <jejb(a)linux.vnet.ibm.com>
Cc: Jens Frederich <jfrederich(a)gmail.com>
Cc: Lubomir Rintel <lkundrak(a)v3.sk>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: "Martin K. Petersen" <martin.petersen(a)oracle.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Rob Clark <robdclark(a)gmail.com>
Cc: Rob Herring <robh+dt(a)kernel.org>
Cc: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Cc: Vincent Guittot <vincent.guittot(a)linaro.org>
Cc: linux-arm-msm(a)vger.kernel.org
Cc: devicetree(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: freedreno(a)lists.freedesktop.org
Cc: linux-i2c(a)vger.kernel.org
Cc: openbmc(a)lists.ozlabs.org
Cc: linux-arm-kernel(a)lists.infradead.org
Cc: linux-aspeed(a)lists.ozlabs.org
Cc: linux-scsi(a)vger.kernel.org
--
2.20.0
Mediatek Preloader is a proprietary embedded boot loader for loading
Little Kernel and Linux into device DRAM.
This boot loader also handles firmware updates. The Mediatek Preloader
is enumerated as a virtual COM port when the device is connected to a
Windows or Linux host via the CDC-ACM class driver. Once USB enumeration
is done, the Mediatek Preloader actively sends the handshake command
"READY" to the PC instead of waiting for a command from the download
tool.
Since Linux 4.12, commit 93857edd9829 ("tty: reset termios state on
device registration") causes the Mediatek Preloader to receive an
abnormal echoed command such as "READYXX" right after it sends "READY".
This is recognized as an incorrect response and makes the download
handshake fail. This change only affects subsequent connects, if the
reconnected device happens to get the same minor number.
Disabling the ECHO termios flag avoids this problem. However, it cannot
be done by user space configuration when the download tool opens
/dev/ttyACM0, because the device running the Mediatek Preloader sends
the handshake command "READY" immediately once the CDC-ACM driver is
ready.
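For illustration only (open_preloader is a hypothetical helper, not part
of the patch), the obvious userspace workaround looks like this, but it
always loses the race described above:

#include <fcntl.h>
#include <termios.h>

static int open_preloader(const char *path)	/* e.g. "/dev/ttyACM0" */
{
	struct termios tio;
	int fd = open(path, O_RDWR | O_NOCTTY);

	if (fd < 0)
		return fd;
	/*
	 * Too late: the Preloader sent "READY" as soon as the CDC-ACM
	 * device was bound, so the default termios (ECHO set since 4.12)
	 * has already echoed the handshake back to the device.
	 */
	tcgetattr(fd, &tio);
	tio.c_lflag &= ~ECHO;
	tcsetattr(fd, TCSANOW, &tio);
	return fd;
}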
Fix the above problem by introducing a "DISABLE_ECHO" quirk in
driver_info. When a Mediatek Preloader is connected, the CDC-ACM driver
disables the ECHO flag in termios to avoid the problem.
Signed-off-by: Macpaul Lin <macpaul.lin(a)mediatek.com>
Cc: stable(a)vger.kernel.org
---
Changes for v2:
- Move quirks testing of DISABLE_ECHO flag into acm_tty_install().
- Change quirks testing into bitwise comparison.
Changes for v3:
- Replace quirks testing from init_termios to tty->termios.
- Remove parenthesis for ECHO flag.
Changes for v4:
- Drop the quirks variable to simplify the patch.
- Move termios operation right after the driver_data has been installed.
- Write general style comment for suppressing initial echoing.
Changes for v5:
- Fix: move the termios operation to right above where driver_data is
  installed.
- Update the commit message to note that this patch only affects
  reconnected devices that get the same minor number.
Changes for v6:
- Update VID/PID:0x0e8d/0x0003 as Mediatek Inc BROM.
- Update VID/PID:0x0e8d/0x2000 as Mediatek Inc Preloader.
drivers/usb/class/cdc-acm.c | 14 ++++++++++++--
drivers/usb/class/cdc-acm.h | 1 +
2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 1b68fed..161e2d4 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -581,6 +581,13 @@ static int acm_tty_install(struct tty_driver *driver, struct tty_struct *tty)
if (retval)
goto error_init_termios;
+ /*
+ * Suppress initial echoing for some devices which might send data
+ * immediately after acm driver has been installed.
+ */
+ if (acm->quirks & DISABLE_ECHO)
+ tty->termios.c_lflag &= ~ECHO;
+
tty->driver_data = acm;
return 0;
@@ -1654,8 +1661,11 @@ static int acm_pre_reset(struct usb_interface *intf)
{ USB_DEVICE(0x0870, 0x0001), /* Metricom GS Modem */
.driver_info = NO_UNION_NORMAL, /* has no union descriptor */
},
- { USB_DEVICE(0x0e8d, 0x0003), /* FIREFLY, MediaTek Inc; andrey.arapov(a)gmail.com */
- .driver_info = NO_UNION_NORMAL, /* has no union descriptor */
+ { USB_DEVICE(0x0e8d, 0x0003), /* MediaTek Inc BROM */
+ .driver_info = DISABLE_ECHO, /* DISABLE ECHO in termios flag */
+ },
+ { USB_DEVICE(0x0e8d, 0x2000), /* MediaTek Inc Preloader */
+ .driver_info = DISABLE_ECHO, /* DISABLE ECHO in termios flag */
},
{ USB_DEVICE(0x0e8d, 0x3329), /* MediaTek Inc GPS */
.driver_info = NO_UNION_NORMAL, /* has no union descriptor */
diff --git a/drivers/usb/class/cdc-acm.h b/drivers/usb/class/cdc-acm.h
index ca06b20..515aad0 100644
--- a/drivers/usb/class/cdc-acm.h
+++ b/drivers/usb/class/cdc-acm.h
@@ -140,3 +140,4 @@ struct acm {
#define QUIRK_CONTROL_LINE_STATE BIT(6)
#define CLEAR_HALT_CONDITIONS BIT(7)
#define SEND_ZERO_PACKET BIT(8)
+#define DISABLE_ECHO BIT(9)
--
1.9.1
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d57f9da890696af1484f4a47f7f123560197865a Mon Sep 17 00:00:00 2001
From: Damien Le Moal <damien.lemoal(a)wdc.com>
Date: Fri, 30 Nov 2018 15:31:48 +0900
Subject: [PATCH] dm zoned: Fix target BIO completion handling
struct bioctx includes the ref refcount_t to track the number of I/O
fragments used to process a target BIO as well as ensure that the zone
of the BIO is kept in the active state throughout the lifetime of the
BIO. However, since decrementing of this reference count is done in the
target .end_io method, the function bio_endio() must be called multiple
times for read and write target BIOs, which causes problems with the
value of the __bi_remaining struct bio field for chained BIOs (e.g. when
the clone BIO passed by dm core is large and is split into fragments by
the block layer), resulting in incorrect values and inconsistencies with
the BIO_CHAIN flag setting. This in turn triggers the BUG_ON() call:
BUG_ON(atomic_read(&bio->__bi_remaining) <= 0);
in bio_remaining_done() called from bio_endio().
Fix this by ensuring that bio_endio() is called only once for any target
BIO, by always using internal clone BIOs for processing any read or
write target BIO. This allows reference counting using the target BIO
context counter to trigger the target BIO completion bio_endio() call
once all data, metadata and other zone work triggered by the BIO
completes.
Overall, this simplifies the code too as the target .end_io becomes
unnecessary and differences between read and write BIO issuing and
completion processing disappear.
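Distilled to a sketch (illustrative, not the driver code itself; only
the struct dmz_bioctx fields shown in this patch are assumed), the
resulting completion model is the usual shared-context refcount pattern:

#include <linux/bio.h>
#include <linux/refcount.h>

static void issue_fragment(struct dmz_bioctx *bioctx, struct bio *clone)
{
	refcount_inc(&bioctx->ref);		/* one ref per in-flight clone */
	generic_make_request(clone);
}

static void fragment_done(struct dmz_bioctx *bioctx, blk_status_t status)
{
	struct bio *bio = bioctx->bio;		/* the original target BIO */

	if (status != BLK_STS_OK && bio->bi_status == BLK_STS_OK)
		bio->bi_status = status;	/* record the first error */

	if (refcount_dec_and_test(&bioctx->ref))
		bio_endio(bio);			/* reached exactly once */
}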
Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target")
Cc: stable(a)vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal(a)wdc.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
index 981154e59461..6af5babe6837 100644
--- a/drivers/md/dm-zoned-target.c
+++ b/drivers/md/dm-zoned-target.c
@@ -20,7 +20,6 @@ struct dmz_bioctx {
struct dm_zone *zone;
struct bio *bio;
refcount_t ref;
- blk_status_t status;
};
/*
@@ -78,65 +77,66 @@ static inline void dmz_bio_endio(struct bio *bio, blk_status_t status)
{
struct dmz_bioctx *bioctx = dm_per_bio_data(bio, sizeof(struct dmz_bioctx));
- if (bioctx->status == BLK_STS_OK && status != BLK_STS_OK)
- bioctx->status = status;
- bio_endio(bio);
+ if (status != BLK_STS_OK && bio->bi_status == BLK_STS_OK)
+ bio->bi_status = status;
+
+ if (refcount_dec_and_test(&bioctx->ref)) {
+ struct dm_zone *zone = bioctx->zone;
+
+ if (zone) {
+ if (bio->bi_status != BLK_STS_OK &&
+ bio_op(bio) == REQ_OP_WRITE &&
+ dmz_is_seq(zone))
+ set_bit(DMZ_SEQ_WRITE_ERR, &zone->flags);
+ dmz_deactivate_zone(zone);
+ }
+ bio_endio(bio);
+ }
}
/*
- * Partial clone read BIO completion callback. This terminates the
+ * Completion callback for an internally cloned target BIO. This terminates the
* target BIO when there are no more references to its context.
*/
-static void dmz_read_bio_end_io(struct bio *bio)
+static void dmz_clone_endio(struct bio *clone)
{
- struct dmz_bioctx *bioctx = bio->bi_private;
- blk_status_t status = bio->bi_status;
+ struct dmz_bioctx *bioctx = clone->bi_private;
+ blk_status_t status = clone->bi_status;
- bio_put(bio);
+ bio_put(clone);
dmz_bio_endio(bioctx->bio, status);
}
/*
- * Issue a BIO to a zone. The BIO may only partially process the
+ * Issue a clone of a target BIO. The clone may only partially process the
* original target BIO.
*/
-static int dmz_submit_read_bio(struct dmz_target *dmz, struct dm_zone *zone,
- struct bio *bio, sector_t chunk_block,
- unsigned int nr_blocks)
+static int dmz_submit_bio(struct dmz_target *dmz, struct dm_zone *zone,
+ struct bio *bio, sector_t chunk_block,
+ unsigned int nr_blocks)
{
struct dmz_bioctx *bioctx = dm_per_bio_data(bio, sizeof(struct dmz_bioctx));
- sector_t sector;
struct bio *clone;
- /* BIO remap sector */
- sector = dmz_start_sect(dmz->metadata, zone) + dmz_blk2sect(chunk_block);
-
- /* If the read is not partial, there is no need to clone the BIO */
- if (nr_blocks == dmz_bio_blocks(bio)) {
- /* Setup and submit the BIO */
- bio->bi_iter.bi_sector = sector;
- refcount_inc(&bioctx->ref);
- generic_make_request(bio);
- return 0;
- }
-
- /* Partial BIO: we need to clone the BIO */
clone = bio_clone_fast(bio, GFP_NOIO, &dmz->bio_set);
if (!clone)
return -ENOMEM;
- /* Setup the clone */
- clone->bi_iter.bi_sector = sector;
+ bio_set_dev(clone, dmz->dev->bdev);
+ clone->bi_iter.bi_sector =
+ dmz_start_sect(dmz->metadata, zone) + dmz_blk2sect(chunk_block);
clone->bi_iter.bi_size = dmz_blk2sect(nr_blocks) << SECTOR_SHIFT;
- clone->bi_end_io = dmz_read_bio_end_io;
+ clone->bi_end_io = dmz_clone_endio;
clone->bi_private = bioctx;
bio_advance(bio, clone->bi_iter.bi_size);
- /* Submit the clone */
refcount_inc(&bioctx->ref);
generic_make_request(clone);
+ if (bio_op(bio) == REQ_OP_WRITE && dmz_is_seq(zone))
+ zone->wp_block += nr_blocks;
+
return 0;
}
@@ -214,7 +214,7 @@ static int dmz_handle_read(struct dmz_target *dmz, struct dm_zone *zone,
if (nr_blocks) {
/* Valid blocks found: read them */
nr_blocks = min_t(unsigned int, nr_blocks, end_block - chunk_block);
- ret = dmz_submit_read_bio(dmz, rzone, bio, chunk_block, nr_blocks);
+ ret = dmz_submit_bio(dmz, rzone, bio, chunk_block, nr_blocks);
if (ret)
return ret;
chunk_block += nr_blocks;
@@ -228,25 +228,6 @@ static int dmz_handle_read(struct dmz_target *dmz, struct dm_zone *zone,
return 0;
}
-/*
- * Issue a write BIO to a zone.
- */
-static void dmz_submit_write_bio(struct dmz_target *dmz, struct dm_zone *zone,
- struct bio *bio, sector_t chunk_block,
- unsigned int nr_blocks)
-{
- struct dmz_bioctx *bioctx = dm_per_bio_data(bio, sizeof(struct dmz_bioctx));
-
- /* Setup and submit the BIO */
- bio_set_dev(bio, dmz->dev->bdev);
- bio->bi_iter.bi_sector = dmz_start_sect(dmz->metadata, zone) + dmz_blk2sect(chunk_block);
- refcount_inc(&bioctx->ref);
- generic_make_request(bio);
-
- if (dmz_is_seq(zone))
- zone->wp_block += nr_blocks;
-}
-
/*
* Write blocks directly in a data zone, at the write pointer.
* If a buffer zone is assigned, invalidate the blocks written
@@ -265,7 +246,9 @@ static int dmz_handle_direct_write(struct dmz_target *dmz,
return -EROFS;
/* Submit write */
- dmz_submit_write_bio(dmz, zone, bio, chunk_block, nr_blocks);
+ ret = dmz_submit_bio(dmz, zone, bio, chunk_block, nr_blocks);
+ if (ret)
+ return ret;
/*
* Validate the blocks in the data zone and invalidate
@@ -301,7 +284,9 @@ static int dmz_handle_buffered_write(struct dmz_target *dmz,
return -EROFS;
/* Submit write */
- dmz_submit_write_bio(dmz, bzone, bio, chunk_block, nr_blocks);
+ ret = dmz_submit_bio(dmz, bzone, bio, chunk_block, nr_blocks);
+ if (ret)
+ return ret;
/*
* Validate the blocks in the buffer zone
@@ -600,7 +585,6 @@ static int dmz_map(struct dm_target *ti, struct bio *bio)
bioctx->zone = NULL;
bioctx->bio = bio;
refcount_set(&bioctx->ref, 1);
- bioctx->status = BLK_STS_OK;
/* Set the BIO pending in the flush list */
if (!nr_sectors && bio_op(bio) == REQ_OP_WRITE) {
@@ -623,35 +607,6 @@ static int dmz_map(struct dm_target *ti, struct bio *bio)
return DM_MAPIO_SUBMITTED;
}
-/*
- * Completed target BIO processing.
- */
-static int dmz_end_io(struct dm_target *ti, struct bio *bio, blk_status_t *error)
-{
- struct dmz_bioctx *bioctx = dm_per_bio_data(bio, sizeof(struct dmz_bioctx));
-
- if (bioctx->status == BLK_STS_OK && *error)
- bioctx->status = *error;
-
- if (!refcount_dec_and_test(&bioctx->ref))
- return DM_ENDIO_INCOMPLETE;
-
- /* Done */
- bio->bi_status = bioctx->status;
-
- if (bioctx->zone) {
- struct dm_zone *zone = bioctx->zone;
-
- if (*error && bio_op(bio) == REQ_OP_WRITE) {
- if (dmz_is_seq(zone))
- set_bit(DMZ_SEQ_WRITE_ERR, &zone->flags);
- }
- dmz_deactivate_zone(zone);
- }
-
- return DM_ENDIO_DONE;
-}
-
/*
* Get zoned device information.
*/
@@ -946,7 +901,6 @@ static struct target_type dmz_type = {
.ctr = dmz_ctr,
.dtr = dmz_dtr,
.map = dmz_map,
- .end_io = dmz_end_io,
.io_hints = dmz_io_hints,
.prepare_ioctl = dmz_prepare_ioctl,
.postsuspend = dmz_suspend,
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f6c367585d0d851349d3a9e607c43e5bea993fa1 Mon Sep 17 00:00:00 2001
From: Mike Snitzer <snitzer(a)redhat.com>
Date: Tue, 11 Dec 2018 13:31:40 -0500
Subject: [PATCH] dm thin: send event about thin-pool state change _after_
making it
Sending a DM event before a thin-pool state change has actually happened
is a bug. This wasn't realized until it became clear that the userspace
response to the event raced with the actual state change the event was
meant to notify about.
Fix this by first updating internal thin-pool state to reflect what the
DM event is being issued about. This fixes a long-standing racy/buggy
userspace device-mapper-test-suite 'resize_io' test that would get an
event but not find the state it was looking for -- so it would just go
on to hang because no other events caused the test to reevaluate the
thin-pool's state.
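Reduced to a sketch (illustrative; set_mode_and_notify is a made-up name
distilling the change, not the dm-thin code verbatim), the ordering
flips from notify-then-change to change-then-notify:

static void set_mode_and_notify(struct pool *pool, enum pool_mode new_mode)
{
	enum pool_mode old_mode = get_pool_mode(pool);

	/* Make the state change visible first... */
	pool->pf.mode = new_mode;

	/*
	 * ...then tell userspace, so anything reacting to the event is
	 * guaranteed to observe the new mode when it re-reads pool status.
	 */
	if (old_mode != new_mode)
		notify_of_pool_mode_change(pool);
}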
Cc: stable(a)vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 0bd8d498b3b9..53f8d03f76f7 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -195,7 +195,7 @@ static void throttle_unlock(struct throttle *t)
struct dm_thin_new_mapping;
/*
- * The pool runs in 4 modes. Ordered in degraded order for comparisons.
+ * The pool runs in various modes. Ordered in degraded order for comparisons.
*/
enum pool_mode {
PM_WRITE, /* metadata may be changed */
@@ -282,9 +282,38 @@ struct pool {
mempool_t mapping_pool;
};
-static enum pool_mode get_pool_mode(struct pool *pool);
static void metadata_operation_failed(struct pool *pool, const char *op, int r);
+static enum pool_mode get_pool_mode(struct pool *pool)
+{
+ return pool->pf.mode;
+}
+
+static void notify_of_pool_mode_change(struct pool *pool)
+{
+ const char *descs[] = {
+ "write",
+ "out-of-data-space",
+ "read-only",
+ "read-only",
+ "fail"
+ };
+ const char *extra_desc = NULL;
+ enum pool_mode mode = get_pool_mode(pool);
+
+ if (mode == PM_OUT_OF_DATA_SPACE) {
+ if (!pool->pf.error_if_no_space)
+ extra_desc = " (queue IO)";
+ else
+ extra_desc = " (error IO)";
+ }
+
+ dm_table_event(pool->ti->table);
+ DMINFO("%s: switching pool to %s%s mode",
+ dm_device_name(pool->pool_md),
+ descs[(int)mode], extra_desc ? : "");
+}
+
/*
* Target context for a pool.
*/
@@ -2351,8 +2380,6 @@ static void do_waker(struct work_struct *ws)
queue_delayed_work(pool->wq, &pool->waker, COMMIT_PERIOD);
}
-static void notify_of_pool_mode_change_to_oods(struct pool *pool);
-
/*
* We're holding onto IO to allow userland time to react. After the
* timeout either the pool will have been resized (and thus back in
@@ -2365,7 +2392,7 @@ static void do_no_space_timeout(struct work_struct *ws)
if (get_pool_mode(pool) == PM_OUT_OF_DATA_SPACE && !pool->pf.error_if_no_space) {
pool->pf.error_if_no_space = true;
- notify_of_pool_mode_change_to_oods(pool);
+ notify_of_pool_mode_change(pool);
error_retry_list_with_code(pool, BLK_STS_NOSPC);
}
}
@@ -2433,26 +2460,6 @@ static void noflush_work(struct thin_c *tc, void (*fn)(struct work_struct *))
/*----------------------------------------------------------------*/
-static enum pool_mode get_pool_mode(struct pool *pool)
-{
- return pool->pf.mode;
-}
-
-static void notify_of_pool_mode_change(struct pool *pool, const char *new_mode)
-{
- dm_table_event(pool->ti->table);
- DMINFO("%s: switching pool to %s mode",
- dm_device_name(pool->pool_md), new_mode);
-}
-
-static void notify_of_pool_mode_change_to_oods(struct pool *pool)
-{
- if (!pool->pf.error_if_no_space)
- notify_of_pool_mode_change(pool, "out-of-data-space (queue IO)");
- else
- notify_of_pool_mode_change(pool, "out-of-data-space (error IO)");
-}
-
static bool passdown_enabled(struct pool_c *pt)
{
return pt->adjusted_pf.discard_passdown;
@@ -2501,8 +2508,6 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
switch (new_mode) {
case PM_FAIL:
- if (old_mode != new_mode)
- notify_of_pool_mode_change(pool, "failure");
dm_pool_metadata_read_only(pool->pmd);
pool->process_bio = process_bio_fail;
pool->process_discard = process_bio_fail;
@@ -2516,8 +2521,6 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
case PM_OUT_OF_METADATA_SPACE:
case PM_READ_ONLY:
- if (!is_read_only_pool_mode(old_mode))
- notify_of_pool_mode_change(pool, "read-only");
dm_pool_metadata_read_only(pool->pmd);
pool->process_bio = process_bio_read_only;
pool->process_discard = process_bio_success;
@@ -2538,8 +2541,6 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
* alarming rate. Adjust your low water mark if you're
* frequently seeing this mode.
*/
- if (old_mode != new_mode)
- notify_of_pool_mode_change_to_oods(pool);
pool->out_of_data_space = true;
pool->process_bio = process_bio_read_only;
pool->process_discard = process_discard_bio;
@@ -2552,8 +2553,6 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
break;
case PM_WRITE:
- if (old_mode != new_mode)
- notify_of_pool_mode_change(pool, "write");
if (old_mode == PM_OUT_OF_DATA_SPACE)
cancel_delayed_work_sync(&pool->no_space_timeout);
pool->out_of_data_space = false;
@@ -2573,6 +2572,9 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
* doesn't cause an unexpected mode transition on resume.
*/
pt->adjusted_pf.mode = new_mode;
+
+ if (old_mode != new_mode)
+ notify_of_pool_mode_change(pool);
}
static void abort_transaction(struct pool *pool)
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f6c367585d0d851349d3a9e607c43e5bea993fa1 Mon Sep 17 00:00:00 2001
From: Mike Snitzer <snitzer(a)redhat.com>
Date: Tue, 11 Dec 2018 13:31:40 -0500
Subject: [PATCH] dm thin: send event about thin-pool state change _after_
making it
Sending a DM event before a thin-pool state change has actually happened
is a bug. This wasn't realized until it became clear that the userspace
response to the event raced with the actual state change the event was
meant to notify about.
Fix this by first updating internal thin-pool state to reflect what the
DM event is being issued about. This fixes a long-standing racy/buggy
userspace device-mapper-test-suite 'resize_io' test that would get an
event but not find the state it was looking for -- so it would just go
on to hang because no other events caused the test to reevaluate the
thin-pool's state.
Cc: stable(a)vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 0bd8d498b3b9..53f8d03f76f7 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -195,7 +195,7 @@ static void throttle_unlock(struct throttle *t)
struct dm_thin_new_mapping;
/*
- * The pool runs in 4 modes. Ordered in degraded order for comparisons.
+ * The pool runs in various modes. Ordered in degraded order for comparisons.
*/
enum pool_mode {
PM_WRITE, /* metadata may be changed */
@@ -282,9 +282,38 @@ struct pool {
mempool_t mapping_pool;
};
-static enum pool_mode get_pool_mode(struct pool *pool);
static void metadata_operation_failed(struct pool *pool, const char *op, int r);
+static enum pool_mode get_pool_mode(struct pool *pool)
+{
+ return pool->pf.mode;
+}
+
+static void notify_of_pool_mode_change(struct pool *pool)
+{
+ const char *descs[] = {
+ "write",
+ "out-of-data-space",
+ "read-only",
+ "read-only",
+ "fail"
+ };
+ const char *extra_desc = NULL;
+ enum pool_mode mode = get_pool_mode(pool);
+
+ if (mode == PM_OUT_OF_DATA_SPACE) {
+ if (!pool->pf.error_if_no_space)
+ extra_desc = " (queue IO)";
+ else
+ extra_desc = " (error IO)";
+ }
+
+ dm_table_event(pool->ti->table);
+ DMINFO("%s: switching pool to %s%s mode",
+ dm_device_name(pool->pool_md),
+ descs[(int)mode], extra_desc ? : "");
+}
+
/*
* Target context for a pool.
*/
@@ -2351,8 +2380,6 @@ static void do_waker(struct work_struct *ws)
queue_delayed_work(pool->wq, &pool->waker, COMMIT_PERIOD);
}
-static void notify_of_pool_mode_change_to_oods(struct pool *pool);
-
/*
* We're holding onto IO to allow userland time to react. After the
* timeout either the pool will have been resized (and thus back in
@@ -2365,7 +2392,7 @@ static void do_no_space_timeout(struct work_struct *ws)
if (get_pool_mode(pool) == PM_OUT_OF_DATA_SPACE && !pool->pf.error_if_no_space) {
pool->pf.error_if_no_space = true;
- notify_of_pool_mode_change_to_oods(pool);
+ notify_of_pool_mode_change(pool);
error_retry_list_with_code(pool, BLK_STS_NOSPC);
}
}
@@ -2433,26 +2460,6 @@ static void noflush_work(struct thin_c *tc, void (*fn)(struct work_struct *))
/*----------------------------------------------------------------*/
-static enum pool_mode get_pool_mode(struct pool *pool)
-{
- return pool->pf.mode;
-}
-
-static void notify_of_pool_mode_change(struct pool *pool, const char *new_mode)
-{
- dm_table_event(pool->ti->table);
- DMINFO("%s: switching pool to %s mode",
- dm_device_name(pool->pool_md), new_mode);
-}
-
-static void notify_of_pool_mode_change_to_oods(struct pool *pool)
-{
- if (!pool->pf.error_if_no_space)
- notify_of_pool_mode_change(pool, "out-of-data-space (queue IO)");
- else
- notify_of_pool_mode_change(pool, "out-of-data-space (error IO)");
-}
-
static bool passdown_enabled(struct pool_c *pt)
{
return pt->adjusted_pf.discard_passdown;
@@ -2501,8 +2508,6 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
switch (new_mode) {
case PM_FAIL:
- if (old_mode != new_mode)
- notify_of_pool_mode_change(pool, "failure");
dm_pool_metadata_read_only(pool->pmd);
pool->process_bio = process_bio_fail;
pool->process_discard = process_bio_fail;
@@ -2516,8 +2521,6 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
case PM_OUT_OF_METADATA_SPACE:
case PM_READ_ONLY:
- if (!is_read_only_pool_mode(old_mode))
- notify_of_pool_mode_change(pool, "read-only");
dm_pool_metadata_read_only(pool->pmd);
pool->process_bio = process_bio_read_only;
pool->process_discard = process_bio_success;
@@ -2538,8 +2541,6 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
* alarming rate. Adjust your low water mark if you're
* frequently seeing this mode.
*/
- if (old_mode != new_mode)
- notify_of_pool_mode_change_to_oods(pool);
pool->out_of_data_space = true;
pool->process_bio = process_bio_read_only;
pool->process_discard = process_discard_bio;
@@ -2552,8 +2553,6 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
break;
case PM_WRITE:
- if (old_mode != new_mode)
- notify_of_pool_mode_change(pool, "write");
if (old_mode == PM_OUT_OF_DATA_SPACE)
cancel_delayed_work_sync(&pool->no_space_timeout);
pool->out_of_data_space = false;
@@ -2573,6 +2572,9 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
* doesn't cause an unexpected mode transition on resume.
*/
pt->adjusted_pf.mode = new_mode;
+
+ if (old_mode != new_mode)
+ notify_of_pool_mode_change(pool);
}
static void abort_transaction(struct pool *pool)
Mediatek Preloader is a proprietary embedded boot loader for loading
Little Kernel and Linux into device DRAM.
This boot loader also handles firmware updates. The Mediatek Preloader
is enumerated as a virtual COM port when the device is connected to a
Windows or Linux host via the CDC-ACM class driver. Once USB enumeration
is done, the Mediatek Preloader actively sends the handshake command
"READY" to the PC instead of waiting for a command from the download
tool.
Since Linux 4.12, commit 93857edd9829 ("tty: reset termios state on
device registration") causes the Mediatek Preloader to receive an
abnormal echoed command such as "READYXX" right after it sends "READY".
This is recognized as an incorrect response and makes the download
handshake fail.
Disabling the ECHO termios flag avoids this problem. However, it cannot
be done by user space configuration when the download tool opens
/dev/ttyACM0, because the device running the Mediatek Preloader sends
the handshake command "READY" immediately once the CDC-ACM driver is
ready.
Fix the above problem by introducing a "DISABLE_ECHO" quirk in
driver_info. When a Mediatek Preloader is connected, the CDC-ACM driver
disables the ECHO flag in termios to avoid the problem.
Signed-off-by: Macpaul Lin <macpaul.lin(a)mediatek.com>
---
Changes for v2:
- Move quirks testing of DISABLE_ECHO flag into acm_tty_install().
- Change quirks testing into bitwise comparison.
Changes for v3:
- Replace quirks testing from init_termios to tty->termios.
- Remove parenthesis for ECHO flag.
Changes for v4:
- Drop the quirks variable to simplify the patch.
- Move termios operation right after the driver_data has been installed.
- Write general style comment for suppressing initial echoing.
drivers/usb/class/cdc-acm.c | 12 +++++++++++-
drivers/usb/class/cdc-acm.h | 1 +
2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 1b68fed..ab8609e 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -583,6 +583,13 @@ static int acm_tty_install(struct tty_driver *driver, struct tty_struct *tty)
tty->driver_data = acm;
+ /*
+ * Suppress initial echoing for some devices which might send data
+ * immediately after acm driver has been installed.
+ */
+ if (acm->quirks & DISABLE_ECHO)
+ tty->termios.c_lflag &= ~ECHO;
+
return 0;
error_init_termios:
@@ -1655,7 +1662,10 @@ static int acm_pre_reset(struct usb_interface *intf)
.driver_info = NO_UNION_NORMAL, /* has no union descriptor */
},
{ USB_DEVICE(0x0e8d, 0x0003), /* FIREFLY, MediaTek Inc; andrey.arapov(a)gmail.com */
- .driver_info = NO_UNION_NORMAL, /* has no union descriptor */
+ .driver_info = DISABLE_ECHO, /* DISABLE ECHO in termios flag */
+ },
+ { USB_DEVICE(0x0e8d, 0x2000), /* FIREFLY, MediaTek Inc; Preloader */
+ .driver_info = DISABLE_ECHO, /* DISABLE ECHO in termios flag */
},
{ USB_DEVICE(0x0e8d, 0x3329), /* MediaTek Inc GPS */
.driver_info = NO_UNION_NORMAL, /* has no union descriptor */
diff --git a/drivers/usb/class/cdc-acm.h b/drivers/usb/class/cdc-acm.h
index ca06b20..515aad0 100644
--- a/drivers/usb/class/cdc-acm.h
+++ b/drivers/usb/class/cdc-acm.h
@@ -140,3 +140,4 @@ struct acm {
#define QUIRK_CONTROL_LINE_STATE BIT(6)
#define CLEAR_HALT_CONDITIONS BIT(7)
#define SEND_ZERO_PACKET BIT(8)
+#define DISABLE_ECHO BIT(9)
--
1.9.1
Mediatek Preloader is a proprietary embedded boot loader for loading
Little Kernel and Linux into device DRAM.
This boot loader also handles firmware updates. The Mediatek Preloader
is enumerated as a virtual COM port when the device is connected to a
Windows or Linux host via the CDC-ACM class driver. Once USB enumeration
is done, the Mediatek Preloader actively sends the handshake command
"READY" to the PC instead of waiting for a command from the download
tool.
Since Linux 4.12, commit 93857edd9829 ("tty: reset termios state on
device registration") causes the Mediatek Preloader to receive an
abnormal echoed command such as "READYXX" right after it sends "READY".
This is recognized as an incorrect response and makes the download
handshake fail.
Disabling the ECHO termios flag avoids this problem. However, it cannot
be done by user space configuration when the download tool opens
/dev/ttyACM0, because the device running the Mediatek Preloader sends
the handshake command "READY" immediately once the CDC-ACM driver is
ready.
Fix the above problem by introducing a "DISABLE_ECHO" quirk in
driver_info. When a Mediatek Preloader is connected, the CDC-ACM driver
disables the ECHO flag in termios to avoid the problem.
Signed-off-by: Macpaul Lin <macpaul.lin(a)mediatek.com>
---
Changes for v2:
- Move quirks testing of DISABLE_ECHO flag into acm_tty_install().
- Change quirks testing into bitwise comparison.
Changes for v3:
- Replace clear flag operation from driver->init_termios to tty->termios.
- Remove parenthesis of ECHO flag.
drivers/usb/class/cdc-acm.c | 13 ++++++++++++-
drivers/usb/class/cdc-acm.h | 1 +
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 1b68fed..c1b88c3 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -571,6 +571,7 @@ static void acm_softint(struct work_struct *work)
static int acm_tty_install(struct tty_driver *driver, struct tty_struct *tty)
{
struct acm *acm;
+ unsigned long quirks;
int retval;
acm = acm_get_by_minor(tty->index);
@@ -583,6 +584,13 @@ static int acm_tty_install(struct tty_driver *driver, struct tty_struct *tty)
tty->driver_data = acm;
+ /* get normal quirks */
+ quirks = acm->quirks;
+
+ /* handle active handshake triggered by device */
+ if (quirks & DISABLE_ECHO)
+ tty->termios.c_lflag &= ~ECHO;
+
return 0;
error_init_termios:
@@ -1655,7 +1663,10 @@ static int acm_pre_reset(struct usb_interface *intf)
.driver_info = NO_UNION_NORMAL, /* has no union descriptor */
},
{ USB_DEVICE(0x0e8d, 0x0003), /* FIREFLY, MediaTek Inc; andrey.arapov(a)gmail.com */
- .driver_info = NO_UNION_NORMAL, /* has no union descriptor */
+ .driver_info = DISABLE_ECHO, /* DISABLE ECHO in termios flag */
+ },
+ { USB_DEVICE(0x0e8d, 0x2000), /* FIREFLY, MediaTek Inc; Preloader */
+ .driver_info = DISABLE_ECHO, /* DISABLE ECHO in termios flag */
},
{ USB_DEVICE(0x0e8d, 0x3329), /* MediaTek Inc GPS */
.driver_info = NO_UNION_NORMAL, /* has no union descriptor */
diff --git a/drivers/usb/class/cdc-acm.h b/drivers/usb/class/cdc-acm.h
index ca06b20..515aad0 100644
--- a/drivers/usb/class/cdc-acm.h
+++ b/drivers/usb/class/cdc-acm.h
@@ -140,3 +140,4 @@ struct acm {
#define QUIRK_CONTROL_LINE_STATE BIT(6)
#define CLEAR_HALT_CONDITIONS BIT(7)
#define SEND_ZERO_PACKET BIT(8)
+#define DISABLE_ECHO BIT(9)
--
1.9.1
Mediatek Preloader is a proprietary embedded boot loader for loading
Little Kernel and Linux into device DRAM.
This boot loader also handles firmware updates. The Mediatek Preloader
is enumerated as a virtual COM port when the device is connected to a
Windows or Linux host via the CDC-ACM class driver. Once USB enumeration
is done, the Mediatek Preloader actively sends the handshake command
"READY" to the PC instead of waiting for a command from the download
tool.
Since Linux 4.12, commit 93857edd9829 ("tty: reset termios state on
device registration") causes the Mediatek Preloader to receive an
abnormal echoed command such as "READYXX" right after it sends "READY".
This is recognized as an incorrect response and makes the download
handshake fail.
Disabling the ECHO termios flag avoids this problem. However, it cannot
be done by user space configuration when the download tool opens
/dev/ttyACM0, because the device running the Mediatek Preloader sends
the handshake command "READY" immediately once the CDC-ACM driver is
ready.
Fix the above problem by introducing a "DISABLE_ECHO" quirk in
driver_info. When a Mediatek Preloader is connected, the CDC-ACM driver
disables the ECHO flag in termios to avoid the problem.
Signed-off-by: Macpaul Lin <macpaul.lin(a)mediatek.com>
---
Change for v2:
- Move quirks testing of DISABLE_ECHO flag into acm_tty_install().
- Change quirks testing into bitwise comparison.
drivers/usb/class/cdc-acm.c | 13 ++++++++++++-
drivers/usb/class/cdc-acm.h | 1 +
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 1b68fed..f1a914d 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -571,12 +571,20 @@ static void acm_softint(struct work_struct *work)
static int acm_tty_install(struct tty_driver *driver, struct tty_struct *tty)
{
struct acm *acm;
+ unsigned long quirks;
int retval;
acm = acm_get_by_minor(tty->index);
if (!acm)
return -ENODEV;
+ /* get normal quirks */
+ quirks = acm->quirks;
+
+ /* handle active handshake triggered by device */
+ if (quirks & DISABLE_ECHO)
+ driver->init_termios.c_lflag &= ~(ECHO);
+
retval = tty_standard_install(driver, tty);
if (retval)
goto error_init_termios;
@@ -1655,7 +1663,10 @@ static int acm_pre_reset(struct usb_interface *intf)
.driver_info = NO_UNION_NORMAL, /* has no union descriptor */
},
{ USB_DEVICE(0x0e8d, 0x0003), /* FIREFLY, MediaTek Inc; andrey.arapov(a)gmail.com */
- .driver_info = NO_UNION_NORMAL, /* has no union descriptor */
+ .driver_info = DISABLE_ECHO, /* DISABLE ECHO in termios flag */
+ },
+ { USB_DEVICE(0x0e8d, 0x2000), /* FIREFLY, MediaTek Inc; Preloader */
+ .driver_info = DISABLE_ECHO, /* DISABLE ECHO in termios flag */
},
{ USB_DEVICE(0x0e8d, 0x3329), /* MediaTek Inc GPS */
.driver_info = NO_UNION_NORMAL, /* has no union descriptor */
diff --git a/drivers/usb/class/cdc-acm.h b/drivers/usb/class/cdc-acm.h
index ca06b20..515aad0 100644
--- a/drivers/usb/class/cdc-acm.h
+++ b/drivers/usb/class/cdc-acm.h
@@ -140,3 +140,4 @@ struct acm {
#define QUIRK_CONTROL_LINE_STATE BIT(6)
#define CLEAR_HALT_CONDITIONS BIT(7)
#define SEND_ZERO_PACKET BIT(8)
+#define DISABLE_ECHO BIT(9)
--
1.9.1
From: Long Li <longli(a)microsoft.com>
The current code attempts to pin memory using the largest possible wsize
based on the current SMB credits. This doesn't cause a kernel oops, but
it is not optimal as we may pin more pages than actually needed.
Fix this by only pinning what is needed for this write I/O.
Signed-off-by: Long Li <longli(a)microsoft.com>
Cc: stable(a)vger.kernel.org
---
fs/cifs/file.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 3467351..c23bf9d 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2617,11 +2617,13 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
if (rc)
break;
+ cur_len = min_t(const size_t, len, wsize);
+
if (ctx->direct_io) {
ssize_t result;
result = iov_iter_get_pages_alloc(
- from, &pagevec, wsize, &start);
+ from, &pagevec, cur_len, &start);
if (result < 0) {
cifs_dbg(VFS,
"direct_writev couldn't get user pages "
--
2.7.4
Hi folks,
I recently started seeing spurious test failures with rbtree tests,
resulting in boot delays and random "hung task" warnings.
The problems have been fixed upstream with the following patches.
v4.14.y:
0b548e33e6cb lib/rbtree-test: lower default params
v4.4.y, v4.9.y:
a54dae0338b7 lib/interval_tree_test.c: make test options module parameters
c46ecce431eb lib/interval_tree_test.c: allow full tree search
223f8911eace lib/rbtree_test.c: make input module parameters
0b548e33e6cb lib/rbtree-test: lower default params
The first three patches for v4.4.y and v4.9.y are the minimum
set of context patches needed to avoid conflicts when applying
commit 0b548e33e6cb.
I tested all kernel versions with the patches applied and rbtree
testing enabled to ensure that no new problems are introduced.
Please consider applying those patches to the respective releases.
Thanks,
Guenter
From: Emmanuel Grumbach <emmanuel.grumbach(a)intel.com>
NullFunc packets should never be considered duplicates, just like
QoS-NullFunc packets.
We saw a client that enters / exits power save with
NullFunc frames (and not with QoS-NullFunc) despite the
fact that the association supports HT.
This specific client also re-uses a non-zero sequence number
for different NullFunc frames.
At some point, the client had to send a retransmission of
the NullFunc frame and we dropped it, leading to a
misalignment in the power save state.
Fix this by never considering a NullFunc frame a duplicate, just as we
do for QoS-NullFunc frames.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=201449
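For reference, a simplified sketch of the duplicate check (condensed
from ieee80211_rx_h_check_dup; the per-TID last_seq_ctrl bookkeeping is
abbreviated to a single pointer): a frame is dropped as a retransmission
when the Retry bit is set and the sequence control matches the last one
seen, which is exactly the check these frame types must be exempt from:

#include <linux/ieee80211.h>
#include <linux/etherdevice.h>

static bool is_duplicate(struct ieee80211_hdr *hdr, __le16 *last_seq_ctrl)
{
	if (ieee80211_is_ctl(hdr->frame_control) ||
	    ieee80211_is_nullfunc(hdr->frame_control) ||	/* this fix */
	    ieee80211_is_qos_nullfunc(hdr->frame_control) ||
	    is_multicast_ether_addr(hdr->addr1))
		return false;			/* never deduplicated */

	if (ieee80211_has_retry(hdr->frame_control) &&
	    *last_seq_ctrl == hdr->seq_ctrl)
		return true;			/* drop as retransmission */

	*last_seq_ctrl = hdr->seq_ctrl;
	return false;
}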
CC: <stable(a)vger.kernel.org>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach(a)intel.com>
Signed-off-by: Luca Coelho <luciano.coelho(a)intel.com>
---
net/mac80211/rx.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 2394008f82b9..60d179bf2585 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -1404,6 +1404,7 @@ ieee80211_rx_h_check_dup(struct ieee80211_rx_data *rx)
return RX_CONTINUE;
if (ieee80211_is_ctl(hdr->frame_control) ||
+ ieee80211_is_nullfunc(hdr->frame_control) ||
ieee80211_is_qos_nullfunc(hdr->frame_control) ||
is_multicast_ether_addr(hdr->addr1))
return RX_CONTINUE;
--
2.19.2
[ Upstream commit 64c3f648c25d108f346fdc96c15180c6b7d250e9 ]
Once in a while I see build errors similar to the following
when building images from a clean tree.
Building powerpc:virtex-ml507:44x/virtex5_defconfig ... failed
------------
Error log:
arch/powerpc/boot/treeboot-akebono.c:37:20: fatal error:
libfdt.h: No such file or directory
Building powerpc:bamboo:smpdev:44x/bamboo_defconfig ... failed
------------
Error log:
arch/powerpc/boot/treeboot-akebono.c:37:20: fatal error:
libfdt.h: No such file or directory
arch/powerpc/boot/treeboot-currituck.c:35:20: fatal error:
libfdt.h: No such file or directory
Rebuilds will succeed.
Turns out that several source files in arch/powerpc/boot/ include
libfdt.h, but Makefile dependencies are incomplete. Let's fix that.
Signed-off-by: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[groeck: Backport to v4.4.y]
Signed-off-by: Guenter Roeck <linux(a)roeck-us.net>
---
For some reason I have been seeing the build error fixed by this patch
more often lately. It would be great if you could apply it to v3.18.y
and v4.4.y. Tested on both v3.18.y and v4.4.y.
arch/powerpc/boot/Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 99e4487248ff..57003d1bd243 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -70,7 +70,8 @@ $(addprefix $(obj)/,$(zlib) cuboot-c2k.o gunzip_util.o main.o): \
libfdt := fdt.c fdt_ro.c fdt_wip.c fdt_sw.c fdt_rw.c fdt_strerror.c
libfdtheader := fdt.h libfdt.h libfdt_internal.h
-$(addprefix $(obj)/,$(libfdt) libfdt-wrapper.o simpleboot.o epapr.o): \
+$(addprefix $(obj)/,$(libfdt) libfdt-wrapper.o simpleboot.o epapr.o \
+ treeboot-akebono.o treeboot-currituck.o treeboot-iss4xx.o): \
$(addprefix $(obj)/,$(libfdtheader))
src-wlib-y := string.S crt0.S crtsavres.S stdio.c main.c \
--
2.7.4
Please pick this commit for 4.14 and older stable branches:
commit 8e7df2b5b7f245c9bd11064712db5cb69044a362
Author: Ingo Molnar <mingo(a)kernel.org>
Date: Mon Nov 13 07:15:41 2017 +0100
timer/debug: Change /proc/timer_list from 0444 to 0400
In older kernel versions this file makes it far too easy to exploit
arbitrary-write bugs. It's possible to hide the pointers from
unprivileged users by setting the kernel.kptr_restrict sysctl, but that
wasn't done by default.
(Upstream commits c1eba5bcb643 "timer: Pass timer_list pointer to
callbacks unconditionally" and ad67b74d2469 "printk: hash addresses
printed with %p" provide more general mitigations, but don't seem to be
suitable for stable.)
Ben.
--
Ben Hutchings, Software Developer Codethink Ltd
https://www.codethink.co.uk/ Dale House, 35 Dale Street
Manchester, M1 2HF, United Kingdom
The patch titled
Subject: fork,memcg: fix crash in free_thread_stack on memcg charge fail
has been added to the -mm tree. Its filename is
forkmemcg-fix-crash-in-free_thread_stack-on-memcg-charge-fail.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/forkmemcg-fix-crash-in-free_thread…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/forkmemcg-fix-crash-in-free_thread…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Rik van Riel <riel(a)surriel.com>
Subject: fork,memcg: fix crash in free_thread_stack on memcg charge fail
Changeset 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting")
will result in fork failing if allocating a kernel stack for a task
in dup_task_struct exceeds the kernel memory allowance for that cgroup.
Unfortunately, it also results in a crash.
This is due to the code jumping to free_stack and calling free_thread_stack
when the memcg kernel stack charge fails, but without tsk->stack pointing
at the freshly allocated stack.
This in turn results in the vfree_atomic in free_thread_stack oopsing
with a backtrace like this:
#5 [ffffc900244efc88] die at ffffffff8101f0ab
#6 [ffffc900244efcb8] do_general_protection at ffffffff8101cb86
#7 [ffffc900244efce0] general_protection at ffffffff818ff082
[exception RIP: llist_add_batch+7]
RIP: ffffffff8150d487 RSP: ffffc900244efd98 RFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88085ef55980 RCX: 0000000000000000
RDX: ffff88085ef55980 RSI: 343834343531203a RDI: 343834343531203a
RBP: ffffc900244efd98 R8: 0000000000000001 R9: ffff8808578c3600
R10: 0000000000000000 R11: 0000000000000001 R12: ffff88029f6c21c0
R13: 0000000000000286 R14: ffff880147759b00 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffffc900244efda0] vfree_atomic at ffffffff811df2c7
#9 [ffffc900244efdb8] copy_process at ffffffff81086e37
#10 [ffffc900244efe98] _do_fork at ffffffff810884e0
#11 [ffffc900244eff10] sys_vfork at ffffffff810887ff
#12 [ffffc900244eff20] do_syscall_64 at ffffffff81002a43
RIP: 000000000049b948 RSP: 00007ffcdb307830 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000896030 RCX: 000000000049b948
RDX: 0000000000000000 RSI: 00007ffcdb307790 RDI: 00000000005d7421
RBP: 000000000067370f R8: 00007ffcdb3077b0 R9: 000000000001ed00
R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000040
R13: 000000000000000f R14: 0000000000000000 R15: 000000000088d018
ORIG_RAX: 000000000000003a CS: 0033 SS: 002b
The simplest fix is to assign tsk->stack right where it is allocated.
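Reduced to its essence (hypothetical names, with userspace stand-ins for
the kernel allocators), the bug is a cleanup path that frees through a
field nothing had assigned yet:

#include <stdlib.h>

struct task_sketch {
	void *stack;
};

static void *alloc_stack(struct task_sketch *tsk)
{
	void *stack = malloc(16384);

	if (stack)
		tsk->stack = stack;	/* the fix: assign at allocation time */
	return stack;			/* may be NULL on failure */
}

static void free_stack(struct task_sketch *tsk)
{
	/*
	 * Frees a stale or uninitialized pointer if tsk->stack was never
	 * set -- the userspace analogue of the vfree_atomic() oops above.
	 */
	free(tsk->stack);
}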
Link: http://lkml.kernel.org/r/20181214231726.7ee4843c@imladris.surriel.com
Fixes: 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting")
Signed-off-by: Rik van Riel <riel(a)surriel.com>
Acked-by: Roman Gushchin <guro(a)fb.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/fork.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
--- a/kernel/fork.c~forkmemcg-fix-crash-in-free_thread_stack-on-memcg-charge-fail
+++ a/kernel/fork.c
@@ -240,8 +240,10 @@ static unsigned long *alloc_thread_stack
* free_thread_stack() can be called in interrupt context,
* so cache the vm_struct.
*/
- if (stack)
+ if (stack) {
tsk->stack_vm_area = find_vm_area(stack);
+ tsk->stack = stack;
+ }
return stack;
#else
struct page *page = alloc_pages_node(node, THREADINFO_GFP,
@@ -288,7 +290,10 @@ static struct kmem_cache *thread_stack_c
static unsigned long *alloc_thread_stack_node(struct task_struct *tsk,
int node)
{
- return kmem_cache_alloc_node(thread_stack_cache, THREADINFO_GFP, node);
+ unsigned long *stack;
+ stack = kmem_cache_alloc_node(thread_stack_cache, THREADINFO_GFP, node);
+ tsk->stack = stack;
+ return stack;
}
static void free_thread_stack(struct task_struct *tsk)
_
Patches currently in -mm which might be from riel(a)surriel.com are
forkmemcg-fix-crash-in-free_thread_stack-on-memcg-charge-fail.patch
The patch titled
Subject: proc/sysctl: don't return ENOMEM on lookup when a table is unregistering
has been removed from the -mm tree. Its filename was
proc-sysctl-dont-return-enomem-on-lookup-when-a-table-is-unregistering.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Ivan Delalande <colona(a)arista.com>
Subject: proc/sysctl: don't return ENOMEM on lookup when a table is unregistering
proc_sys_lookup can fail with ENOMEM instead of ENOENT when the
corresponding sysctl table is being unregistered. In our case we see this
upon opening /proc/sys/net/*/conf files while network interfaces are being
deleted, which confuses our configuration daemon.
The problem was successfully reproduced and this fix tested on v4.9.122
and v4.20-rc6.
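The fix relies on the standard kernel ERR_PTR idiom, so callers can
distinguish the two failure modes instead of mapping every NULL to
ENOMEM. A minimal sketch of the pattern (make_thing/use_thing are
hypothetical helpers, not the proc code):

#include <linux/err.h>
#include <linux/types.h>

struct thing;

static struct thing *make_thing(bool no_memory, bool going_away)
{
	if (no_memory)
		return ERR_PTR(-ENOMEM);	/* really out of memory */
	if (going_away)
		return ERR_PTR(-ENOENT);	/* object is unregistering */
	return NULL;		/* stand-in for returning a real object */
}

static int use_thing(bool no_memory, bool going_away)
{
	struct thing *t = make_thing(no_memory, going_away);

	if (IS_ERR(t))
		return PTR_ERR(t);	/* propagate the precise error */
	/* ... use t ... */
	return 0;
}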
Link: http://lkml.kernel.org/r/20181213232052.GA1513@visor
Fixes: ace0c791e6c3 ("proc/sysctl: Don't grab i_lock under sysctl_lock.")
Signed-off-by: Ivan Delalande <colona(a)arista.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Alexey Dobriyan <adobriyan(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/proc_sysctl.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
--- a/fs/proc/proc_sysctl.c~proc-sysctl-dont-return-enomem-on-lookup-when-a-table-is-unregistering
+++ a/fs/proc/proc_sysctl.c
@@ -464,7 +464,7 @@ static struct inode *proc_sys_make_inode
inode = new_inode(sb);
if (!inode)
- goto out;
+ return ERR_PTR(-ENOMEM);
inode->i_ino = get_next_ino();
@@ -474,7 +474,7 @@ static struct inode *proc_sys_make_inode
if (unlikely(head->unregistering)) {
spin_unlock(&sysctl_lock);
iput(inode);
- inode = NULL;
+ inode = ERR_PTR(-ENOENT);
goto out;
}
ei->sysctl = head;
@@ -549,10 +549,11 @@ static struct dentry *proc_sys_lookup(st
goto out;
}
- err = ERR_PTR(-ENOMEM);
inode = proc_sys_make_inode(dir->i_sb, h ? h : head, p);
- if (!inode)
+ if (IS_ERR(inode)) {
+ err = ERR_CAST(inode);
goto out;
+ }
d_set_d_op(dentry, &proc_sys_dentry_operations);
err = d_splice_alias(inode, dentry);
@@ -685,7 +686,7 @@ static bool proc_sys_fill_cache(struct f
if (d_in_lookup(child)) {
struct dentry *res;
inode = proc_sys_make_inode(dir->d_sb, head, table);
- if (!inode) {
+ if (IS_ERR(inode)) {
d_lookup_done(child);
dput(child);
return false;
_
Patches currently in -mm which might be from colona(a)arista.com are
The patch titled
Subject: scripts/spdxcheck.py: always open files in binary mode
has been removed from the -mm tree. Its filename was
scripts-spdxcheckpy-always-open-files-in-binary-mode.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Thierry Reding <treding(a)nvidia.com>
Subject: scripts/spdxcheck.py: always open files in binary mode
The spdxcheck script currently falls over when confronted with a binary
file (such as Documentation/logo.gif). To avoid that, always open files
in binary mode and decode line-by-line, ignoring encoding errors.
One tricky case is when piping data into the script and reading it from
standard input. By default, standard input will be opened in text mode,
so we need to reopen it in binary mode.
The breakage only happens with python3 and results in a
UnicodeDecodeError (according to Uwe).
Link: http://lkml.kernel.org/r/20181212131210.28024-1-thierry.reding@gmail.com
Fixes: 6f4d29df66ac ("scripts/spdxcheck.py: make python3 compliant")
Signed-off-by: Thierry Reding <treding(a)nvidia.com>
Reviewed-by: Jeremy Cline <jcline(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Joe Perches <joe(a)perches.com>
Cc: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
scripts/spdxcheck.py | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/scripts/spdxcheck.py~scripts-spdxcheckpy-always-open-files-in-binary-mode
+++ a/scripts/spdxcheck.py
@@ -168,6 +168,7 @@ class id_parser(object):
self.curline = 0
try:
for line in fd:
+ line = line.decode(locale.getpreferredencoding(False), errors='ignore')
self.curline += 1
if self.curline > maxlines:
break
@@ -249,12 +250,13 @@ if __name__ == '__main__':
try:
if len(args.path) and args.path[0] == '-':
- parser.parse_lines(sys.stdin, args.maxlines, '-')
+ stdin = os.fdopen(sys.stdin.fileno(), 'rb')
+ parser.parse_lines(stdin, args.maxlines, '-')
else:
if args.path:
for p in args.path:
if os.path.isfile(p):
- parser.parse_lines(open(p), args.maxlines, p)
+ parser.parse_lines(open(p, 'rb'), args.maxlines, p)
elif os.path.isdir(p):
scan_git_subtree(repo.head.reference.commit.tree, p)
else:
_
Patches currently in -mm which might be from treding(a)nvidia.com are
scripts-add-spdxcheckpy-self-test.patch
The patch titled
Subject: userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered
has been removed from the -mm tree. Its filename was
userfaultfd-check-vm_maywrite-was-set-after-verifying-the-uffd-is-registered.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Andrea Arcangeli <aarcange(a)redhat.com>
Subject: userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered
Calling UFFDIO_UNREGISTER on virtual ranges not yet registered in uffd
could trigger a harmless false-positive WARN_ON. Check that the vma is
already registered before checking VM_MAYWRITE to shut off the false
positive warning.
Link: http://lkml.kernel.org/r/20181206212028.18726-2-aarcange@redhat.com
Cc: <stable(a)vger.kernel.org>
Fixes: 29ec90660d68 ("userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas")
Signed-off-by: Andrea Arcangeli <aarcange(a)redhat.com>
Reported-by: syzbot+06c7092e7d71218a2c16(a)syzkaller.appspotmail.com
Acked-by: Mike Rapoport <rppt(a)linux.ibm.com>
Acked-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Peter Xu <peterx(a)redhat.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/userfaultfd.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/userfaultfd.c~userfaultfd-check-vm_maywrite-was-set-after-verifying-the-uffd-is-registered
+++ a/fs/userfaultfd.c
@@ -1566,7 +1566,6 @@ static int userfaultfd_unregister(struct
cond_resched();
BUG_ON(!vma_can_userfault(vma));
- WARN_ON(!(vma->vm_flags & VM_MAYWRITE));
/*
* Nothing to do: this vma is already registered into this
@@ -1575,6 +1574,8 @@ static int userfaultfd_unregister(struct
if (!vma->vm_userfaultfd_ctx.ctx)
goto skip;
+ WARN_ON(!(vma->vm_flags & VM_MAYWRITE));
+
if (vma->vm_start > start)
start = vma->vm_start;
vma_end = min(end, vma->vm_end);
_
Patches currently in -mm which might be from aarcange(a)redhat.com are
The patch titled
Subject: fs/iomap.c: get/put the page in iomap_page_create/release()
has been removed from the -mm tree. Its filename was
iomap-get-put-the-page-in-iomap_page_create-release.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Piotr Jaroszynski <pjaroszynski(a)nvidia.com>
Subject: fs/iomap.c: get/put the page in iomap_page_create/release()
migrate_page_move_mapping() expects pages with private data set to have a
page_count elevated by 1. This is what used to happen for xfs through the
buffer_heads code before the switch to iomap in commit 82cb14175e7d ("xfs:
add support for sub-pagesize writeback without buffer_heads"). Not having
the count elevated causes move_pages() to fail on memory mapped files
coming from xfs.
Make iomap compatible with the migrate_page_move_mapping() assumption by
elevating the page count as part of iomap_page_create() and lowering it in
iomap_page_release().
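Sketched as a pair of helpers (illustrative names; the real changes live
in iomap_page_create()/iomap_page_release()), the invariant is simply
that attaching private data takes a page reference and detaching drops
it:

#include <linux/mm.h>
#include <linux/page-flags.h>

static void attach_private(struct page *page, void *data)
{
	/*
	 * +1: migrate_page_move_mapping() expects PagePrivate pages
	 * to carry an extra page reference.
	 */
	get_page(page);
	set_page_private(page, (unsigned long)data);
	SetPagePrivate(page);
}

static void *detach_private(struct page *page)
{
	void *data = (void *)page_private(page);

	ClearPagePrivate(page);
	set_page_private(page, 0);
	put_page(page);		/* drop the reference taken above */
	return data;
}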
It causes the move_pages() syscall to misbehave on memory-mapped files
from xfs: it does not move any pages, which I suppose is "just" a perf
issue, but it also ends up returning a positive number, which is out of
spec for the syscall. Talking to Michal Hocko, it sounds like returning
positive numbers might be a necessary update to move_pages() anyway
(https://lkml.kernel.org/r/20181116114955.GJ14706@dhcp22.suse.cz).
I only hit this in tests that verify that move_pages() actually moved
the pages. The test also got confused by the positive return from
move_pages() (it was treated as success, since positive numbers were
neither expected nor handled), making it a bit harder to track down
what was going on.
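For reference, a check that would not have been fooled looks roughly
like this (a sketch, not the actual test; the target node and the use
of anonymous memory are arbitrary assumptions):
#include <linux/mempolicy.h>	/* MPOL_MF_MOVE */
#include <stdio.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
int main(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	char *buf = mmap(NULL, psz, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	buf[0] = 1;		/* fault the page in */
	void *pages[1] = { buf };
	int nodes[1] = { 0 };	/* assumed target NUMA node */
	int status[1] = { -1 };
	long ret = syscall(SYS_move_pages, 0, 1UL, pages, nodes, status,
			   MPOL_MF_MOVE);
	/* checking only ret < 0 treats the out-of-spec positive return
	 * (pages left behind) as success; look at the status array too */
	if (ret < 0)
		perror("move_pages");
	else if (ret > 0 || status[0] < 0)
		fprintf(stderr, "page not migrated, status=%d\n", status[0]);
	else
		printf("page now on node %d\n", status[0]);
	return 0;
}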
Link: http://lkml.kernel.org/r/20181115184140.1388751-1-pjaroszynski@nvidia.com
Fixes: 82cb14175e7d ("xfs: add support for sub-pagesize writeback without buffer_heads")
Signed-off-by: Piotr Jaroszynski <pjaroszynski(a)nvidia.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Cc: William Kucharski <william.kucharski(a)oracle.com>
Cc: Darrick J. Wong <darrick.wong(a)oracle.com>
Cc: Brian Foster <bfoster(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/iomap.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/fs/iomap.c~iomap-get-put-the-page-in-iomap_page_create-release
+++ a/fs/iomap.c
@@ -116,6 +116,12 @@ iomap_page_create(struct inode *inode, s
atomic_set(&iop->read_count, 0);
atomic_set(&iop->write_count, 0);
bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
+
+ /*
+ * migrate_page_move_mapping() assumes that pages with private data have
+ * their count elevated by 1.
+ */
+ get_page(page);
set_page_private(page, (unsigned long)iop);
SetPagePrivate(page);
return iop;
@@ -132,6 +138,7 @@ iomap_page_release(struct page *page)
WARN_ON_ONCE(atomic_read(&iop->write_count));
ClearPagePrivate(page);
set_page_private(page, 0);
+ put_page(page);
kfree(iop);
}
_
Patches currently in -mm which might be from pjaroszynski(a)nvidia.com are
This is a note to let you know that I've just added the patch titled
staging: bcm2835-audio: double free in init error path
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From 649496b603000135683ee76d7ea499456617bf17 Mon Sep 17 00:00:00 2001
From: Dan Carpenter <dan.carpenter(a)oracle.com>
Date: Mon, 17 Dec 2018 10:08:54 +0300
Subject: staging: bcm2835-audio: double free in init error path
We free 'instance' both here and in the caller. Only the caller should
free it.
Fixes: d7ca3a71545b ("staging: bcm2835-audio: Operate non-atomic PCM ops")
Signed-off-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Reviewed-by: Takashi Iwai <tiwai(a)suse.de>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/vc04_services/bcm2835-audio/bcm2835-vchiq.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/staging/vc04_services/bcm2835-audio/bcm2835-vchiq.c b/drivers/staging/vc04_services/bcm2835-audio/bcm2835-vchiq.c
index 0db412fd7c55..c0debdbce26c 100644
--- a/drivers/staging/vc04_services/bcm2835-audio/bcm2835-vchiq.c
+++ b/drivers/staging/vc04_services/bcm2835-audio/bcm2835-vchiq.c
@@ -138,7 +138,6 @@ vc_vchi_audio_init(VCHI_INSTANCE_T vchi_instance,
dev_err(instance->dev,
"failed to open VCHI service connection (status=%d)\n",
status);
- kfree(instance);
return -EPERM;
}
--
2.20.1
This is a note to let you know that I've just added the patch titled
usb: roles: Add a description for the class to Kconfig
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From c3788cd9963eb2e77de3c24142fb7c67b61f1a26 Mon Sep 17 00:00:00 2001
From: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Date: Wed, 12 Dec 2018 20:13:55 +0300
Subject: usb: roles: Add a description for the class to Kconfig
This makes the USB role switch support option visible to and
selectable by the user. The class driver is also moved to the
drivers/usb/roles/ directory.
This fixes an issue with the Intel USB role switch driver on
systems that don't have USB Type-C connectors: the Intel USB
role switch driver depends on the USB role switch class, as it
should, but since there was no way for the user to enable the
USB role switch class, there was also no way to select that
driver. USB Type-C drivers select the USB role switch class,
which makes the Intel USB role switch driver available and
therefore hides the problem. So in practice the Intel USB role
switch driver was depending on the USB Type-C drivers.
Fixes: f6fb9ec02be1 ("usb: roles: Add Intel xHCI USB role switch driver")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/Kconfig | 4 ----
drivers/usb/common/Makefile | 1 -
drivers/usb/roles/Kconfig | 13 +++++++++++++
drivers/usb/roles/Makefile | 4 +++-
drivers/usb/{common/roles.c => roles/class.c} | 0
5 files changed, 16 insertions(+), 6 deletions(-)
rename drivers/usb/{common/roles.c => roles/class.c} (100%)
diff --git a/drivers/usb/Kconfig b/drivers/usb/Kconfig
index 987fc5ba6321..70e6c956c23c 100644
--- a/drivers/usb/Kconfig
+++ b/drivers/usb/Kconfig
@@ -205,8 +205,4 @@ config USB_ULPI_BUS
To compile this driver as a module, choose M here: the module will
be called ulpi.
-config USB_ROLE_SWITCH
- tristate
- select USB_COMMON
-
endif # USB_SUPPORT
diff --git a/drivers/usb/common/Makefile b/drivers/usb/common/Makefile
index fb4d5ef4165c..0a7c45e85481 100644
--- a/drivers/usb/common/Makefile
+++ b/drivers/usb/common/Makefile
@@ -9,4 +9,3 @@ usb-common-$(CONFIG_USB_LED_TRIG) += led.o
obj-$(CONFIG_USB_OTG_FSM) += usb-otg-fsm.o
obj-$(CONFIG_USB_ULPI_BUS) += ulpi.o
-obj-$(CONFIG_USB_ROLE_SWITCH) += roles.o
diff --git a/drivers/usb/roles/Kconfig b/drivers/usb/roles/Kconfig
index f5a5e6f79f1b..e4194ac94510 100644
--- a/drivers/usb/roles/Kconfig
+++ b/drivers/usb/roles/Kconfig
@@ -1,3 +1,16 @@
+config USB_ROLE_SWITCH
+ tristate "USB Role Switch Support"
+ help
+ USB Role Switch is a device that can select the USB role - host or
+ device - for a USB port (connector). In most cases dual-role capable
+ USB controller will also represent the switch, but on some platforms
+ multiplexer/demultiplexer switch is used to route the data lines on
+ the USB connector between separate USB host and device controllers.
+
+ Say Y here if your USB connectors support both device and host roles.
+ To compile the driver as module, choose M here: the module will be
+ called roles.ko.
+
if USB_ROLE_SWITCH
config USB_ROLES_INTEL_XHCI
diff --git a/drivers/usb/roles/Makefile b/drivers/usb/roles/Makefile
index e44b179ba275..c02873206fc1 100644
--- a/drivers/usb/roles/Makefile
+++ b/drivers/usb/roles/Makefile
@@ -1 +1,3 @@
-obj-$(CONFIG_USB_ROLES_INTEL_XHCI) += intel-xhci-usb-role-switch.o
+obj-$(CONFIG_USB_ROLE_SWITCH) += roles.o
+roles-y := class.o
+obj-$(CONFIG_USB_ROLES_INTEL_XHCI) += intel-xhci-usb-role-switch.o
diff --git a/drivers/usb/common/roles.c b/drivers/usb/roles/class.c
similarity index 100%
rename from drivers/usb/common/roles.c
rename to drivers/usb/roles/class.c
--
2.20.1
On 12/14/18 7:55 AM, Ross Lagerwall wrote:
> If pcistub_init_device fails, the release function will be called with
> dev_data set to NULL. Check it before using it to avoid a NULL pointer
> dereference.
>
> Signed-off-by: Ross Lagerwall <ross.lagerwall(a)citrix.com>
I think this should go to stable trees too (copying them)
Applied to for-linus-4.21
-boris
> ---
> drivers/xen/xen-pciback/pci_stub.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
> index 59661db144e5..097410a7cdb7 100644
> --- a/drivers/xen/xen-pciback/pci_stub.c
> +++ b/drivers/xen/xen-pciback/pci_stub.c
> @@ -106,7 +106,8 @@ static void pcistub_device_release(struct kref *kref)
> * is called from "unbind" which takes a device_lock mutex.
> */
> __pci_reset_function_locked(dev);
> - if (pci_load_and_free_saved_state(dev, &dev_data->pci_saved_state))
> + if (dev_data &&
> + pci_load_and_free_saved_state(dev, &dev_data->pci_saved_state))
> dev_info(&dev->dev, "Could not reload PCI state\n");
> else
> pci_restore_state(dev);
This is a note to let you know that I've just added the patch titled
Revert "serial: 8250: Fix clearing FIFOs in RS485 mode again"
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 3c9dc275dba1124c1e16e7037226038451286813 Mon Sep 17 00:00:00 2001
From: Paul Burton <paul.burton(a)mips.com>
Date: Sun, 16 Dec 2018 20:10:01 +0000
Subject: Revert "serial: 8250: Fix clearing FIFOs in RS485 mode again"
Commit f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode
again") makes a change to FIFO clearing code which its commit message
suggests was intended to be specific to use with RS485 mode, however:
1) The change made does not just affect __do_stop_tx_rs485(), it also
affects other uses of serial8250_clear_fifos() including paths for
starting up, shutting down or auto-configuring a port regardless of
whether it's an RS485 port or not.
2) It makes the assumption that resetting the FIFOs is a no-op when
FIFOs are disabled, and as such it checks for this case & explicitly
avoids setting the FIFO reset bits when the FIFO enable bit is
clear. A reading of the PC16550D manual would suggest that this is
OK since the FIFO should automatically be reset if it is later
enabled, but we support many 16550-compatible devices and have never
required this auto-reset behaviour for at least the whole git era.
Starting to rely on it now seems risky, offers no benefit, and
indeed breaks at least the Ingenic JZ4780's UARTs, which read
garbage when the RX FIFO is enabled if we don't explicitly reset it.
3) By only resetting the FIFOs if they're enabled, the behaviour of
serial8250_do_startup() during boot now depends on what the value of
FCR is before the 8250 driver is probed. This in itself seems
questionable and leaves us with FCR=0 & no FIFO reset if the UART
was used by 8250_early, otherwise it depends upon what the
bootloader left behind.
4) Although the naming of serial8250_clear_fifos() may be unclear, it
is clear that callers of it expect that it will disable FIFOs. Both
serial8250_do_startup() & serial8250_do_shutdown() contain comments
to that effect, and other callers explicitly re-enable the FIFOs
after calling serial8250_clear_fifos(). That patch's premise, namely
that disabling the FIFOs is incorrect, therefore seems wrong.
For these reasons, this reverts commit f6aa5beb45be ("serial: 8250: Fix
clearing FIFOs in RS485 mode again").
Signed-off-by: Paul Burton <paul.burton(a)mips.com>
Fixes: f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode again")
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Daniel Jedrychowski <avistel(a)gmail.com>
Cc: Marek Vasut <marex(a)denx.de>
Cc: linux-mips(a)vger.kernel.org
Cc: linux-serial(a)vger.kernel.org
Cc: stable <stable(a)vger.kernel.org> # 4.10+
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/8250/8250_port.c | 29 +++++------------------------
1 file changed, 5 insertions(+), 24 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index f776b3eafb96..3f779d25ec0c 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -552,30 +552,11 @@ static unsigned int serial_icr_read(struct uart_8250_port *up, int offset)
*/
static void serial8250_clear_fifos(struct uart_8250_port *p)
{
- unsigned char fcr;
- unsigned char clr_mask = UART_FCR_CLEAR_RCVR | UART_FCR_CLEAR_XMIT;
-
if (p->capabilities & UART_CAP_FIFO) {
- /*
- * Make sure to avoid changing FCR[7:3] and ENABLE_FIFO bits.
- * In case ENABLE_FIFO is not set, there is nothing to flush
- * so just return. Furthermore, on certain implementations of
- * the 8250 core, the FCR[7:3] bits may only be changed under
- * specific conditions and changing them if those conditions
- * are not met can have nasty side effects. One such core is
- * the 8250-omap present in TI AM335x.
- */
- fcr = serial_in(p, UART_FCR);
-
- /* FIFO is not enabled, there's nothing to clear. */
- if (!(fcr & UART_FCR_ENABLE_FIFO))
- return;
-
- fcr |= clr_mask;
- serial_out(p, UART_FCR, fcr);
-
- fcr &= ~clr_mask;
- serial_out(p, UART_FCR, fcr);
+ serial_out(p, UART_FCR, UART_FCR_ENABLE_FIFO);
+ serial_out(p, UART_FCR, UART_FCR_ENABLE_FIFO |
+ UART_FCR_CLEAR_RCVR | UART_FCR_CLEAR_XMIT);
+ serial_out(p, UART_FCR, 0);
}
}
@@ -1467,7 +1448,7 @@ static void __do_stop_tx_rs485(struct uart_8250_port *p)
* Enable previously disabled RX interrupts.
*/
if (!(p->port.rs485.flags & SER_RS485_RX_DURING_TX)) {
- serial8250_clear_fifos(p);
+ serial8250_clear_and_reinit_fifos(p);
p->ier |= UART_IER_RLSI | UART_IER_RDI;
serial_port_out(&p->port, UART_IER, p->ier);
--
2.20.1
This is a note to let you know that I've just added the patch titled
USB: xhci: fix 'broken_suspend' placement in struct xhci_hcd
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 2419f30a4a4fcaa5f35111563b4c61f1b2b26841 Mon Sep 17 00:00:00 2001
From: Nicolas Saenz Julienne <nsaenzjulienne(a)suse.de>
Date: Mon, 17 Dec 2018 14:37:40 +0100
Subject: USB: xhci: fix 'broken_suspend' placement in struct xhci_hcd
As commented in the struct's definition, there shouldn't be anything
underneath its 'priv[0]' member, as that would break some macros.
The patch converts broken_suspend into a bit-field and relocates it
next to the rest of the bit-fields.
Fixes: a7d57abcc8a5 ("xhci: workaround CSS timeout on AMD SNPS 3.0 xHC")
Reported-by: Oliver Neukum <oneukum(a)suse.com>
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne(a)suse.de>
Acked-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/host/xhci.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index c3515bad5dbb..011dd45f8718 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1863,6 +1863,8 @@ struct xhci_hcd {
unsigned sw_lpm_support:1;
/* support xHCI 1.0 spec USB2 hardware LPM */
unsigned hw_lpm_support:1;
+ /* Broken Suspend flag for SNPS Suspend resume issue */
+ unsigned broken_suspend:1;
/* cached usb2 extened protocol capabilites */
u32 *ext_caps;
unsigned int num_ext_caps;
@@ -1880,8 +1882,6 @@ struct xhci_hcd {
void *dbc;
/* platform-specific data -- must come last */
unsigned long priv[0] __aligned(sizeof(s64));
- /* Broken Suspend flag for SNPS Suspend resume issue */
- u8 broken_suspend;
};
/* Platform specific overrides to generic XHCI hc_driver ops */
--
2.20.1
The gpio IP on Armada 370 at offset 0x18180 has neither a clk nor PWM
registers, so there is no need for a clk as the PWM isn't used anyhow.
Only check for the clk when the PWM registers are present. This fixes a
failure to probe the gpio driver for the above-mentioned gpio device.
Fixes: 757642f9a584 ("gpio: mvebu: Add limited PWM support")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
---
drivers/gpio/gpio-mvebu.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpio/gpio-mvebu.c b/drivers/gpio/gpio-mvebu.c
index 6e02148c208b..adc768f908f1 100644
--- a/drivers/gpio/gpio-mvebu.c
+++ b/drivers/gpio/gpio-mvebu.c
@@ -773,9 +773,6 @@ static int mvebu_pwm_probe(struct platform_device *pdev,
"marvell,armada-370-gpio"))
return 0;
- if (IS_ERR(mvchip->clk))
- return PTR_ERR(mvchip->clk);
-
/*
* There are only two sets of PWM configuration registers for
* all the GPIO lines on those SoCs which this driver reserves
@@ -786,6 +783,9 @@ static int mvebu_pwm_probe(struct platform_device *pdev,
if (!res)
return 0;
+ if (IS_ERR(mvchip->clk))
+ return PTR_ERR(mvchip->clk);
+
/*
* Use set A for lines of GPIO chip with id 0, B for GPIO chip
* with id 1. Don't allow further GPIO chips to be used for PWM.
--
2.19.2
When jffs2 has to retry reading xattrs, we need to reset
the buffer pointer. Otherwise we return old xattrs from the
previous iteration, which leads to an inconsistency between
the number of bytes we return and the real list size.
Cc: <stable(a)vger.kernel.org>
Cc: Andreas Gruenbacher <agruenba(a)redhat.com>
Fixes: 764a5c6b1fa4 ("xattr handlers: Simplify list operation")
Signed-off-by: Richard Weinberger <richard(a)nod.at>
---
Andreas,
since you maintain the attr package too, I report it right here. :-)
This jffs2 bug led to a crash in attr_list().
The for() loop below will run past the end of the list and crash when
there is no trailing \0 in the list of xattrs.
for (l = lbuf; l != lbuf + length; l = strchr(l, '\0') + 1) {
if (api_unconvert(name, l, flags))
continue;
...
}
I suggest changing the loop condition to something like l < lbuf + length.
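That is, keeping everything else in the snippet above as-is:
for (l = lbuf; l < lbuf + length; l = strchr(l, '\0') + 1) {
	if (api_unconvert(name, l, flags))
		continue;
	...
}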
Thanks,
//richard
---
fs/jffs2/xattr.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/jffs2/xattr.c b/fs/jffs2/xattr.c
index da3e18503c65..0cb322eb9516 100644
--- a/fs/jffs2/xattr.c
+++ b/fs/jffs2/xattr.c
@@ -967,6 +967,7 @@ ssize_t jffs2_listxattr(struct dentry *dentry, char *buffer, size_t size)
struct jffs2_xattr_ref *ref, **pref;
struct jffs2_xattr_datum *xd;
const struct xattr_handler *xhandle;
+ char *orig_buffer = buffer;
const char *prefix;
ssize_t prefix_len, len, rc;
int retry = 0;
@@ -977,6 +978,7 @@ ssize_t jffs2_listxattr(struct dentry *dentry, char *buffer, size_t size)
down_read(&c->xattr_sem);
retry:
+ buffer = orig_buffer;
len = 0;
for (ref=ic->xref, pref=&ic->xref; ref; pref=&ref->next, ref=ref->next) {
BUG_ON(ref->ic != ic);
--
2.20.0
This is a note to let you know that I've just added the patch titled
staging: bcm2835-audio: double free in init error path
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the staging-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From 649496b603000135683ee76d7ea499456617bf17 Mon Sep 17 00:00:00 2001
From: Dan Carpenter <dan.carpenter(a)oracle.com>
Date: Mon, 17 Dec 2018 10:08:54 +0300
Subject: staging: bcm2835-audio: double free in init error path
We free 'instance' both here and in the caller. Only the caller should
free it.
Fixes: d7ca3a71545b ("staging: bcm2835-audio: Operate non-atomic PCM ops")
Signed-off-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Reviewed-by: Takashi Iwai <tiwai(a)suse.de>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/vc04_services/bcm2835-audio/bcm2835-vchiq.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/staging/vc04_services/bcm2835-audio/bcm2835-vchiq.c b/drivers/staging/vc04_services/bcm2835-audio/bcm2835-vchiq.c
index 0db412fd7c55..c0debdbce26c 100644
--- a/drivers/staging/vc04_services/bcm2835-audio/bcm2835-vchiq.c
+++ b/drivers/staging/vc04_services/bcm2835-audio/bcm2835-vchiq.c
@@ -138,7 +138,6 @@ vc_vchi_audio_init(VCHI_INSTANCE_T vchi_instance,
dev_err(instance->dev,
"failed to open VCHI service connection (status=%d)\n",
status);
- kfree(instance);
return -EPERM;
}
--
2.20.1
From: Emmanuel Grumbach <emmanuel.grumbach(a)intel.com>
Old firmware versions don't support this command. Sending it
to any firmware before -41.ucode will crash the firmware.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=201975
Fixes: 66e839030fd6 ("iwlwifi: fix wrong WGDS_WIFI_DATA_SIZE")
CC: <stable(a)vger.kernel.org> #4.19+
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach(a)intel.com>
Signed-off-by: Luca Coelho <luciano.coelho(a)intel.com>
---
drivers/net/wireless/intel/iwlwifi/mvm/fw.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
index 2ba890445c35..1689bead1b4f 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
@@ -881,6 +881,15 @@ static int iwl_mvm_sar_geo_init(struct iwl_mvm *mvm)
int ret, i, j;
u16 cmd_wide_id = WIDE_ID(PHY_OPS_GROUP, GEO_TX_POWER_LIMIT);
+ /*
+ * This command is not supported on earlier firmware versions.
+ * Unfortunately, we don't have a TLV API flag to rely on, so
+ * rely on the major version which is in the first byte of
+ * ucode_ver.
+ */
+ if (IWL_UCODE_SERIAL(mvm->fw->ucode_ver) < 41)
+ return 0;
+
ret = iwl_mvm_sar_get_wgds_table(mvm);
if (ret < 0) {
IWL_DEBUG_RADIO(mvm,
--
2.19.2
This is a note to let you know that I've just added the patch titled
usb: roles: Add a description for the class to Kconfig
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the usb-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From c3788cd9963eb2e77de3c24142fb7c67b61f1a26 Mon Sep 17 00:00:00 2001
From: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Date: Wed, 12 Dec 2018 20:13:55 +0300
Subject: usb: roles: Add a description for the class to Kconfig
This makes the USB role switch support option visible to and
selectable by the user. The class driver is also moved to the
drivers/usb/roles/ directory.
This fixes an issue with the Intel USB role switch driver on
systems that don't have USB Type-C connectors: the Intel USB
role switch driver depends on the USB role switch class, as it
should, but since there was no way for the user to enable the
USB role switch class, there was also no way to select that
driver. USB Type-C drivers select the USB role switch class,
which makes the Intel USB role switch driver available and
therefore hides the problem. So in practice the Intel USB role
switch driver was depending on the USB Type-C drivers.
Fixes: f6fb9ec02be1 ("usb: roles: Add Intel xHCI USB role switch driver")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/Kconfig | 4 ----
drivers/usb/common/Makefile | 1 -
drivers/usb/roles/Kconfig | 13 +++++++++++++
drivers/usb/roles/Makefile | 4 +++-
drivers/usb/{common/roles.c => roles/class.c} | 0
5 files changed, 16 insertions(+), 6 deletions(-)
rename drivers/usb/{common/roles.c => roles/class.c} (100%)
diff --git a/drivers/usb/Kconfig b/drivers/usb/Kconfig
index 987fc5ba6321..70e6c956c23c 100644
--- a/drivers/usb/Kconfig
+++ b/drivers/usb/Kconfig
@@ -205,8 +205,4 @@ config USB_ULPI_BUS
To compile this driver as a module, choose M here: the module will
be called ulpi.
-config USB_ROLE_SWITCH
- tristate
- select USB_COMMON
-
endif # USB_SUPPORT
diff --git a/drivers/usb/common/Makefile b/drivers/usb/common/Makefile
index fb4d5ef4165c..0a7c45e85481 100644
--- a/drivers/usb/common/Makefile
+++ b/drivers/usb/common/Makefile
@@ -9,4 +9,3 @@ usb-common-$(CONFIG_USB_LED_TRIG) += led.o
obj-$(CONFIG_USB_OTG_FSM) += usb-otg-fsm.o
obj-$(CONFIG_USB_ULPI_BUS) += ulpi.o
-obj-$(CONFIG_USB_ROLE_SWITCH) += roles.o
diff --git a/drivers/usb/roles/Kconfig b/drivers/usb/roles/Kconfig
index f5a5e6f79f1b..e4194ac94510 100644
--- a/drivers/usb/roles/Kconfig
+++ b/drivers/usb/roles/Kconfig
@@ -1,3 +1,16 @@
+config USB_ROLE_SWITCH
+ tristate "USB Role Switch Support"
+ help
+ USB Role Switch is a device that can select the USB role - host or
+ device - for a USB port (connector). In most cases dual-role capable
+ USB controller will also represent the switch, but on some platforms
+ multiplexer/demultiplexer switch is used to route the data lines on
+ the USB connector between separate USB host and device controllers.
+
+ Say Y here if your USB connectors support both device and host roles.
+ To compile the driver as module, choose M here: the module will be
+ called roles.ko.
+
if USB_ROLE_SWITCH
config USB_ROLES_INTEL_XHCI
diff --git a/drivers/usb/roles/Makefile b/drivers/usb/roles/Makefile
index e44b179ba275..c02873206fc1 100644
--- a/drivers/usb/roles/Makefile
+++ b/drivers/usb/roles/Makefile
@@ -1 +1,3 @@
-obj-$(CONFIG_USB_ROLES_INTEL_XHCI) += intel-xhci-usb-role-switch.o
+obj-$(CONFIG_USB_ROLE_SWITCH) += roles.o
+roles-y := class.o
+obj-$(CONFIG_USB_ROLES_INTEL_XHCI) += intel-xhci-usb-role-switch.o
diff --git a/drivers/usb/common/roles.c b/drivers/usb/roles/class.c
similarity index 100%
rename from drivers/usb/common/roles.c
rename to drivers/usb/roles/class.c
--
2.20.1
Hi,
I'm not sure what the connection is exactly but commit dcd51305cd41
("ALSA: hda/realtek - fix the pop noise on headphone for lenovo
laptops") broke the mute button LED on my Lenovo X1 Extreme laptop.
I don't have any more info frankly and don't even know where to start
debugging. Other LEDs work fine. This mute button LED never had any
entry in /sys/class/leds; I'm not sure how it's handled.
Let me know if you need more debugging on my part.
Best regards,
Bartosz Golaszewski
After expanding i_mmap_rwsem use for better shared pmd and page fault/
truncation synchronization, remove code that is no longer necessary.
Cc: <stable(a)vger.kernel.org>
Fixes: ebed4bfc8da8 ("hugetlb: fix absurd HugePages_Rsvd")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
---
fs/hugetlbfs/inode.c | 46 +++++++++++++++-----------------------------
mm/hugetlb.c | 21 ++++++++++----------
2 files changed, 25 insertions(+), 42 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 3244147fc42b..a9c00c6ef80d 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -383,17 +383,16 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
* truncation is indicated by end of range being LLONG_MAX
* In this case, we first scan the range and release found pages.
* After releasing pages, hugetlb_unreserve_pages cleans up region/reserv
- * maps and global counts. Page faults can not race with truncation
- * in this routine. hugetlb_no_page() prevents page faults in the
- * truncated range. It checks i_size before allocation, and again after
- * with the page table lock for the page held. The same lock must be
- * acquired to unmap a page.
+ * maps and global counts.
* hole punch is indicated if end is not LLONG_MAX
* In the hole punch case we scan the range and release found pages.
* Only when releasing a page is the associated region/reserv map
* deleted. The region/reserv map for ranges without associated
- * pages are not modified. Page faults can race with hole punch.
- * This is indicated if we find a mapped page.
+ * pages are not modified.
+ *
+ * Callers of this routine must hold the i_mmap_rwsem in write mode to prevent
+ * races with page faults.
+ *
* Note: If the passed end of range value is beyond the end of file, but
* not LLONG_MAX this routine still performs a hole punch operation.
*/
@@ -423,32 +422,14 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
for (i = 0; i < pagevec_count(&pvec); ++i) {
struct page *page = pvec.pages[i];
- u32 hash;
index = page->index;
- hash = hugetlb_fault_mutex_hash(h, current->mm,
- &pseudo_vma,
- mapping, index, 0);
- mutex_lock(&hugetlb_fault_mutex_table[hash]);
-
/*
- * If page is mapped, it was faulted in after being
- * unmapped in caller. Unmap (again) now after taking
- * the fault mutex. The mutex will prevent faults
- * until we finish removing the page.
- *
- * This race can only happen in the hole punch case.
- * Getting here in a truncate operation is a bug.
+ * A mapped page is impossible as callers should unmap
+ * all references before calling. And, i_mmap_rwsem
+ * prevents the creation of additional mappings.
*/
- if (unlikely(page_mapped(page))) {
- BUG_ON(truncate_op);
-
- i_mmap_lock_write(mapping);
- hugetlb_vmdelete_list(&mapping->i_mmap,
- index * pages_per_huge_page(h),
- (index + 1) * pages_per_huge_page(h));
- i_mmap_unlock_write(mapping);
- }
+ VM_BUG_ON(page_mapped(page));
lock_page(page);
/*
@@ -470,7 +451,6 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
}
unlock_page(page);
- mutex_unlock(&hugetlb_fault_mutex_table[hash]);
}
huge_pagevec_release(&pvec);
cond_resched();
@@ -624,7 +604,11 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
/* addr is the offset within the file (zero based) */
addr = index * hpage_size;
- /* mutex taken here, fault path and hole punch */
+ /*
+ * fault mutex taken here, protects against fault path
+ * and hole punch. inode_lock previously taken protects
+ * against truncation.
+ */
hash = hugetlb_fault_mutex_hash(h, mm, &pseudo_vma, mapping,
index, addr);
mutex_lock(&hugetlb_fault_mutex_table[hash]);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 362601b69c56..89e1a253a40b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3760,16 +3760,16 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
}
/*
- * Use page lock to guard against racing truncation
- * before we get page_table_lock.
+ * We can not race with truncation due to holding i_mmap_rwsem.
+ * Check once here for faults beyond end of file.
*/
+ size = i_size_read(mapping->host) >> huge_page_shift(h);
+ if (idx >= size)
+ goto out;
+
retry:
page = find_lock_page(mapping, idx);
if (!page) {
- size = i_size_read(mapping->host) >> huge_page_shift(h);
- if (idx >= size)
- goto out;
-
/*
* Check for page in userfault range
*/
@@ -3859,9 +3859,6 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
}
ptl = huge_pte_lock(h, mm, ptep);
- size = i_size_read(mapping->host) >> huge_page_shift(h);
- if (idx >= size)
- goto backout;
ret = 0;
if (!huge_pte_none(huge_ptep_get(ptep)))
@@ -3964,8 +3961,10 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
/*
* Acquire i_mmap_rwsem before calling huge_pte_alloc and hold
- * until finished with ptep. This prevents huge_pmd_unshare from
- * being called elsewhere and making the ptep no longer valid.
+ * until finished with ptep. This serves two purposes:
+ * 1) It prevents huge_pmd_unshare from being called elsewhere
+ * and making the ptep no longer valid.
+ * 2) It synchronizes us with file truncation.
*
* ptep could have already be assigned via huge_pte_offset. That
* is OK, as huge_pte_alloc will return the same value unless
--
2.17.2
This is the start of the stable review cycle for the 4.9.146 release.
There are 51 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun Dec 16 11:56:52 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.146-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.146-rc1
Guenter Roeck <linux(a)roeck-us.net>
staging: speakup: Replace strncpy with memcpy
Namhyung Kim <namhyung(a)kernel.org>
pstore: Convert console write to use ->write_buf
Pan Bian <bianpan2016(a)163.com>
ocfs2: fix potential use after free
Qian Cai <cai(a)gmx.us>
debugobjects: avoid recursive calls with kmemleak
Pan Bian <bianpan2016(a)163.com>
hfsplus: do not free node before using
Pan Bian <bianpan2016(a)163.com>
hfs: do not free node before using
Larry Chen <lchen(a)suse.com>
ocfs2: fix deadlock caused by ocfs2_defrag_extent()
Colin Ian King <colin.king(a)canonical.com>
fscache, cachefiles: remove redundant variable 'cache'
NeilBrown <neilb(a)suse.com>
fscache: fix race between enablement and dropping of object
Srikanth Boddepalli <boddepalli.srikanth(a)gmail.com>
xen: xlate_mmu: add missing header to fix 'W=1' warning
Y.C. Chen <yc_chen(a)aspeedtech.com>
drm/ast: fixed reading monitor EDID not stable issue
Pan Bian <bianpan2016(a)163.com>
net: hisilicon: remove unexpected free_netdev
Josh Elsasser <jelsasser(a)appneta.com>
ixgbe: recognize 1000BaseLX SFP modules as 1Gbps
Yunjian Wang <wangyunjian(a)huawei.com>
igb: fix uninitialized variables
Kiran Kumar Modukuri <kiran.modukuri(a)gmail.com>
cachefiles: Fix page leak in cachefiles_read_backing_file while vmscan is active
Lorenzo Bianconi <lorenzo.bianconi(a)redhat.com>
net: thunderx: fix NULL pointer dereference in nic_remove
Yi Wang <wang.yi59(a)zte.com.cn>
x86/kvm/vmx: fix old-style function declaration
Yi Wang <wang.yi59(a)zte.com.cn>
KVM: x86: fix empty-body warnings
Aaro Koskinen <aaro.koskinen(a)iki.fi>
USB: omap_udc: fix USB gadget functionality on Palm Tungsten E
Aaro Koskinen <aaro.koskinen(a)iki.fi>
USB: omap_udc: fix omap_udc_start() on 15xx machines
Aaro Koskinen <aaro.koskinen(a)iki.fi>
USB: omap_udc: fix crashes on probe error and module removal
Aaro Koskinen <aaro.koskinen(a)iki.fi>
USB: omap_udc: use devm_request_irq()
Xin Long <lucien.xin(a)gmail.com>
ipvs: call ip_vs_dst_notifier earlier than ipv6_dev_notf
Martynas Pumputis <m(a)lambda.lt>
bpf: fix check of allowed specifiers in bpf_trace_printk
Pan Bian <bianpan2016(a)163.com>
exportfs: do not read dentry after free
Peter Ujfalusi <peter.ujfalusi(a)ti.com>
ASoC: omap-dmic: Add pm_qos handling to avoid overruns with CPU_IDLE
Peter Ujfalusi <peter.ujfalusi(a)ti.com>
ASoC: omap-mcpdm: Add pm_qos handling to avoid under/overruns with CPU_IDLE
Majd Dibbiny <majd(a)mellanox.com>
RDMA/mlx5: Fix fence type for IB_WR_LOCAL_INV WR
Robbie Ko <robbieko(a)synology.com>
Btrfs: send, fix infinite loop due to directory rename dependencies
Artem Savkov <asavkov(a)redhat.com>
objtool: Fix segfault in .cold detection with -ffunction-sections
Artem Savkov <asavkov(a)redhat.com>
objtool: Fix double-free in .cold detection error path
Huacai Chen <chenhc(a)lemote.com>
hwmon: (w83795) temp4_type has writable permission
Tzung-Bi Shih <tzungbi(a)google.com>
ASoC: dapm: Recalculate audio map forcely when card instantiated
Peter Ujfalusi <peter.ujfalusi(a)ti.com>
ASoC: omap-abe-twl6040: Fix missing audio card caused by deferred probing
Nicolin Chen <nicoleotsuka(a)gmail.com>
hwmon: (ina2xx) Fix current value calculation
Thomas Richter <tmricht(a)linux.ibm.com>
s390/cpum_cf: Reject request for sampling in event initialization
Florian Westphal <fw(a)strlen.de>
selftests: add script to stress-test nft packet path vs. control plane
YueHaibing <yuehaibing(a)huawei.com>
sysv: return 'err' instead of 0 in __sysv_write_inode
Janusz Krzysztofik <jmkrzyszt(a)gmail.com>
ARM: OMAP1: ams-delta: Fix possible use of uninitialized field
Adam Ford <aford173(a)gmail.com>
ARM: dts: logicpd-somlv: Fix interrupt on mmc3_dat1
Nathan Chancellor <natechancellor(a)gmail.com>
ARM: OMAP2+: prm44xx: Fix section annotation on omap44xx_prm_enable_io_wakeup
Stefano Brivio <sbrivio(a)redhat.com>
neighbour: Avoid writing before skb->head in neigh_hh_output()
Nicolas Dichtel <nicolas.dichtel(a)6wind.com>
tun: forbid iface creation with rtnl ops
Yuchung Cheng <ycheng(a)google.com>
tcp: fix NULL ref in tail loss probe
Eric Dumazet <edumazet(a)google.com>
rtnetlink: ndo_dflt_fdb_dump() only work for ARPHRD_ETHER devices
Christoph Paasch <cpaasch(a)apple.com>
net: Prevent invalid access to skb->prev in __qdisc_drop_all
Heiner Kallweit <hkallweit1(a)gmail.com>
net: phy: don't allow __set_phy_supported to add unsupported modes
Tarick Bedeir <tarick(a)google.com>
net/mlx4_core: Correctly set PFC param if global pause is turned off.
Su Yanjun <suyj.fnst(a)cn.fujitsu.com>
net: 8139cp: fix a BUG triggered by changing mtu with network traffic
Stefano Brivio <sbrivio(a)redhat.com>
ipv6: Check available headroom in ip6_xmit() even without options
Jiri Wiesner <jwiesner(a)suse.com>
ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes
-------------
Diffstat:
Makefile | 4 +-
arch/arm/boot/dts/logicpd-som-lv.dtsi | 2 +-
arch/arm/mach-omap1/board-ams-delta.c | 3 +
arch/arm/mach-omap2/prm44xx.c | 2 +-
arch/s390/kernel/perf_cpum_cf.c | 2 +
arch/x86/kvm/lapic.c | 2 +-
arch/x86/kvm/vmx.c | 8 +-
drivers/gpu/drm/ast/ast_mode.c | 36 +++++++--
drivers/hwmon/ina2xx.c | 2 +-
drivers/hwmon/w83795.c | 2 +-
drivers/infiniband/hw/mlx5/qp.c | 19 ++---
drivers/net/ethernet/cavium/thunder/nic_main.c | 3 +
drivers/net/ethernet/hisilicon/hip04_eth.c | 4 +-
drivers/net/ethernet/intel/igb/e1000_i210.c | 1 +
drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c | 4 +-
drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 4 +-
drivers/net/ethernet/realtek/8139cp.c | 5 ++
drivers/net/phy/phy_device.c | 19 ++---
drivers/net/tun.c | 6 +-
drivers/staging/speakup/kobjects.c | 4 +-
drivers/usb/gadget/udc/omap_udc.c | 87 ++++++++--------------
drivers/xen/xlate_mmu.c | 1 +
fs/btrfs/send.c | 11 ++-
fs/cachefiles/rdwr.c | 9 ++-
fs/exportfs/expfs.c | 2 +-
fs/fscache/object.c | 3 +
fs/hfs/btree.c | 3 +-
fs/hfsplus/btree.c | 3 +-
fs/ocfs2/export.c | 2 +-
fs/ocfs2/move_extents.c | 47 ++++++------
fs/pstore/platform.c | 4 +-
fs/sysv/inode.c | 2 +-
include/net/neighbour.h | 28 +++++--
kernel/trace/bpf_trace.c | 8 +-
lib/debugobjects.c | 3 +-
net/core/rtnetlink.c | 3 +
net/ipv4/ip_fragment.c | 7 ++
net/ipv4/tcp_output.c | 12 ++-
net/ipv6/ip6_output.c | 42 +++++------
net/ipv6/netfilter/nf_conntrack_reasm.c | 8 +-
net/ipv6/reassembly.c | 8 +-
net/netfilter/ipvs/ip_vs_ctl.c | 3 +
net/sched/sch_netem.c | 3 +
sound/soc/omap/omap-abe-twl6040.c | 67 ++++++++---------
sound/soc/omap/omap-dmic.c | 9 +++
sound/soc/omap/omap-mcpdm.c | 43 ++++++++++-
sound/soc/soc-core.c | 1 +
tools/objtool/elf.c | 19 ++++-
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/netfilter/Makefile | 6 ++
tools/testing/selftests/netfilter/config | 2 +
.../selftests/netfilter/nft_trans_stress.sh | 78 +++++++++++++++++++
52 files changed, 439 insertions(+), 218 deletions(-)
Mr Gleixner,
I was upset when I compiled 4.14.87 and found that SCHED_SMT had been
forced on. At the time, I just reported it on my blog, posted a
question about it to a couple of forums, and stayed with an earlier
kernel.
However, when 4.14.88 came out, and still the same situation, alarm
bells went off, and I looked through the kernel changelog. Found it,
4.14.86:
"x86/Kconfig: Select SCHED_SMT if SMP enabled"
Then:
"CONFIG_SCHED_SMT is enabled by all distros, so there is not a real point to
have it configurable. ..."
...that is a lie. It would be correct to state that it is true of the
distros you use, and presumably also for all of you guys who signed
off on it.
Puppy Linux is an example of a distro that has mostly not had
SCHED_SMT enabled. Ditto for most of the forks of Puppy. Two distros
that I currently maintain, Quirky and EasyOS (easyos.org) have SMP
enabled but not SCHED_SMT.
The difference between them is important; they should remain
independently settable. I am so surprised that all of you guys went
along with forcing it on.
For the record, my blog post:
http://bkhome.org/news/201812/kernel-41487-compiled.html
Regards,
Barry Kauler
Currently an MTD device emulated by a UBI volume doesn't allocate
wbuf_verify in jffs2_ubivol_setup(), because UBI can do the
verification itself, so when CONFIG_JFFS2_FS_WBUF_VERIFY is enabled and
an MTD device emulated by a UBI volume is used, an Oops will occur as
shown in the following trace:
general protection fault: 0000 [#1] SMP KASAN PTI
CPU: 6 PID: 404 Comm: kworker/6:1 Not tainted 4.19.0-rc8
Workqueue: events_long delayed_wbuf_sync
RIP: 0010:ubi_io_read+0x156/0x650
Call Trace:
ubi_eba_read_leb+0x57d/0xba0
ubi_leb_read+0xe5/0x1b0
gluebi_read+0x10c/0x1a0
mtd_read+0x112/0x340
jffs2_verify_write+0xef/0x440
__jffs2_flush_wbuf+0x3fa/0x3540
jffs2_flush_wbuf_gc+0x1b1/0x2e0
process_one_work+0x58b/0x11e0
worker_thread+0x8f/0xfe0
kthread+0x2ae/0x3a0
ret_from_fork+0x35/0x40
Fix the problem by checking the validity of wbuf_verify before
using it in jffs2_verify_write().
Cc: stable(a)vger.kernel.org
Fixes: 0029da3bf430 ("JFFS2: add UBI support")
Signed-off-by: Hou Tao <houtao1(a)huawei.com>
---
fs/jffs2/wbuf.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/fs/jffs2/wbuf.c b/fs/jffs2/wbuf.c
index c6821a509481..3de45f4559d1 100644
--- a/fs/jffs2/wbuf.c
+++ b/fs/jffs2/wbuf.c
@@ -234,6 +234,13 @@ static int jffs2_verify_write(struct jffs2_sb_info *c, unsigned char *buf,
size_t retlen;
char *eccstr;
+ /*
+ * MTD emulated by UBI volume doesn't allocate wbuf_verify,
+ * because it can do the verification itself.
+ */
+ if (!c->wbuf_verify)
+ return 0;
+
ret = mtd_read(c->mtd, ofs, c->wbuf_pagesize, &retlen, c->wbuf_verify);
if (ret && ret != -EUCLEAN && ret != -EBADMSG) {
pr_warn("%s(): Read back of page at %08x failed: %d\n",
--
2.16.2.dirty
Commit 1f82de10d6 ("PCI/x86: don't assume prefetchable ranges are
64bit") added probing of bridge support for 64 bit memory each time
the bridge is re-enumerated.
Unfortunately this probing is destructive if any device behind the
bridge is in use at this time.
There's no real need to re-probe the bridge features as the registers
in question never change - detect that by the memory flag already
being set, and skip the probing.
Avoiding repeated calls to pci_bridge_check_ranges() altogether might
be even nicer, but that would be a bigger patch and probably not
appropriate for stable.
Reported-by: xuyandong <xuyandong2(a)huawei.com>
Cc: stable(a)vger.kernel.org
Cc: Yinghai Lu <yinghai(a)kernel.org>
Cc: Jesse Barnes <jbarnes(a)virtuousgeek.org>
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
---
This issue has been reported on upstream Linux and CentOS.
drivers/pci/setup-bus.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index ed960436df5e..7ab42f76579e 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -741,6 +741,13 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
struct resource *b_res;
b_res = &bridge->resource[PCI_BRIDGE_RESOURCES];
+
+ /* Don't re-check after this was called once already:
+ * important since bridge might be in use.
+ */
+ if (b_res[1].flags & IORESOURCE_MEM)
+ return;
+
b_res[1].flags |= IORESOURCE_MEM;
pci_read_config_word(bridge, PCI_IO_BASE, &io);
--
MST
According to my memo at hand and saved records, writing 0x00000001 to
SND_FF_REG_FETCH_PCM_FRAMES disables fetching of PCM frames in the
corresponding channel; however, the current implementation uses the
reversed logic. This results in muted volume on the device side during
playback.
Cc: <stable(a)vger.kernel.org> # v4.12+
Fixes: 76fdb3a9e13a ('ALSA: fireface: add support for Fireface 400')
Signed-off-by: Takashi Sakamoto <o-takashi(a)sakamocchi.jp>
---
sound/firewire/fireface/ff-protocol-ff400.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/firewire/fireface/ff-protocol-ff400.c b/sound/firewire/fireface/ff-protocol-ff400.c
index 8f34174ee813..fd257051d4a4 100644
--- a/sound/firewire/fireface/ff-protocol-ff400.c
+++ b/sound/firewire/fireface/ff-protocol-ff400.c
@@ -86,7 +86,7 @@ static int ff400_switch_fetching_mode(struct snd_ff *ff, bool enable)
if (reg == NULL)
return -ENOMEM;
- if (enable) {
+ if (!enable) {
/*
* Each quadlet is corresponding to data channels in a data
* blocks in reverse order. Precisely, quadlets for available
--
2.19.1
marvell_nfc_wait_op() waits for completion during 'timeout_ms'
milliseconds before throwing an error. While the logic is fine, the
value of 'timeout_ms' is given by the core and actually corresponds to
the maximum time the NAND chip will take to complete the operation.
Assuming there is no overhead in the propagation of the interrupt
signal to the NAND controller (through the Ready/Busy line), this
delay does not take into account the latency of the operating system.
For instance, for a page write, the delay given by the core is rounded
up to 1ms. Hence, when the machine is overloaded, there is a chance
that this timeout will be reached.
There are two ways to solve this issue, and they are not incompatible:
1/ Enlarge the timeout value (if so, by how much?).
2/ Check after the waiting method whether we simply missed the
interrupt because of OS latency (an interrupt is still pending). In
this case, we assume the operation exited successfully.
We choose the second approach, which is a must in all cases, with the
possibility to also raise the timeout value to, e.g., at least 1
second in all cases.
Fixes: 02f26ecf8c77 ("mtd: nand: add reworked Marvell NAND controller driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
drivers/mtd/nand/raw/marvell_nand.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/drivers/mtd/nand/raw/marvell_nand.c b/drivers/mtd/nand/raw/marvell_nand.c
index e6c3739cea73..bc0eef4ade4f 100644
--- a/drivers/mtd/nand/raw/marvell_nand.c
+++ b/drivers/mtd/nand/raw/marvell_nand.c
@@ -514,9 +514,14 @@ static void marvell_nfc_enable_int(struct marvell_nfc *nfc, u32 int_mask)
writel_relaxed(reg & ~int_mask, nfc->regs + NDCR);
}
-static void marvell_nfc_clear_int(struct marvell_nfc *nfc, u32 int_mask)
+static int marvell_nfc_clear_int(struct marvell_nfc *nfc, u32 int_mask)
{
+ u32 reg;
+
+ reg = readl_relaxed(nfc->regs + NDSR);
writel_relaxed(int_mask, nfc->regs + NDSR);
+
+ return reg & int_mask;
}
static void marvell_nfc_force_byte_access(struct nand_chip *chip,
@@ -683,6 +688,7 @@ static int marvell_nfc_wait_cmdd(struct nand_chip *chip)
static int marvell_nfc_wait_op(struct nand_chip *chip, unsigned int timeout_ms)
{
struct marvell_nfc *nfc = to_marvell_nfc(chip->controller);
+ int pending;
int ret;
/* Timeout is expressed in ms */
@@ -695,8 +701,13 @@ static int marvell_nfc_wait_op(struct nand_chip *chip, unsigned int timeout_ms)
ret = wait_for_completion_timeout(&nfc->complete,
msecs_to_jiffies(timeout_ms));
marvell_nfc_disable_int(nfc, NDCR_RDYM);
- marvell_nfc_clear_int(nfc, NDSR_RDY(0) | NDSR_RDY(1));
- if (!ret) {
+ pending = marvell_nfc_clear_int(nfc, NDSR_RDY(0) | NDSR_RDY(1));
+
+ /*
+ * In case the interrupt was not served in the required time frame,
+ * check if the ISR was not served or if something went actually wrong.
+ */
+ if (ret && !pending) {
dev_err(nfc->dev, "Timeout waiting for RB signal\n");
return -ETIMEDOUT;
}
--
2.19.1
Commit 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link")
accidentally broke unwinding from userspace, because ld would strip the
.eh_frame sections when linking.
Originally, the compiler would implicitly add --eh-frame-hdr when
invoking the linker, but when this Makefile was converted from invoking
ld via the compiler, to invoking it directly (like vmlinux does),
the flag was missed. (The EH_FRAME section is important for the VDSO
shared libraries, but not for vmlinux.)
Fix the problem by explicitly specifying --eh-frame-hdr, which restores
parity with the old method.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201741
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1659295
Reported-by: Florian Weimer <fweimer(a)redhat.com>
Reported-by: Carlos O'Donell <carlos(a)redhat.com>
Reported-by: "H. J. Lu" <hjl.tools(a)gmail.com>
Tested-by: Laura Abbott <labbott(a)redhat.com>
Fixes: 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link")
Cc: stable(a)vger.kernel.org
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: X86 ML <x86(a)kernel.org>
Cc: Joel Fernandes <joel(a)joelfernandes.org>
Cc: kernel-team(a)android.com
Signed-off-by: Alistair Strachan <astrachan(a)google.com>
---
v2: Updated commit message, no changes to the code
arch/x86/entry/vdso/Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 141d415a8c80..c3d7ccd25381 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -171,7 +171,8 @@ quiet_cmd_vdso = VDSO $@
sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'
VDSO_LDFLAGS = -shared $(call ld-option, --hash-style=both) \
- $(call ld-option, --build-id) -Bsymbolic
+ $(call ld-option, --build-id) $(call ld-option, --eh-frame-hdr) \
+ -Bsymbolic
GCOV_PROFILE := n
#
--
2.20.0.405.gbc1bbc6f85-goog
Commit e1e6255c311b ("mtd: rawnand: omap2: convert driver to
nand_scan()") moved part of the init code into the ->attach_chip hook
and at the same time changed the struct device object passed to
dma_request_chan() (&pdev->dev instead of pdev->dev.parent), which
broke the DMA channel request. Pass the parent device again, as was
done before the conversion.
Fixes: e1e6255c311b ("mtd: rawnand: omap2: convert driver to nand_scan()")
Reported-by: Alexander Sverdlin <alexander.sverdlin(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon(a)bootlin.com>
---
Changes in v2:
- Fix the prefix
---
drivers/mtd/nand/raw/omap2.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/raw/omap2.c b/drivers/mtd/nand/raw/omap2.c
index 886d05c391ef..68e8b9f7f372 100644
--- a/drivers/mtd/nand/raw/omap2.c
+++ b/drivers/mtd/nand/raw/omap2.c
@@ -1944,7 +1944,7 @@ static int omap_nand_attach_chip(struct nand_chip *chip)
case NAND_OMAP_PREFETCH_DMA:
dma_cap_zero(mask);
dma_cap_set(DMA_SLAVE, mask);
- info->dma = dma_request_chan(dev, "rxtx");
+ info->dma = dma_request_chan(dev->parent, "rxtx");
if (IS_ERR(info->dma)) {
dev_err(dev, "DMA engine request failed\n");
--
2.17.1
In the IEC 61883-1/6 engine of the ALSA firewire stack, a packet
handler has a second argument for 'the number of bytes in the payload
of an isochronous packet'. However, the handler for incoming packets
without CIP headers uses the value as 'the number of quadlets in the
payload'. This makes userspace applications receive four times the
real number of PCM frames.
This commit fixes the bug.
Cc: <stable(a)vger.kernel.org> # v4.12+
Fixes: 3b196c394dd ('ALSA: firewire-lib: add no-header packet processing')
Signed-off-by: Takashi Sakamoto <o-takashi(a)sakamocchi.jp>
---
sound/firewire/amdtp-stream.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/sound/firewire/amdtp-stream.c b/sound/firewire/amdtp-stream.c
index 9be76c808fcc..3ada55ed5381 100644
--- a/sound/firewire/amdtp-stream.c
+++ b/sound/firewire/amdtp-stream.c
@@ -654,15 +654,17 @@ static int handle_in_packet(struct amdtp_stream *s,
}
static int handle_in_packet_without_header(struct amdtp_stream *s,
- unsigned int payload_quadlets, unsigned int cycle,
+ unsigned int payload_length, unsigned int cycle,
unsigned int index)
{
__be32 *buffer;
+ unsigned int payload_quadlets;
unsigned int data_blocks;
struct snd_pcm_substream *pcm;
unsigned int pcm_frames;
buffer = s->buffer.packets[s->packet_index].buffer;
+ payload_quadlets = payload_length / 4;
data_blocks = payload_quadlets / s->data_block_quadlets;
trace_in_packet_without_header(s, cycle, payload_quadlets, data_blocks,
--
2.19.1
On Fri, Dec 14, 2018 at 11:42:01PM +0100, Thomas Schöbel-Theuer wrote:
> Ah, I overlooked that commit e56c92565dfe2 is already providing a different
> solution to the same problem in newer kernels _only_, as a _side_ effect
> (not clear to me from the description, but clear from reading the code).
Damn, I missed the fact that this is not the upstream kernel:
CPU: 0 PID: 1 UID: 0 Comm: swapper/0 Not tainted 4.4.0-ui18344.004-uiabi1-infong-amd64 #1
> So another alternative would be backporting e56c92565dfe2 to the 4.4 LTS
> series. Also fine for me.
That looks like the right fix.
A note for the next time: do not send a fix for a stable kernel which is
not upstream:
From Documentation/process/stable-kernel-rules.rst:
" - It or an equivalent fix must already exist in Linus' tree (upstream)."
The stable kernels track upstream so if a stable kernel has a problem,
the first thing one needs to do is to check whether this has been fixed
upstream and if so, to backport it. This is the case most of the time.
In the very seldom cases where a separate fix is needed, it needs to be
handled by asking Greg what to do. :-)
Adding stable@ folks to CC to set me straight if I'm missing something.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On 12/15/18 2:10 AM, Michael Lyle wrote:
> Coly--
>
> Apologies for the late reply on this. I just noticed it based on Greg's
> comment about stable.
>
> When I wrote the previous "accelerate writeback" patchset, my first
> attempt was very much like this. I believe it was asked (by you?)
> whether it would impact the latency of front-end I/O because of deep
> backing device queues when a new request comes in.
>
> Won't this cause lots of requests to be pending to backing, so if there
> is intermittent front-end I/O they'll have to wait for the device?
> That's why I previously had it set to only complete one writeback at a
> time, to bound the impact on latency, based on that review feedback.
Hi Mike,
This patch is a much more conservative effort. It sets a high writeback
rate only when all attached bcache devices have been idle for quite a
few seconds. In this situation, the cache set is really quiet and spare.
Commit b1092c9af9ed ("bcache: allow quick writeback when backing idle")
just looks at a single bcache device. If there are I/Os for other bcache
devices on the cache set while a single bcache device is idle, a faster
writeback rate for this single idle bcache device will kick in, and the
I/O to read dirty data from the cache for writeback will have a negative
impact on the I/O requests of the other bcache devices.
Therefore I give up the per-device faster writeback, to preserve the
latency of front-end I/O in general.
Thanks.
Coly Li
> On Mon, Jul 23, 2018 at 9:03 PM Coly Li <colyli(a)suse.de
> <mailto:colyli@suse.de>> wrote:
>
> Commit b1092c9af9ed ("bcache: allow quick writeback when backing idle")
> allows the writeback rate to be faster if there is no I/O request on a
> bcache device. It works well if there is only one bcache device attached
> to the cache set. If there are many bcache devices attached to a cache
> set, it may introduce a performance regression because multiple faster
> writeback threads of the idle bcache devices will compete for the btree
> level locks with the bcache device which has I/O requests coming.
>
> This patch fixes the above issue by only permitting fast writeback when
> all bcache devices attached on the cache set are idle. And if one of the
> bcache devices has a new I/O request coming, minimize all writeback
> throughput immediately and let the PI controller
> __update_writeback_rate() decide the upcoming writeback rate for each
> bcache device.
>
> Also when all bcache devices are idle, limiting the writeback rate to
> a small number is a waste of throughput, especially when backing
> devices are slower non-rotational devices (e.g. SATA SSD). This patch
> sets a max writeback
> rate for each backing device if the whole cache set is idle. A faster
> writeback rate in idle time means new I/Os may have more available space
> for dirty data, and people may observe a better write performance then.
>
> Please note bcache may change its cache mode in run time, and this patch
> still works if the cache mode is switched from writeback mode and there
> is still dirty data on cache.
>
> Fixes: Commit b1092c9af9ed ("bcache: allow quick writeback when
> backing idle")
> Cc: stable(a)vger.kernel.org <mailto:stable@vger.kernel.org> #4.16+
> Signed-off-by: Coly Li <colyli(a)suse.de <mailto:colyli@suse.de>>
> Tested-by: Kai Krakow <kai(a)kaishome.de <mailto:kai@kaishome.de>>
> Cc: Michael Lyle <mlyle(a)lyle.org <mailto:mlyle@lyle.org>>
> Cc: Stefan Priebe <s.priebe(a)profihost.ag <mailto:s.priebe@profihost.ag>>
> ---
> Channgelog:
> v2, Fix a deadlock reported by Stefan Priebe.
> v1, Initial version.
>
> drivers/md/bcache/bcache.h | 11 ++--
> drivers/md/bcache/request.c | 51 ++++++++++++++-
> drivers/md/bcache/super.c | 1 +
> drivers/md/bcache/sysfs.c | 14 +++--
> drivers/md/bcache/util.c | 2 +-
> drivers/md/bcache/util.h | 2 +-
> drivers/md/bcache/writeback.c | 115 ++++++++++++++++++++++++++--------
> 7 files changed, 155 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> index d6bf294f3907..469ab1a955e0 100644
> --- a/drivers/md/bcache/bcache.h
> +++ b/drivers/md/bcache/bcache.h
> @@ -328,13 +328,6 @@ struct cached_dev {
> */
> atomic_t has_dirty;
>
> - /*
> - * Set to zero by things that touch the backing volume-- except
> - * writeback. Incremented by writeback. Used to determine when to
> - * accelerate idle writeback.
> - */
> - atomic_t backing_idle;
> -
> struct bch_ratelimit writeback_rate;
> struct delayed_work writeback_rate_update;
>
> @@ -514,6 +507,8 @@ struct cache_set {
> struct cache_accounting accounting;
>
> unsigned long flags;
> + atomic_t idle_counter;
> + atomic_t at_max_writeback_rate;
>
> struct cache_sb sb;
>
> @@ -523,6 +518,8 @@ struct cache_set {
>
> struct bcache_device **devices;
> unsigned devices_max_used;
> + /* See set_at_max_writeback_rate() for how it is used */
> + unsigned previous_dirty_dc_nr;
> struct list_head cached_devs;
> uint64_t cached_dev_sectors;
> struct closure caching;
> diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
> index ae67f5fa8047..1af3d96abfa5 100644
> --- a/drivers/md/bcache/request.c
> +++ b/drivers/md/bcache/request.c
> @@ -1104,6 +1104,43 @@ static void detached_dev_do_request(struct bcache_device *d, struct bio *bio)
>
> /* Cached devices - read & write stuff */
>
> +static void quit_max_writeback_rate(struct cache_set *c,
> + struct cached_dev *this_dc)
> +{
> + int i;
> + struct bcache_device *d;
> + struct cached_dev *dc;
> +
> + /*
> + * If bch_register_lock is acquired by other attach/detach
> + * operations, waiting here will increase I/O request latency for
> + * seconds or more. To avoid such a situation, only the writeback
> + * rate of the current cached device is set to 1, and
> + * __update_writeback_rate() will decide the writeback rate of
> + * other cached devices (remember c->idle_counter is 0 now).
> + */
> + if (mutex_trylock(&bch_register_lock)){
> + for (i = 0; i < c->devices_max_used; i++) {
> + if (!c->devices[i])
> + continue;
> +
> + if (UUID_FLASH_ONLY(&c->uuids[i]))
> + continue;
> +
> + d = c->devices[i];
> + dc = container_of(d, struct cached_dev, disk);
> + /*
> + * Set writeback rate to the default minimum value,
> + * then let update_writeback_rate() decide the
> + * upcoming rate.
> + */
> + atomic64_set(&dc->writeback_rate.rate, 1);
> + }
> +
> + mutex_unlock(&bch_register_lock);
> + } else
> + atomic64_set(&this_dc->writeback_rate.rate, 1);
> +}
> +
> static blk_qc_t cached_dev_make_request(struct request_queue *q,
> struct bio *bio)
> {
> @@ -1119,7 +1156,19 @@ static blk_qc_t cached_dev_make_request(struct request_queue *q,
> return BLK_QC_T_NONE;
> }
>
> - atomic_set(&dc->backing_idle, 0);
> + if (d->c) {
> + atomic_set(&d->c->idle_counter, 0);
> + /*
> + * If at_max_writeback_rate of cache set is true and new I/O
> + * comes, quit max writeback rate of all cached devices
> + * attached to this cache set, and set at_max_writeback_rate
> + * to false.
> + */
> + if (unlikely(atomic_read(&d->c->at_max_writeback_rate) == 1)) {
> + atomic_set(&d->c->at_max_writeback_rate, 0);
> + quit_max_writeback_rate(d->c, dc);
> + }
> + }
> generic_start_io_acct(q, rw, bio_sectors(bio), &d->disk->part0);
>
> bio_set_dev(bio, dc->bdev);
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index fa4058e43202..fa532d9f9353 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1687,6 +1687,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
> c->block_bits = ilog2(sb->block_size);
> c->nr_uuids = bucket_bytes(c) / sizeof(struct
> uuid_entry);
> c->devices_max_used = 0;
> + c->previous_dirty_dc_nr = 0;
> c->btree_pages = bucket_pages(c);
> if (c->btree_pages > BTREE_MAX_PAGES)
> c->btree_pages = max_t(int, c->btree_pages / 4,
> diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
> index 225b15aa0340..d719021bff81 100644
> --- a/drivers/md/bcache/sysfs.c
> +++ b/drivers/md/bcache/sysfs.c
> @@ -170,7 +170,8 @@ SHOW(__bch_cached_dev)
> var_printf(writeback_running, "%i");
> var_print(writeback_delay);
> var_print(writeback_percent);
> - sysfs_hprint(writeback_rate, dc->writeback_rate.rate << 9);
> + sysfs_hprint(writeback_rate,
> + atomic64_read(&dc->writeback_rate.rate) << 9);
> sysfs_hprint(io_errors, atomic_read(&dc->io_errors));
> sysfs_printf(io_error_limit, "%i", dc->error_limit);
> sysfs_printf(io_disable, "%i", dc->io_disable);
> @@ -188,7 +189,8 @@ SHOW(__bch_cached_dev)
> char change[20];
> s64 next_io;
>
> - bch_hprint(rate, dc->writeback_rate.rate << 9);
> + bch_hprint(rate,
> + atomic64_read(&dc->writeback_rate.rate) << 9);
> bch_hprint(dirty,
> bcache_dev_sectors_dirty(&dc->disk) << 9);
> bch_hprint(target, dc->writeback_rate_target << 9);
>
> bch_hprint(proportional,dc->writeback_rate_proportional << 9);
> @@ -255,8 +257,12 @@ STORE(__cached_dev)
>
> sysfs_strtoul_clamp(writeback_percent,
> dc->writeback_percent, 0, 40);
>
> - sysfs_strtoul_clamp(writeback_rate,
> - dc->writeback_rate.rate, 1, INT_MAX);
> + if (attr == &sysfs_writeback_rate) {
> + int v;
> +
> + sysfs_strtoul_clamp(writeback_rate, v, 1, INT_MAX);
> + atomic64_set(&dc->writeback_rate.rate, v);
> + }
>
> sysfs_strtoul_clamp(writeback_rate_update_seconds,
> dc->writeback_rate_update_seconds,
> diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
> index fc479b026d6d..84f90c3d996d 100644
> --- a/drivers/md/bcache/util.c
> +++ b/drivers/md/bcache/util.c
> @@ -200,7 +200,7 @@ uint64_t bch_next_delay(struct bch_ratelimit *d, uint64_t done)
> {
> uint64_t now = local_clock();
>
> - d->next += div_u64(done * NSEC_PER_SEC, d->rate);
> + d->next += div_u64(done * NSEC_PER_SEC, atomic64_read(&d->rate));
>
> /* Bound the time. Don't let us fall further than 2 seconds behind
> * (this prevents unnecessary backlog that would make it impossible
> diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
> index cced87f8eb27..7e17f32ab563 100644
> --- a/drivers/md/bcache/util.h
> +++ b/drivers/md/bcache/util.h
> @@ -442,7 +442,7 @@ struct bch_ratelimit {
> * Rate at which we want to do work, in units per second
> * The units here correspond to the units passed to bch_next_delay()
> */
> - uint32_t rate;
> + atomic64_t rate;
> };
>
> static inline void bch_ratelimit_reset(struct bch_ratelimit *d)
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index ad45ebe1a74b..11ffadc3cf8f 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -49,6 +49,80 @@ static uint64_t __calc_target_rate(struct cached_dev *dc)
> return (cache_dirty_target * bdev_share) >> WRITEBACK_SHARE_SHIFT;
> }
>
> +static bool set_at_max_writeback_rate(struct cache_set *c,
> + struct cached_dev *dc)
> +{
> + int i, dirty_dc_nr = 0;
> + struct bcache_device *d;
> +
> + /*
> + * bch_register_lock is acquired in cached_dev_detach_finish()
> + * before calling cancel_writeback_rate_update_dwork() to stop
> + * the delayed kworker writeback_rate_update (which is the
> + * context we are in now). Therefore calling mutex_lock() here
> + * may introduce a deadlock when shutting down the bcache device.
> + * c->previous_dirty_dc_nr records the dirty_dc_nr calculated
> + * the last time mutex_trylock() succeeded. If mutex_trylock()
> + * fails here, use c->previous_dirty_dc_nr as the number of
> + * dirty cached devices. Of course it might be inaccurate, but
> + * a few loops more or less before setting
> + * c->at_max_writeback_rate is much better than a deadlock here.
> + */
> + if (mutex_trylock(&bch_register_lock)) {
> + for (i = 0; i < c->devices_max_used; i++) {
> + if (!c->devices[i])
> + continue;
> + if (UUID_FLASH_ONLY(&c->uuids[i]))
> + continue;
> + d = c->devices[i];
> + dc = container_of(d, struct cached_dev, disk);
> + if (atomic_read(&dc->has_dirty))
> + dirty_dc_nr++;
> + }
> + c->previous_dirty_dc_nr = dirty_dc_nr;
> +
> + mutex_unlock(&bch_register_lock);
> + } else
> + dirty_dc_nr = c->previous_dirty_dc_nr;
> +
> + /*
> + * idle_counter is increased every time update_writeback_rate()
> + * is rescheduled in. If all backing devices attached to the
> + * same cache set have the same
> + * dc->writeback_rate_update_seconds value, then after about 10
> + * rounds of update_writeback_rate() on each backing device,
> + * the code falls through to set c->at_max_writeback_rate to 1
> + * and a max writeback rate for each dc->writeback_rate.rate.
> + * This is not very accurate, but it works well enough to make
> + * sure the whole cache set has no new I/O coming before the
> + * writeback rate is set to a max number.
> + */
> + if (atomic_inc_return(&c->idle_counter) < dirty_dc_nr * 10)
> + return false;
> +
> + if (atomic_read(&c->at_max_writeback_rate) != 1)
> + atomic_set(&c->at_max_writeback_rate, 1);
> +
> +
> + atomic64_set(&dc->writeback_rate.rate, INT_MAX);
> +
> + /* keep writeback_rate_target as existing value */
> + dc->writeback_rate_proportional = 0;
> + dc->writeback_rate_integral_scaled = 0;
> + dc->writeback_rate_change = 0;
> +
> + /*
> + * Check c->idle_counter and c->at_max_writeback_rate again in
> + * case new I/O arrives before set_at_max_writeback_rate()
> + * returns. Then the writeback rate is set to 1, and its new
> + * value should be decided via __update_writeback_rate().
> + */
> + if (atomic_read(&c->idle_counter) < dirty_dc_nr * 10 ||
> + !atomic_read(&c->at_max_writeback_rate))
> + return false;
> +
> + return true;
> +}
> +
> static void __update_writeback_rate(struct cached_dev *dc)
> {
> /*
> @@ -104,8 +178,9 @@ static void __update_writeback_rate(struct
> cached_dev *dc)
>
> dc->writeback_rate_proportional = proportional_scaled;
> dc->writeback_rate_integral_scaled = integral_scaled;
> - dc->writeback_rate_change = new_rate - dc->writeback_rate.rate;
> - dc->writeback_rate.rate = new_rate;
> + dc->writeback_rate_change = new_rate -
> + atomic64_read(&dc->writeback_rate.rate);
> + atomic64_set(&dc->writeback_rate.rate, new_rate);
> dc->writeback_rate_target = target;
> }
>
> @@ -138,9 +213,16 @@ static void update_writeback_rate(struct
> work_struct *work)
>
> down_read(&dc->writeback_lock);
>
> - if (atomic_read(&dc->has_dirty) &&
> - dc->writeback_percent)
> - __update_writeback_rate(dc);
> + if (atomic_read(&dc->has_dirty) && dc->writeback_percent) {
> + /*
> + * If the whole cache set is idle, set_at_max_writeback_rate()
> + * will set the writeback rate to a max number. Then it is
> + * unnecessary to update the writeback rate for an idle cache
> + * set that is already at the maximum writeback rate.
> + */
> + if (!set_at_max_writeback_rate(c, dc))
> + __update_writeback_rate(dc);
> + }
>
> up_read(&dc->writeback_lock);
>
> @@ -422,27 +504,6 @@ static void read_dirty(struct cached_dev *dc)
>
> delay = writeback_delay(dc, size);
>
> - /* If the control system would wait for at least half a
> - * second, and there's been no reqs hitting the backing disk
> - * for awhile: use an alternate mode where we have at most
> - * one contiguous set of writebacks in flight at a time. If
> - * someone wants to do IO it will be quick, as it will only
> - * have to contend with one operation in flight, and we'll
> - * be round-tripping data to the backing disk as quickly as
> - * it can accept it.
> - */
> - if (delay >= HZ / 2) {
> - /* 3 means at least 1.5 seconds, up to 7.5 if we
> - * have slowed way down.
> - */
> - if (atomic_inc_return(&dc->backing_idle) >= 3) {
> - /* Wait for current I/Os to finish */
> - closure_sync(&cl);
> - /* And immediately launch a new set. */
> - delay = 0;
> - }
> - }
> -
> while (!kthread_should_stop() &&
> !test_bit(CACHE_SET_IO_DISABLE,
> &dc->disk.c->flags) &&
> delay) {
> @@ -715,7 +776,7 @@ void bch_cached_dev_writeback_init(struct cached_dev *dc)
> dc->writeback_running = true;
> dc->writeback_percent = 10;
> dc->writeback_delay = 30;
> - dc->writeback_rate.rate = 1024;
> + atomic64_set(&dc->writeback_rate.rate, 1024);
> dc->writeback_rate_minimum = 8;
>
> dc->writeback_rate_update_seconds = WRITEBACK_RATE_UPDATE_SECS_DEFAULT;
> --
> 2.17.1
>
Commit 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link")
accidentally broke unwinding from userspace, because ld would strip the
.eh_frame sections when linking.
Originally, the compiler would implicitly add --eh-frame-hdr when
invoking the linker, but when this Makefile was converted from invoking
ld via the compiler to invoking it directly (like vmlinux does),
the flag was missed. (The EH_FRAME section is important for the VDSO
shared libraries, but not for vmlinux.)
Fix the problem by explicitly specifying --eh-frame-hdr, which restores
parity with the old method.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201741
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1659295
Reported-by: Laura Abbott <labbott(a)redhat.com>
Fixes: 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link")
Cc: stable(a)vger.kernel.org
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: X86 ML <x86(a)kernel.org>
Cc: Florian Weimer <fweimer(a)redhat.com>,
Cc: Carlos O'Donell <carlos(a)redhat.com>,
Cc: "H. J. Lu" <hjl.tools(a)gmail.com>
Cc: Joel Fernandes <joel(a)joelfernandes.org>
Cc: kernel-team(a)android.com
Signed-off-by: Alistair Strachan <astrachan(a)google.com>
---
arch/x86/entry/vdso/Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 141d415a8c80..c3d7ccd25381 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -171,7 +171,8 @@ quiet_cmd_vdso = VDSO $@
sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'
VDSO_LDFLAGS = -shared $(call ld-option, --hash-style=both) \
- $(call ld-option, --build-id) -Bsymbolic
+ $(call ld-option, --build-id) $(call ld-option, --eh-frame-hdr) \
+ -Bsymbolic
GCOV_PROFILE := n
#
--
2.20.0.405.gbc1bbc6f85-goog
From: Thierry Reding <treding(a)nvidia.com>
Subject: scripts/spdxcheck.py: always open files in binary mode
The spdxcheck script currently falls over when confronted with a binary
file (such as Documentation/logo.gif). To avoid that, always open files
in binary mode and decode line-by-line, ignoring encoding errors.
One tricky case is when piping data into the script and reading it from
standard input. By default, standard input will be opened in text mode,
so we need to reopen it in binary mode.
The breakage only happens with python3 and results in a
UnicodeDecodeError (according to Uwe).
Link: http://lkml.kernel.org/r/20181212131210.28024-1-thierry.reding@gmail.com
Fixes: 6f4d29df66ac ("scripts/spdxcheck.py: make python3 compliant")
Signed-off-by: Thierry Reding <treding(a)nvidia.com>
Reviewed-by: Jeremy Cline <jcline(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Joe Perches <joe(a)perches.com>
Cc: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
scripts/spdxcheck.py | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/scripts/spdxcheck.py~scripts-spdxcheckpy-always-open-files-in-binary-mode
+++ a/scripts/spdxcheck.py
@@ -168,6 +168,7 @@ class id_parser(object):
self.curline = 0
try:
for line in fd:
+ line = line.decode(locale.getpreferredencoding(False), errors='ignore')
self.curline += 1
if self.curline > maxlines:
break
@@ -249,12 +250,13 @@ if __name__ == '__main__':
try:
if len(args.path) and args.path[0] == '-':
- parser.parse_lines(sys.stdin, args.maxlines, '-')
+ stdin = os.fdopen(sys.stdin.fileno(), 'rb')
+ parser.parse_lines(stdin, args.maxlines, '-')
else:
if args.path:
for p in args.path:
if os.path.isfile(p):
- parser.parse_lines(open(p), args.maxlines, p)
+ parser.parse_lines(open(p, 'rb'), args.maxlines, p)
elif os.path.isdir(p):
scan_git_subtree(repo.head.reference.commit.tree, p)
else:
_
From: Andrea Arcangeli <aarcange(a)redhat.com>
Subject: userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered
Calling UFFDIO_UNREGISTER on virtual ranges not yet registered in uffd
could trigger a harmless false positive WARN_ON. Check that the vma is
already registered before checking VM_MAYWRITE to shut off the false
positive warning.
Link: http://lkml.kernel.org/r/20181206212028.18726-2-aarcange@redhat.com
Cc: <stable(a)vger.kernel.org>
Fixes: 29ec90660d68 ("userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas")
Signed-off-by: Andrea Arcangeli <aarcange(a)redhat.com>
Reported-by: syzbot+06c7092e7d71218a2c16(a)syzkaller.appspotmail.com
Acked-by: Mike Rapoport <rppt(a)linux.ibm.com>
Acked-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Peter Xu <peterx(a)redhat.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/userfaultfd.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/userfaultfd.c~userfaultfd-check-vm_maywrite-was-set-after-verifying-the-uffd-is-registered
+++ a/fs/userfaultfd.c
@@ -1566,7 +1566,6 @@ static int userfaultfd_unregister(struct
cond_resched();
BUG_ON(!vma_can_userfault(vma));
- WARN_ON(!(vma->vm_flags & VM_MAYWRITE));
/*
* Nothing to do: this vma is already registered into this
@@ -1575,6 +1574,8 @@ static int userfaultfd_unregister(struct
if (!vma->vm_userfaultfd_ctx.ctx)
goto skip;
+ WARN_ON(!(vma->vm_flags & VM_MAYWRITE));
+
if (vma->vm_start > start)
start = vma->vm_start;
vma_end = min(end, vma->vm_end);
_
From: Piotr Jaroszynski <pjaroszynski(a)nvidia.com>
Subject: fs/iomap.c: get/put the page in iomap_page_create/release()
migrate_page_move_mapping() expects pages with private data set to have a
page_count elevated by 1. This is what used to happen for xfs through the
buffer_heads code before the switch to iomap in commit 82cb14175e7d ("xfs:
add support for sub-pagesize writeback without buffer_heads"). Not having
the count elevated causes move_pages() to fail on memory mapped files
coming from xfs.
Make iomap compatible with the migrate_page_move_mapping() assumption by
elevating the page count as part of iomap_page_create() and lowering it in
iomap_page_release().
It causes the move_pages() syscall to misbehave on memory mapped files
from xfs. It does not move any pages, which I suppose is "just" a
perf issue, but it also ends up returning a positive number which is
out of spec for the syscall. Talking to Michal Hocko, it sounds like
returning positive numbers might be a necessary update to move_pages()
anyway though
(https://lkml.kernel.org/r/20181116114955.GJ14706@dhcp22.suse.cz).
I only hit this in tests that verify that move_pages() actually moved
the pages. The test also got confused by the positive return from
move_pages() (it got treated as a success as positive numbers were not
expected and not handled) making it a bit harder to track down what's
going on.
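For reference, here is a standalone toy model (my illustration with a
simplified struct, not the kernel API) of the refcount invariant that
migrate_page_move_mapping() relies on and that this patch restores: a
page holding private data must carry one extra reference.

  #include <stdio.h>

  struct page {
          int count;
          void *private;
  };

  /* attaching private data takes the extra reference ... */
  static void attach_private(struct page *p, void *data)
  {
          p->count++;
          p->private = data;
  }

  /* ... and releasing the private data drops it again */
  static void detach_private(struct page *p)
  {
          p->private = NULL;
          p->count--;
  }

  int main(void)
  {
          struct page page = { .count = 1, .private = NULL };
          int iop;

          attach_private(&page, &iop);
          printf("count with private data: %d\n", page.count); /* 2 */
          detach_private(&page);
          printf("count after release: %d\n", page.count);     /* 1 */
          return 0;
  }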
Link: http://lkml.kernel.org/r/20181115184140.1388751-1-pjaroszynski@nvidia.com
Fixes: 82cb14175e7d ("xfs: add support for sub-pagesize writeback without buffer_heads")
Signed-off-by: Piotr Jaroszynski <pjaroszynski(a)nvidia.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Cc: William Kucharski <william.kucharski(a)oracle.com>
Cc: Darrick J. Wong <darrick.wong(a)oracle.com>
Cc: Brian Foster <bfoster(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/iomap.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/fs/iomap.c~iomap-get-put-the-page-in-iomap_page_create-release
+++ a/fs/iomap.c
@@ -116,6 +116,12 @@ iomap_page_create(struct inode *inode, s
atomic_set(&iop->read_count, 0);
atomic_set(&iop->write_count, 0);
bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
+
+ /*
+ * migrate_page_move_mapping() assumes that pages with private data have
+ * their count elevated by 1.
+ */
+ get_page(page);
set_page_private(page, (unsigned long)iop);
SetPagePrivate(page);
return iop;
@@ -132,6 +138,7 @@ iomap_page_release(struct page *page)
WARN_ON_ONCE(atomic_read(&iop->write_count));
ClearPagePrivate(page);
set_page_private(page, 0);
+ put_page(page);
kfree(iop);
}
_
The nr_dentry_unused per-cpu counter tracks dentries in both the
LRU lists and the shrink lists where the DCACHE_LRU_LIST bit is set.
The shrink_dcache_sb() function moves dentries from the LRU list to a
shrink list and subtracts the dentry count from nr_dentry_unused. This
is incorrect as the nr_dentry_unused count will also be decremented in
shrink_dentry_list() via d_shrink_del(). To fix this double decrement,
the decrement in the shrink_dcache_sb() function is taken out.
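As a standalone illustration (a toy counter of my own, not the kernel
code), the double decrement before the fix looked like this:

  #include <stdio.h>

  static long nr_dentry_unused = 4;

  /* shrink_dentry_list() -> d_shrink_del() already decrements the
   * counter once per dentry on the shrink list */
  static void shrink_dentry_list(long n)
  {
          while (n--)
                  nr_dentry_unused--;
  }

  int main(void)
  {
          long freed = 4;

          nr_dentry_unused -= freed;  /* the duplicate decrement */
          shrink_dentry_list(freed);
          printf("nr_dentry_unused = %ld\n", nr_dentry_unused); /* -4, not 0 */
          return 0;
  }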
Fixes: 4e717f5c1083 ("list_lru: remove special case function list_lru_dispose_all.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Waiman Long <longman(a)redhat.com>
Reviewed-by: Dave Chinner <dchinner(a)redhat.com>
---
fs/dcache.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/fs/dcache.c b/fs/dcache.c
index 2593153..44e5652 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1188,15 +1188,11 @@ static enum lru_status dentry_lru_isolate_shrink(struct list_head *item,
*/
void shrink_dcache_sb(struct super_block *sb)
{
- long freed;
-
do {
LIST_HEAD(dispose);
- freed = list_lru_walk(&sb->s_dentry_lru,
+ list_lru_walk(&sb->s_dentry_lru,
dentry_lru_isolate_shrink, &dispose, 1024);
-
- this_cpu_sub(nr_dentry_unused, freed);
shrink_dentry_list(&dispose);
} while (list_lru_count(&sb->s_dentry_lru) > 0);
}
--
1.8.3.1
From: Shuah Khan <shuah(a)kernel.org>
Commit b2d35fa5fc80 ("selftests: add headers_install to lib.mk") added
khdr target to run headers_install target from the main Makefile. The
logic uses KSFT_KHDR_INSTALL and top_srcdir as controls to initialize
variables and include files to run headers_install from the top level
Makefile. There are a few problems with this logic.
1. Exposes top_srcdir to all tests
2. Common logic impacts all tests
3. Uses KSFT_KHDR_INSTALL, top_srcdir, and khdr in an ad-hoc way. Tests
add a "khdr" dependency in their Makefiles to TEST_PROGS_EXTENDED in
some cases, and STATIC_LIBS in other cases. This makes this framework
confusing to use.
The common logic runs for all tests even when KSFT_KHDR_INSTALL
isn't defined by the test. top_srcdir is initialized to a default value
when the test doesn't initialize it. That works for tests without a
sub-dir structure, but tests with a sub-dir structure fail to build.
e.g: make -C sparc64/drivers/ or make -C drivers/dma-buf
../../lib.mk:20: ../../../../scripts/subarch.include: No such file or directory
make: *** No rule to make target '../../../../scripts/subarch.include'. Stop.
There is no reason to require all tests to define top_srcdir, and there
is no need to require tests to add a khdr dependency using ad-hoc
changes to TEST_* and other variables.
Fix it with consistent use of KSFT_KHDR_INSTALL and top_srcdir from
tests that depend on headers_install.
Change the common logic to define the khdr target, and an "all" target
with a dependency on khdr, only when KSFT_KHDR_INSTALL is defined.
Only tests that depend on headers_install have to define the
KSFT_KHDR_INSTALL and top_srcdir variables; there is no need to specify
a khdr dependency in the test Makefiles.
Fixes: b2d35fa5fc80 ("selftests: add headers_install to lib.mk")
Cc: stable(a)vger.kernel.org
Signed-off-by: Shuah Khan <shuah(a)kernel.org>
---
tools/testing/selftests/android/Makefile | 2 +-
tools/testing/selftests/futex/functional/Makefile | 1 +
tools/testing/selftests/gpio/Makefile | 6 +++---
tools/testing/selftests/kvm/Makefile | 2 +-
tools/testing/selftests/lib.mk | 8 ++++----
tools/testing/selftests/networking/timestamping/Makefile | 1 +
tools/testing/selftests/tc-testing/bpf/Makefile | 1 +
tools/testing/selftests/vm/Makefile | 1 +
8 files changed, 13 insertions(+), 9 deletions(-)
diff --git a/tools/testing/selftests/android/Makefile b/tools/testing/selftests/android/Makefile
index d9a725478375..72c25a3cb658 100644
--- a/tools/testing/selftests/android/Makefile
+++ b/tools/testing/selftests/android/Makefile
@@ -6,7 +6,7 @@ TEST_PROGS := run.sh
include ../lib.mk
-all: khdr
+all:
@for DIR in $(SUBDIRS); do \
BUILD_TARGET=$(OUTPUT)/$$DIR; \
mkdir $$BUILD_TARGET -p; \
diff --git a/tools/testing/selftests/futex/functional/Makefile b/tools/testing/selftests/futex/functional/Makefile
index ad1eeb14fda7..30996306cabc 100644
--- a/tools/testing/selftests/futex/functional/Makefile
+++ b/tools/testing/selftests/futex/functional/Makefile
@@ -19,6 +19,7 @@ TEST_GEN_FILES := \
TEST_PROGS := run.sh
top_srcdir = ../../../../..
+KSFT_KHDR_INSTALL := 1
include ../../lib.mk
$(TEST_GEN_FILES): $(HEADERS)
diff --git a/tools/testing/selftests/gpio/Makefile b/tools/testing/selftests/gpio/Makefile
index 46648427d537..07f572a1bd3f 100644
--- a/tools/testing/selftests/gpio/Makefile
+++ b/tools/testing/selftests/gpio/Makefile
@@ -10,8 +10,6 @@ TEST_PROGS_EXTENDED := gpio-mockup-chardev
GPIODIR := $(realpath ../../../gpio)
GPIOOBJ := gpio-utils.o
-include ../lib.mk
-
all: $(TEST_PROGS_EXTENDED)
override define CLEAN
@@ -19,7 +17,9 @@ override define CLEAN
$(MAKE) -C $(GPIODIR) OUTPUT=$(GPIODIR)/ clean
endef
-$(TEST_PROGS_EXTENDED):| khdr
+KSFT_KHDR_INSTALL := 1
+include ../lib.mk
+
$(TEST_PROGS_EXTENDED): $(GPIODIR)/$(GPIOOBJ)
$(GPIODIR)/$(GPIOOBJ):
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 01a219229238..52bfe5e76907 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -1,6 +1,7 @@
all:
top_srcdir = ../../../..
+KSFT_KHDR_INSTALL := 1
UNAME_M := $(shell uname -m)
LIBKVM = lib/assert.c lib/elf.c lib/io.c lib/kvm_util.c lib/ucall.c lib/sparsebit.c
@@ -44,7 +45,6 @@ $(OUTPUT)/libkvm.a: $(LIBKVM_OBJ)
all: $(STATIC_LIBS)
$(TEST_GEN_PROGS): $(STATIC_LIBS)
-$(STATIC_LIBS):| khdr
cscope: include_paths = $(LINUX_TOOL_INCLUDE) $(LINUX_HDR_PATH) include lib ..
cscope:
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index 0a8e75886224..8b0f16409ed7 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -16,18 +16,18 @@ TEST_GEN_PROGS := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_PROGS))
TEST_GEN_PROGS_EXTENDED := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_PROGS_EXTENDED))
TEST_GEN_FILES := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_FILES))
+ifdef KSFT_KHDR_INSTALL
top_srcdir ?= ../../../..
include $(top_srcdir)/scripts/subarch.include
ARCH ?= $(SUBARCH)
-all: $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES)
-
.PHONY: khdr
khdr:
make ARCH=$(ARCH) -C $(top_srcdir) headers_install
-ifdef KSFT_KHDR_INSTALL
-$(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES):| khdr
+all: khdr $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES)
+else
+all: $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES)
endif
.ONESHELL:
diff --git a/tools/testing/selftests/networking/timestamping/Makefile b/tools/testing/selftests/networking/timestamping/Makefile
index 14cfcf006936..c46c0eefab9e 100644
--- a/tools/testing/selftests/networking/timestamping/Makefile
+++ b/tools/testing/selftests/networking/timestamping/Makefile
@@ -6,6 +6,7 @@ TEST_PROGS := hwtstamp_config rxtimestamp timestamping txtimestamp
all: $(TEST_PROGS)
top_srcdir = ../../../../..
+KSFT_KHDR_INSTALL := 1
include ../../lib.mk
clean:
diff --git a/tools/testing/selftests/tc-testing/bpf/Makefile b/tools/testing/selftests/tc-testing/bpf/Makefile
index dc92eb271d9a..be5a5e542804 100644
--- a/tools/testing/selftests/tc-testing/bpf/Makefile
+++ b/tools/testing/selftests/tc-testing/bpf/Makefile
@@ -4,6 +4,7 @@ APIDIR := ../../../../include/uapi
TEST_GEN_FILES = action.o
top_srcdir = ../../../../..
+KSFT_KHDR_INSTALL := 1
include ../../lib.mk
CLANG ?= clang
diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 6e67e726e5a5..e13eb6cc8901 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -25,6 +25,7 @@ TEST_GEN_FILES += virtual_address_range
TEST_PROGS := run_vmtests
+KSFT_KHDR_INSTALL := 1
include ../lib.mk
$(OUTPUT)/userfaultfd: LDLIBS += -lpthread
--
2.17.1
On Fri, Dec 14, 2018 at 06:29:54AM -0800, Eric Dumazet wrote:
> On Fri, Dec 14, 2018 at 6:26 AM Greg Kroah-Hartman <
> gregkh(a)linuxfoundation.org> wrote:
>
> > On Fri, Dec 14, 2018 at 02:03:55PM +0000, Sudip Mukherjee wrote:
> > > Hi Greg,
> > >
> > > On Fri, Dec 14, 2018 at 12:08 PM Greg Kroah-Hartman
> > > <gregkh(a)linuxfoundation.org> wrote:
> > > >
> > > > 4.14-stable review patch. If anyone has any objections, please let me
> > know.
> > > >
> > > > ------------------
> > > >
> > > > From: Eric Dumazet <edumazet(a)google.com>
> > > >
> > > > [ Upstream commit 41727549de3e7281feb174d568c6e46823db8684 ]
> > >
> > > There is one upstream commit which fixes this one.
> > > f9bfe4e6a9d0 ("tcp: lack of available data can also cause TSO defer")
> >
> > I can take this if Eric and/or David ack it :)
> >
> >
> I ack it, I guess David had this queued already.
>
> No big deal, that is only for tcp_info instrumentation accuracy.
If this is already queued for the next round of stable patches, and it's
not a major issue, I'll just wait and take it when it shows up in a week
or so.
thanks,
greg k-h
ASIDs have always been stored as unsigned longs, i.e. 32 bits on MIPS32
kernels. This is problematic because it is feasible for the ASID version
to overflow & wrap around to zero.
We currently attempt to handle this overflow by simply setting the ASID
version to 1, using asid_first_version(), but we make no attempt to
account for the fact that there may be mm_structs with stale ASIDs that
have versions which we now reuse due to the overflow & wrap around.
Encountering this requires that:
1) A struct mm_struct X is active on CPU A using ASID (V,n).
2) That mm is not used on CPU A for the length of time that it takes
for CPU A's asid_cache to overflow & wrap around to the same
version V that the mm had in step 1. During this time tasks using
the mm could either be sleeping or only scheduled on other CPUs.
3) Some other mm Y becomes active on CPU A and is allocated the same
ASID (V,n).
4) mm X now becomes active on CPU A again, and now incorrectly has the
same ASID as mm Y.
Where struct mm_struct ASIDs are represented above in the format
(version, EntryHi.ASID), and on a typical MIPS32 system version will be
24 bits wide & EntryHi.ASID will be 8 bits wide.
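For illustration, here is a minimal sketch (my own, mirroring the
asid_version_mask() logic in the patch below and assuming an 8-bit
hardware ASID) of how the combined value splits into version and
EntryHi.ASID once it is held in a u64:

  #include <stdint.h>
  #include <stdio.h>

  #define HW_ASID_MASK  0xffULL  /* EntryHi.ASID: low 8 bits */
  #define VERSION_MASK  (~(HW_ASID_MASK | (HW_ASID_MASK - 1)))

  int main(void)
  {
          uint64_t asid = (3ULL << 8) | 0x2a;  /* version 3, ASID 0x2a */

          printf("version: %llu\n",
                 (unsigned long long)((asid & VERSION_MASK) >> 8));
          printf("EntryHi.ASID: 0x%llx\n",
                 (unsigned long long)(asid & HW_ASID_MASK));
          return 0;
  }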
The length of time required in step 2 is highly dependent upon the CPU &
workload, but for a hypothetical 2GHz CPU running a workload which
generates a new ASID every 10000 cycles this period is around 248 days.
Due to this long period of time & the fact that tasks need to be
scheduled in just the right (or wrong, depending upon your inclination)
way, this is obviously a difficult bug to encounter but it's entirely
possible as evidenced by reports.
In order to fix this, simply extend ASIDs to 64 bits even on MIPS32
builds. This will extend the period of time required for the
hypothetical system above to encounter the problem from 248 days to
around 3 trillion years, which feels safely outside of the realms of
possibility.
The cost of this is slightly more generated code in some commonly
executed paths, but this is pretty minimal:
| Code Size Gain | Percentage
-----------------------|----------------|-------------
decstation_defconfig | +270 | +0.00%
32r2el_defconfig | +652 | +0.01%
32r6el_defconfig | +1000 | +0.01%
I have been unable to measure any change in performance of the LMbench
lat_ctx or lat_proc tests resulting from the 64b ASIDs on either
32r2el_defconfig+interAptiv or 32r6el_defconfig+I6500 systems.
Signed-off-by: Paul Burton <paul.burton(a)mips.com>
Suggested-by: James Hogan <jhogan(a)kernel.org>
References: https://lore.kernel.org/linux-mips/80B78A8B8FEE6145A87579E8435D78C30205D5F3…
References: https://lore.kernel.org/linux-mips/1488684260-18867-1-git-send-email-jiwei.…
Cc: Jiwei Sun <jiwei.sun(a)windriver.com>
Cc: Yu Huabing <yhb(a)ruijie.com.cn>
Cc: stable(a)vger.kernel.org # 2.6.12+
---
arch/mips/include/asm/cpu-info.h | 2 +-
arch/mips/include/asm/mmu.h | 2 +-
arch/mips/include/asm/mmu_context.h | 8 ++++----
arch/mips/mm/c-r3k.c | 2 +-
4 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/mips/include/asm/cpu-info.h b/arch/mips/include/asm/cpu-info.h
index a41059d47d31..ed7ffe4e63a3 100644
--- a/arch/mips/include/asm/cpu-info.h
+++ b/arch/mips/include/asm/cpu-info.h
@@ -50,7 +50,7 @@ struct guest_info {
#define MIPS_CACHE_PINDEX 0x00000020 /* Physically indexed cache */
struct cpuinfo_mips {
- unsigned long asid_cache;
+ u64 asid_cache;
#ifdef CONFIG_MIPS_ASID_BITS_VARIABLE
unsigned long asid_mask;
#endif
diff --git a/arch/mips/include/asm/mmu.h b/arch/mips/include/asm/mmu.h
index 0740be7d5d4a..24d6b42345fb 100644
--- a/arch/mips/include/asm/mmu.h
+++ b/arch/mips/include/asm/mmu.h
@@ -7,7 +7,7 @@
#include <linux/wait.h>
typedef struct {
- unsigned long asid[NR_CPUS];
+ u64 asid[NR_CPUS];
void *vdso;
atomic_t fp_mode_switching;
diff --git a/arch/mips/include/asm/mmu_context.h b/arch/mips/include/asm/mmu_context.h
index 94414561de0e..fd869d538a3c 100644
--- a/arch/mips/include/asm/mmu_context.h
+++ b/arch/mips/include/asm/mmu_context.h
@@ -76,14 +76,14 @@ extern unsigned long pgd_current[];
* All unused by hardware upper bits will be considered
* as a software asid extension.
*/
-static unsigned long asid_version_mask(unsigned int cpu)
+static u64 asid_version_mask(unsigned int cpu)
{
unsigned long asid_mask = cpu_asid_mask(&cpu_data[cpu]);
- return ~(asid_mask | (asid_mask - 1));
+ return ~(u64)(asid_mask | (asid_mask - 1));
}
-static unsigned long asid_first_version(unsigned int cpu)
+static u64 asid_first_version(unsigned int cpu)
{
return ~asid_version_mask(cpu) + 1;
}
@@ -102,7 +102,7 @@ static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
static inline void
get_new_mmu_context(struct mm_struct *mm, unsigned long cpu)
{
- unsigned long asid = asid_cache(cpu);
+ u64 asid = asid_cache(cpu);
if (!((asid += cpu_asid_inc()) & cpu_asid_mask(&cpu_data[cpu]))) {
if (cpu_has_vtag_icache)
diff --git a/arch/mips/mm/c-r3k.c b/arch/mips/mm/c-r3k.c
index 3466fcdae0ca..01848cdf2074 100644
--- a/arch/mips/mm/c-r3k.c
+++ b/arch/mips/mm/c-r3k.c
@@ -245,7 +245,7 @@ static void r3k_flush_cache_page(struct vm_area_struct *vma,
pmd_t *pmdp;
pte_t *ptep;
- pr_debug("cpage[%08lx,%08lx]\n",
+ pr_debug("cpage[%08llx,%08lx]\n",
cpu_context(smp_processor_id(), mm), addr);
/* No ASID => no such page in the cache. */
--
2.19.1
The patch titled
Subject: proc/sysctl: don't return ENOMEM on lookup when a table is unregistering
has been added to the -mm tree. Its filename is
proc-sysctl-dont-return-enomem-on-lookup-when-a-table-is-unregistering.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/proc-sysctl-dont-return-enomem-on-…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/proc-sysctl-dont-return-enomem-on-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Ivan Delalande <colona(a)arista.com>
Subject: proc/sysctl: don't return ENOMEM on lookup when a table is unregistering
proc_sys_lookup can fail with ENOMEM instead of ENOENT when the
corresponding sysctl table is being unregistered. In our case we see this
upon opening /proc/sys/net/*/conf files while network interfaces are being
deleted, which confuses our configuration daemon.
The problem was successfully reproduced and this fix tested on v4.9.122
and v4.20-rc6.
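For context, here is a self-contained model (simplified stand-ins for
the kernel's ERR_PTR helpers, my sketch rather than the patched code)
of the error-reporting pattern the fix switches to, where the pointer
itself carries the errno so callers can distinguish -ENOENT from
-ENOMEM:

  #include <errno.h>
  #include <stdio.h>

  #define MAX_ERRNO 4095UL

  static void *ERR_PTR(long err)      { return (void *)err; }
  static long PTR_ERR(const void *p)  { return (long)p; }
  static int IS_ERR(const void *p)
  {
          return (unsigned long)p >= -MAX_ERRNO;
  }

  static int the_inode;  /* stands in for a real inode */

  static void *make_inode(int unregistering)
  {
          if (unregistering)
                  return ERR_PTR(-ENOENT);  /* racing with unregister */
          return &the_inode;
  }

  int main(void)
  {
          void *inode = make_inode(1);

          if (IS_ERR(inode)) {  /* caller propagates the real errno */
                  printf("lookup failed: errno %ld\n", -PTR_ERR(inode));
                  return 1;
          }
          return 0;
  }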
Link: http://lkml.kernel.org/r/20181213232052.GA1513@visor
Fixes: ace0c791e6c3 ("proc/sysctl: Don't grab i_lock under sysctl_lock.")
Signed-off-by: Ivan Delalande <colona(a)arista.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Alexey Dobriyan <adobriyan(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/fs/proc/proc_sysctl.c~proc-sysctl-dont-return-enomem-on-lookup-when-a-table-is-unregistering
+++ a/fs/proc/proc_sysctl.c
@@ -464,7 +464,7 @@ static struct inode *proc_sys_make_inode
inode = new_inode(sb);
if (!inode)
- goto out;
+ return ERR_PTR(-ENOMEM);
inode->i_ino = get_next_ino();
@@ -474,7 +474,7 @@ static struct inode *proc_sys_make_inode
if (unlikely(head->unregistering)) {
spin_unlock(&sysctl_lock);
iput(inode);
- inode = NULL;
+ inode = ERR_PTR(-ENOENT);
goto out;
}
ei->sysctl = head;
@@ -549,10 +549,11 @@ static struct dentry *proc_sys_lookup(st
goto out;
}
- err = ERR_PTR(-ENOMEM);
inode = proc_sys_make_inode(dir->i_sb, h ? h : head, p);
- if (!inode)
+ if (IS_ERR(inode)) {
+ err = ERR_CAST(inode);
goto out;
+ }
d_set_d_op(dentry, &proc_sys_dentry_operations);
err = d_splice_alias(inode, dentry);
@@ -685,7 +686,7 @@ static bool proc_sys_fill_cache(struct f
if (d_in_lookup(child)) {
struct dentry *res;
inode = proc_sys_make_inode(dir->d_sb, head, table);
- if (!inode) {
+ if (IS_ERR(inode)) {
d_lookup_done(child);
dput(child);
return false;
_
Patches currently in -mm which might be from colona(a)arista.com are
proc-sysctl-dont-return-enomem-on-lookup-when-a-table-is-unregistering.patch
The 'nr_pages' attribute of the 'msc' subdevices parses a comma-separated
list of window sizes, passed from userspace. However, there is a bug in
the string parsing logic: it doesn't count the comma character among
the characters it consumes. This leads to an
out-of-bounds access given a sufficiently long list. For example:
> # echo 8,8,8,8 > /sys/bus/intel_th/devices/0-msc0/nr_pages
> ==================================================================
> BUG: KASAN: slab-out-of-bounds in memchr+0x1e/0x40
> Read of size 1 at addr ffff8803ffcebcd1 by task sh/825
>
> CPU: 3 PID: 825 Comm: npktest.sh Tainted: G W 4.20.0-rc1+
> Call Trace:
> dump_stack+0x7c/0xc0
> print_address_description+0x6c/0x23c
> ? memchr+0x1e/0x40
> kasan_report.cold.5+0x241/0x308
> memchr+0x1e/0x40
> nr_pages_store+0x203/0xd00 [intel_th_msu]
Fix this by accounting for the comma character.
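To see why the comma matters, here is a small standalone demo (my own
userspace rendition of the loop, with strtoul standing in for the
driver's number parsing) of walking a comma-separated list without
running past the end of the buffer:

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  static void walk_sizes(const char *buf, size_t len)
  {
          const char *p = buf;

          do {
                  printf("window: %lu\n", strtoul(p, NULL, 10));

                  const char *end = memchr(p, ',', len);
                  if (!end)
                          break;

                  /* the buggy version did "len -= end - p;", leaving
                   * len one byte too large per iteration, so the final
                   * memchr() scanned beyond the buffer */
                  len -= (size_t)(end - p) + 1;  /* number + comma */
                  p = end + 1;
          } while (len);
  }

  int main(void)
  {
          const char input[] = "8,8,8,8";

          walk_sizes(input, sizeof(input) - 1);
          return 0;
  }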
Signed-off-by: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Fixes: ba82664c134ef ("intel_th: Add Memory Storage Unit driver")
Cc: stable(a)vger.kernel.org # v4.4+
---
drivers/hwtracing/intel_th/msu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/hwtracing/intel_th/msu.c b/drivers/hwtracing/intel_th/msu.c
index d293e55553bd..ba7aaf421f36 100644
--- a/drivers/hwtracing/intel_th/msu.c
+++ b/drivers/hwtracing/intel_th/msu.c
@@ -1423,7 +1423,8 @@ nr_pages_store(struct device *dev, struct device_attribute *attr,
if (!end)
break;
- len -= end - p;
+ /* consume the number and the following comma, hence +1 */
+ len -= end - p + 1;
p = end + 1;
} while (len);
--
2.19.2
If CONFIG_GPIOLIB is not set, the stub of gpio_to_desc() should return
the same type of error as the regular version: NULL. All the callers
compare the return value of gpio_to_desc() against NULL, so a returned
ERR_PTR would be treated as a non-error case, leading to a dereference
of the error value.
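A self-contained demonstration (simplified stand-ins of my own, not
gpiolib itself) of why the ERR_PTR return slipped through the callers'
NULL checks:

  #include <stdio.h>

  #define EINVAL 22
  #define ERR_PTR(err) ((void *)(long)(err))

  struct gpio_desc { int level; };

  static struct gpio_desc *stub_old(void) { return ERR_PTR(-EINVAL); }
  static struct gpio_desc *stub_new(void) { return NULL; }

  int main(void)
  {
          struct gpio_desc *d = stub_old();

          /* ERR_PTR(-EINVAL) is non-NULL, so the error check passes
           * and the caller goes on to dereference a bogus pointer */
          if (d)
                  printf("old stub: %p passes the NULL check\n", (void *)d);

          d = stub_new();
          if (!d)
                  printf("new stub: caller correctly sees an error\n");
          return 0;
  }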
Fixes: 79a9becda894 ("gpiolib: export descriptor-based GPIO interface")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzk(a)kernel.org>
---
include/linux/gpio/consumer.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/gpio/consumer.h b/include/linux/gpio/consumer.h
index ed070512b40e..3b01fbcafc94 100644
--- a/include/linux/gpio/consumer.h
+++ b/include/linux/gpio/consumer.h
@@ -505,7 +505,7 @@ static inline int gpiod_set_consumer_name(struct gpio_desc *desc,
static inline struct gpio_desc *gpio_to_desc(unsigned gpio)
{
- return ERR_PTR(-EINVAL);
+ return NULL;
}
static inline int desc_to_gpio(const struct gpio_desc *desc)
--
2.7.4
To change the active state of an MMIO, halt is requested for all vcpus of
the affected guest before modifying the IRQ state. This is done by calling
cond_resched_lock() in vgic_mmio_change_active(). However, interrupts are
disabled at this point and we cannot reschedule a vcpu.
Solve this by waiting for all vcpus to be halted after emitting the halt
request.
Signed-off-by: Julien Thierry <julien.thierry(a)arm.com>
Suggested-by: Marc Zyngier <marc.zyngier(a)arm.com>
Cc: Christoffer Dall <christoffer.dall(a)arm.com>
Cc: Marc Zyngier <marc.zyngier(a)arm.com>
Cc: stable(a)vger.kernel.org
---
virt/kvm/arm/vgic/vgic-mmio.c | 36 ++++++++++++++----------------------
1 file changed, 14 insertions(+), 22 deletions(-)
diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
index f56ff1c..5c76a92 100644
--- a/virt/kvm/arm/vgic/vgic-mmio.c
+++ b/virt/kvm/arm/vgic/vgic-mmio.c
@@ -313,27 +313,6 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
spin_lock_irqsave(&irq->irq_lock, flags);
- /*
- * If this virtual IRQ was written into a list register, we
- * have to make sure the CPU that runs the VCPU thread has
- * synced back the LR state to the struct vgic_irq.
- *
- * As long as the conditions below are true, we know the VCPU thread
- * may be on its way back from the guest (we kicked the VCPU thread in
- * vgic_change_active_prepare) and still has to sync back this IRQ,
- * so we release and re-acquire the spin_lock to let the other thread
- * sync back the IRQ.
- *
- * When accessing VGIC state from user space, requester_vcpu is
- * NULL, which is fine, because we guarantee that no VCPUs are running
- * when accessing VGIC state from user space so irq->vcpu->cpu is
- * always -1.
- */
- while (irq->vcpu && /* IRQ may have state in an LR somewhere */
- irq->vcpu != requester_vcpu && /* Current thread is not the VCPU thread */
- irq->vcpu->cpu != -1) /* VCPU thread is running */
- cond_resched_lock(&irq->irq_lock);
-
if (irq->hw) {
vgic_hw_irq_change_active(vcpu, irq, active, !requester_vcpu);
} else {
@@ -368,8 +347,21 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
*/
static void vgic_change_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
{
- if (intid > VGIC_NR_PRIVATE_IRQS)
+ if (intid > VGIC_NR_PRIVATE_IRQS) {
+ struct kvm_vcpu *tmp;
+ int i;
+
kvm_arm_halt_guest(vcpu->kvm);
+
+ /* Wait for each vcpu to be halted */
+ kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
+ if (tmp == vcpu)
+ continue;
+
+ while (tmp->cpu != -1)
+ cond_resched();
+ }
+ }
}
/* See vgic_change_active_prepare */
--
1.9.1
The code to prevent a bus suspend if a USB3 port was still in link training
also reacted to USB2 port polling state.
This caused bus suspend to busyloop in some cases.
USB2 polling state is different from USB3, and should not prevent bus
suspend.
Limit the USB3 link training state check to USB3 root hub ports only.
The original commit went to stable, so this needs to be applied there as well.
Fixes: 2f31a67f01a8 ("usb: xhci: Prevent bus suspend if a port connect change or polling state is detected")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
---
drivers/usb/host/xhci-hub.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 94aca1b..01b5818 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -1507,7 +1507,8 @@ int xhci_bus_suspend(struct usb_hcd *hcd)
portsc_buf[port_index] = 0;
/* Bail out if a USB3 port has a new device in link training */
- if ((t1 & PORT_PLS_MASK) == XDEV_POLLING) {
+ if ((hcd->speed >= HCD_USB3) &&
+ (t1 & PORT_PLS_MASK) == XDEV_POLLING) {
bus_state->bus_suspended = 0;
spin_unlock_irqrestore(&xhci->lock, flags);
xhci_dbg(xhci, "Bus suspend bailout, port in polling\n");
--
2.7.4
From: Michal Hocko <mhocko(a)suse.com>
Liu Bo has experienced a deadlock between memcg (legacy) reclaim and the
ext4 writeback
task1:
[<ffffffff811aaa52>] wait_on_page_bit+0x82/0xa0
[<ffffffff811c5777>] shrink_page_list+0x907/0x960
[<ffffffff811c6027>] shrink_inactive_list+0x2c7/0x680
[<ffffffff811c6ba4>] shrink_node_memcg+0x404/0x830
[<ffffffff811c70a8>] shrink_node+0xd8/0x300
[<ffffffff811c73dd>] do_try_to_free_pages+0x10d/0x330
[<ffffffff811c7865>] try_to_free_mem_cgroup_pages+0xd5/0x1b0
[<ffffffff8122df2d>] try_charge+0x14d/0x720
[<ffffffff812320cc>] memcg_kmem_charge_memcg+0x3c/0xa0
[<ffffffff812321ae>] memcg_kmem_charge+0x7e/0xd0
[<ffffffff811b68a8>] __alloc_pages_nodemask+0x178/0x260
[<ffffffff8120bff5>] alloc_pages_current+0x95/0x140
[<ffffffff81074247>] pte_alloc_one+0x17/0x40
[<ffffffff811e34de>] __pte_alloc+0x1e/0x110
[<ffffffffa06739de>] alloc_set_pte+0x5fe/0xc20
[<ffffffff811e5d93>] do_fault+0x103/0x970
[<ffffffff811e6e5e>] handle_mm_fault+0x61e/0xd10
[<ffffffff8106ea02>] __do_page_fault+0x252/0x4d0
[<ffffffff8106ecb0>] do_page_fault+0x30/0x80
[<ffffffff8171bce8>] page_fault+0x28/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
task2:
[<ffffffff811aadc6>] __lock_page+0x86/0xa0
[<ffffffffa02f1e47>] mpage_prepare_extent_to_map+0x2e7/0x310 [ext4]
[<ffffffffa08a2689>] ext4_writepages+0x479/0xd60
[<ffffffff811bbede>] do_writepages+0x1e/0x30
[<ffffffff812725e5>] __writeback_single_inode+0x45/0x320
[<ffffffff81272de2>] writeback_sb_inodes+0x272/0x600
[<ffffffff81273202>] __writeback_inodes_wb+0x92/0xc0
[<ffffffff81273568>] wb_writeback+0x268/0x300
[<ffffffff81273d24>] wb_workfn+0xb4/0x390
[<ffffffff810a2f19>] process_one_work+0x189/0x420
[<ffffffff810a31fe>] worker_thread+0x4e/0x4b0
[<ffffffff810a9786>] kthread+0xe6/0x100
[<ffffffff8171a9a1>] ret_from_fork+0x41/0x50
[<ffffffffffffffff>] 0xffffffffffffffff
He adds
: task1 is waiting for the PageWriteback bit of the page that task2 has
: collected in mpd->io_submit->io_bio, and tasks2 is waiting for the LOCKED
: bit the page which tasks1 has locked.
More precisely task1 is handling a page fault and it has a page locked
while it charges a new page table to a memcg. That in turn hits a memory
limit reclaim and the memcg reclaim for legacy controller is waiting on
the writeback but that is never going to finish because the writeback
itself is waiting for the page locked in the #PF path. So this is
essentially an ABBA deadlock:
lock_page(A)
SetPageWriteback(A)
unlock_page(A)
lock_page(B)
lock_page(B)
pte_alloc_one
shrink_page_list
wait_on_page_writeback(A)
SetPageWriteback(B)
unlock_page(B)
# flush A, B to clear the writeback
This accumulation of pages to flush is used by several filesystems
to generate more optimal IO patterns.
Waiting for the writeback in the legacy memcg controller is a workaround
for premature OOM killer invocations because there is no dirty IO
throttling available for the controller. There is no easy way around
that unfortunately. Therefore fix this specific issue by pre-allocating
the page table outside of the page lock. We already have handy
infrastructure for that, so simply reuse the fault-around pattern,
which already does this.
There are probably other hidden __GFP_ACCOUNT | GFP_KERNEL allocations
from under a locked fs page, but they should be really rare. I am not
aware of a better solution unfortunately.
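In essence the fix applies the familiar "allocate before you lock"
rule; a toy model (my illustration, not the kernel code) of the
ordering:

  #include <stdio.h>

  /* stands in for an allocation that may recurse into reclaim and
   * block on writeback */
  static void *alloc_may_reclaim(void)
  {
          static char pte[64];
          return pte;
  }

  struct fault { void *prealloc_pte; };

  static int do_fault(struct fault *f)
  {
          /* allocate while holding no page lock, so reclaim can wait
           * on writeback without deadlocking against us */
          if (!f->prealloc_pte) {
                  f->prealloc_pte = alloc_may_reclaim();
                  if (!f->prealloc_pte)
                          return -1;  /* VM_FAULT_OOM */
          }
          /* lock_page(); ... only non-reclaiming work from here on */
          printf("pte ready at %p before the page lock\n", f->prealloc_pte);
          return 0;
  }

  int main(void)
  {
          struct fault f = { 0 };
          return do_fault(&f);
  }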
Reported-and-Debugged-by: Liu Bo <bo.liu(a)linux.alibaba.com>
Cc: stable
Fixes: c3b94f44fcb0 ("memcg: further prevent OOM with too many dirty pages")
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
---
mm/memory.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/mm/memory.c b/mm/memory.c
index 4ad2d293ddc2..bb78e90a9b70 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2993,6 +2993,17 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
struct vm_area_struct *vma = vmf->vma;
vm_fault_t ret;
+ /*
+ * Preallocate pte before we take page_lock because this might lead to
+ * deadlocks for memcg reclaim which waits for pages under writeback.
+ */
+ if (pmd_none(*vmf->pmd) && !vmf->prealloc_pte) {
+ vmf->prealloc_pte = pte_alloc_one(vmf->vma->vm_mm, vmf->address);
+ if (!vmf->prealloc_pte)
+ return VM_FAULT_OOM;
+ smp_wmb(); /* See comment in __pte_alloc() */
+ }
+
ret = vma->vm_ops->fault(vmf);
if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
VM_FAULT_DONE_COW)))
--
2.19.2
From: Henrik Austad <haustad(a)cisco.com>
Short story:
The following patches are needed on a 4.4 kernel to avoid an Oops in
the scheduler when a sched_rr task and a sched_deadline task contend
on the same futex (with PI).
Longer story:
On one of our arm64 systems, we occasionally crash with an Oops in the
scheduler with the following backtrace.
[<ffffffc0000ee398>] enqueue_task_dl+0x1f0/0x420
[<ffffffc0000d0f14>] activate_task+0x7c/0x90
[<ffffffc0000edbdc>] push_dl_task+0x164/0x1c8
[<ffffffc0000edc60>] push_dl_tasks+0x20/0x30
[<ffffffc0000cc00c>] __balance_callback+0x44/0x68
[<ffffffc000d2c018>] __schedule+0x6f0/0x728
[<ffffffc000d2c278>] schedule+0x78/0x98
[<ffffffc000d2e76c>] __rt_mutex_slowlock+0x9c/0x108
[<ffffffc000d2e9d0>] rt_mutex_slowlock+0xd8/0x198
[<ffffffc0000f7f28>] rt_mutex_timed_futex_lock+0x30/0x40
[<ffffffc00012c1a8>] futex_lock_pi+0x200/0x3b0
[<ffffffc00012cf84>] do_futex+0x1c4/0x550
[<ffffffc00012d92c>] compat_SyS_futex+0x10c/0x138
[<ffffffc00008504c>] __sys_trace_return+0x0/0x4
This seems to be the same bug Xunlei Pang triggered and fixed in
e96a7705e7d3 "sched/rtmutex/deadline: Fix a PI crash for deadline
tasks". As noted by Peter Zijlstra in the previous attempt, this fix
requires a few other patches, most notably the FUTEX_UNLOCK_PI series
[1]
Testing this on a dual-core VM I have not been able to reproduce the
same crash, but pi_stress (part of the rt-tests suite) reveals that
vanilla 4.4.162 behaves rather badly with a mix of deadline and
sched_(rr|fifo) tasks:
time pi_stress --rr --mlockall --sched id=high,policy=deadline,runtime=100000,deadline=200000,period=200000
Starting PI Stress Test
Number of thread groups: 1
Duration of test run: infinite
Number of inversions per group: unlimited
Admin thread SCHED_RR priority 4
1 groups of 3 threads will be created
High thread SCHED_DEADLINE runtime 100000 deadline 200000 period 200000
Med thread SCHED_RR priority 2
Low thread SCHED_RR priority 1
Current Inversions: 141627
WATCHDOG triggered: group 0 is deadlocked!
reporter stopping due to watchdog event
Stopping test
Terminated
real 0m26.291s
user 0m0.148s
sys 0m18.819s
With this series applied, the test ran for ~4.5 hours and again for 129
minutes (when I remembered to time it) before crashing:
time pi_stress --rr --mlockall --sched id=high,policy=deadline,runtime=100000,deadline=200000,period=200000
Starting PI Stress Test
Number of thread groups: 1
Duration of test run: infinite
Number of inversions per group: unlimited
Admin thread SCHED_RR priority 4
1 groups of 3 threads will be created
High thread SCHED_DEADLINE runtime 100000 deadline 200000 period 200000
Med thread SCHED_RR priority 2
Low thread SCHED_RR priority 1
Current Inversions: 51985223
WATCHDOG triggered: group 0 is deadlocked!
reporter stopping due to watchdog event
Stopping test
Terminated
real 129m38.807s
user 0m59.084s
sys 109m53.666s
So clearly not perfect, but a *lot* better.
The same series on our vendor-4.4 kernel moves pi_stress from ~30
seconds before deadlock up to the same level as the VM (the test is
still going as of this writing).
I suspect other users of 4.4 would benefit from having these patches
backported, so tag them for stable. I assume 4.9 and 4.14 could benefit
as well, but I have not had time to look into those.
1) https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1359667.html
Peter Zijlstra (13):
futex: Cleanup variable names for futex_top_waiter()
futex: Use smp_store_release() in mark_wake_futex()
futex: Remove rt_mutex_deadlock_account_*()
futex,rt_mutex: Provide futex specific rt_mutex API
futex: Change locking rules
futex: Cleanup refcounting
futex: Rework inconsistent rt_mutex/futex_q state
futex: Pull rt_mutex_futex_unlock() out from under hb->lock
futex,rt_mutex: Introduce rt_mutex_init_waiter()
futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock()
futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()
futex: Futex_unlock_pi() determinism
futex: Drop hb->lock before enqueueing on the rtmutex
Thomas Gleixner (2):
rtmutex: Make wait_lock irq safe
futex: Rename free_pi_state() to put_pi_state()
Xunlei Pang (2):
rtmutex: Deboost before waking up the top waiter
sched/rtmutex/deadline: Fix a PI crash for deadline tasks
include/linux/init_task.h | 1 +
include/linux/sched.h | 2 +
include/linux/sched/rt.h | 1 +
kernel/fork.c | 1 +
kernel/futex.c | 532 ++++++++++++++++++++++++++--------------
kernel/locking/rtmutex-debug.c | 9 -
kernel/locking/rtmutex-debug.h | 3 -
kernel/locking/rtmutex.c | 406 ++++++++++++++++++------------
kernel/locking/rtmutex.h | 2 -
kernel/locking/rtmutex_common.h | 24 +-
kernel/sched/core.c | 2 +
11 files changed, 620 insertions(+), 363 deletions(-)
--
2.7.4
From: Coly Li <colyli(a)suse.de>
Commit b1092c9af9ed ("bcache: allow quick writeback when backing idle")
allows the writeback rate to be faster if there is no I/O request on a
bcache device. It works well if there is only one bcache device attached
to the cache set. If there are many bcache devices attached to a cache
set, it may introduce a performance regression because multiple faster
writeback threads of the idle bcache devices will compete for the
btree-level locks with the bcache devices that have I/O requests coming.
This patch fixes the above issue by only permitting fast writeback when
all bcache devices attached to the cache set are idle. If one of the
bcache devices has a new I/O request coming in, all writeback throughput
is minimized immediately and the PI controller __update_writeback_rate()
decides the upcoming writeback rate for each bcache device.
Also, when all bcache devices are idle, limiting the writeback rate to a
small number wastes throughput, especially when the backing devices are
slower non-rotational devices (e.g. SATA SSD). This patch sets a max
writeback rate for each backing device if the whole cache set is idle. A
faster writeback rate in idle time means new I/Os may have more
available space for dirty data, so people may then observe better write
performance.
Please note bcache may change its cache mode at run time, and this patch
still works if the cache mode is switched away from writeback mode while
there is still dirty data on the cache.
Fixes: b1092c9af9ed ("bcache: allow quick writeback when backing idle")
Cc: stable(a)vger.kernel.org #4.16+
Signed-off-by: Coly Li <colyli(a)suse.de>
Tested-by: Kai Krakow <kai(a)kaishome.de>
Tested-by: Stefan Priebe <s.priebe(a)profihost.ag>
Cc: Michael Lyle <mlyle(a)lyle.org>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
(cherry picked from commit ea8c5356d39048bc94bae068228f51ddbecc6b89)
Signed-off-by: Kai Krakow <kai(a)kaishome.de>
---
drivers/md/bcache/bcache.h | 10 ++---
drivers/md/bcache/request.c | 54 ++++++++++++++++++++++++-
drivers/md/bcache/super.c | 4 ++
drivers/md/bcache/sysfs.c | 14 +++++--
drivers/md/bcache/util.c | 2 +-
drivers/md/bcache/util.h | 2 +-
drivers/md/bcache/writeback.c | 91 +++++++++++++++++++++++++++++--------------
7 files changed, 133 insertions(+), 44 deletions(-)
diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index d6bf294f3907..6ba41887664a 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -328,13 +328,6 @@ struct cached_dev {
*/
atomic_t has_dirty;
- /*
- * Set to zero by things that touch the backing volume-- except
- * writeback. Incremented by writeback. Used to determine when to
- * accelerate idle writeback.
- */
- atomic_t backing_idle;
-
struct bch_ratelimit writeback_rate;
struct delayed_work writeback_rate_update;
@@ -514,6 +507,8 @@ struct cache_set {
struct cache_accounting accounting;
unsigned long flags;
+ atomic_t idle_counter;
+ atomic_t at_max_writeback_rate;
struct cache_sb sb;
@@ -523,6 +518,7 @@ struct cache_set {
struct bcache_device **devices;
unsigned devices_max_used;
+ atomic_t attached_dev_nr;
struct list_head cached_devs;
uint64_t cached_dev_sectors;
struct closure caching;
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index ae67f5fa8047..6e08eb89abee 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -1102,6 +1102,44 @@ static void detached_dev_do_request(struct bcache_device *d, struct bio *bio)
generic_make_request(bio);
}
+static void quit_max_writeback_rate(struct cache_set *c,
+ struct cached_dev *this_dc)
+{
+ int i;
+ struct bcache_device *d;
+ struct cached_dev *dc;
+
+ /*
+ * mutex bch_register_lock may be contended by other parallel requesters,
+ * or by attach/detach operations on other backing devices. Waiting for
+ * the mutex lock may increase I/O request latency for seconds or more.
+ * To avoid such a situation, if mutex_trylock() fails, only the writeback
+ * rate of the current cached device is set to 1, and __update_writeback_rate()
+ * will decide the writeback rate of the other cached devices (remember
+ * c->idle_counter is 0 already).
+ */
+ if (mutex_trylock(&bch_register_lock)) {
+ for (i = 0; i < c->devices_max_used; i++) {
+ if (!c->devices[i])
+ continue;
+
+ if (UUID_FLASH_ONLY(&c->uuids[i]))
+ continue;
+
+ d = c->devices[i];
+ dc = container_of(d, struct cached_dev, disk);
+ /*
+ * set the writeback rate to the default minimum value,
+ * then let update_writeback_rate() decide the
+ * upcoming rate.
+ */
+ atomic_long_set(&dc->writeback_rate.rate, 1);
+ }
+ mutex_unlock(&bch_register_lock);
+ } else
+ atomic_long_set(&this_dc->writeback_rate.rate, 1);
+}
+
/* Cached devices - read & write stuff */
static blk_qc_t cached_dev_make_request(struct request_queue *q,
@@ -1119,7 +1157,21 @@ static blk_qc_t cached_dev_make_request(struct request_queue *q,
return BLK_QC_T_NONE;
}
- atomic_set(&dc->backing_idle, 0);
+ if (likely(d->c)) {
+ if (atomic_read(&d->c->idle_counter))
+ atomic_set(&d->c->idle_counter, 0);
+ /*
+ * If at_max_writeback_rate of the cache set is true and new
+ * I/O comes in, quit the max writeback rate of all cached
+ * devices attached to this cache set, and set
+ * at_max_writeback_rate to false.
+ */
+ if (unlikely(atomic_read(&d->c->at_max_writeback_rate) == 1)) {
+ atomic_set(&d->c->at_max_writeback_rate, 0);
+ quit_max_writeback_rate(d->c, dc);
+ }
+ }
+
generic_start_io_acct(q, rw, bio_sectors(bio), &d->disk->part0);
bio_set_dev(bio, dc->bdev);
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index fa4058e43202..dc7b6131ddbb 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -696,6 +696,8 @@ static void bcache_device_detach(struct bcache_device *d)
{
lockdep_assert_held(&bch_register_lock);
+ atomic_dec(&d->c->attached_dev_nr);
+
if (test_bit(BCACHE_DEV_DETACHING, &d->flags)) {
struct uuid_entry *u = d->c->uuids + d->id;
@@ -1138,6 +1140,7 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c,
bch_cached_dev_run(dc);
bcache_device_link(&dc->disk, c, "bdev");
+ atomic_inc(&c->attached_dev_nr);
/* Allow the writeback thread to proceed */
up_write(&dc->writeback_lock);
@@ -1687,6 +1690,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
c->block_bits = ilog2(sb->block_size);
c->nr_uuids = bucket_bytes(c) / sizeof(struct uuid_entry);
c->devices_max_used = 0;
+ atomic_set(&c->attached_dev_nr, 0);
c->btree_pages = bucket_pages(c);
if (c->btree_pages > BTREE_MAX_PAGES)
c->btree_pages = max_t(int, c->btree_pages / 4,
diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
index 225b15aa0340..a56067e80b10 100644
--- a/drivers/md/bcache/sysfs.c
+++ b/drivers/md/bcache/sysfs.c
@@ -170,7 +170,8 @@ SHOW(__bch_cached_dev)
var_printf(writeback_running, "%i");
var_print(writeback_delay);
var_print(writeback_percent);
- sysfs_hprint(writeback_rate, dc->writeback_rate.rate << 9);
+ sysfs_hprint(writeback_rate,
+ atomic_long_read(&dc->writeback_rate.rate) << 9);
sysfs_hprint(io_errors, atomic_read(&dc->io_errors));
sysfs_printf(io_error_limit, "%i", dc->error_limit);
sysfs_printf(io_disable, "%i", dc->io_disable);
@@ -188,7 +189,8 @@ SHOW(__bch_cached_dev)
char change[20];
s64 next_io;
- bch_hprint(rate, dc->writeback_rate.rate << 9);
+ bch_hprint(rate,
+ atomic_long_read(&dc->writeback_rate.rate) << 9);
bch_hprint(dirty, bcache_dev_sectors_dirty(&dc->disk) << 9);
bch_hprint(target, dc->writeback_rate_target << 9);
bch_hprint(proportional,dc->writeback_rate_proportional << 9);
@@ -255,8 +257,12 @@ STORE(__cached_dev)
sysfs_strtoul_clamp(writeback_percent, dc->writeback_percent, 0, 40);
- sysfs_strtoul_clamp(writeback_rate,
- dc->writeback_rate.rate, 1, INT_MAX);
+ if (attr == &sysfs_writeback_rate) {
+ int v;
+
+ sysfs_strtoul_clamp(writeback_rate, v, 1, INT_MAX);
+ atomic_long_set(&dc->writeback_rate.rate, v);
+ }
sysfs_strtoul_clamp(writeback_rate_update_seconds,
dc->writeback_rate_update_seconds,
diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
index fc479b026d6d..b15256bcf0e7 100644
--- a/drivers/md/bcache/util.c
+++ b/drivers/md/bcache/util.c
@@ -200,7 +200,7 @@ uint64_t bch_next_delay(struct bch_ratelimit *d, uint64_t done)
{
uint64_t now = local_clock();
- d->next += div_u64(done * NSEC_PER_SEC, d->rate);
+ d->next += div_u64(done * NSEC_PER_SEC, atomic_long_read(&d->rate));
/* Bound the time. Don't let us fall further than 2 seconds behind
* (this prevents unnecessary backlog that would make it impossible
diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
index cced87f8eb27..f7b0133c9d2f 100644
--- a/drivers/md/bcache/util.h
+++ b/drivers/md/bcache/util.h
@@ -442,7 +442,7 @@ struct bch_ratelimit {
* Rate at which we want to do work, in units per second
* The units here correspond to the units passed to bch_next_delay()
*/
- uint32_t rate;
+ atomic_long_t rate;
};
static inline void bch_ratelimit_reset(struct bch_ratelimit *d)
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index ad45ebe1a74b..9f5e33324d1d 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -104,11 +104,56 @@ static void __update_writeback_rate(struct cached_dev *dc)
dc->writeback_rate_proportional = proportional_scaled;
dc->writeback_rate_integral_scaled = integral_scaled;
- dc->writeback_rate_change = new_rate - dc->writeback_rate.rate;
- dc->writeback_rate.rate = new_rate;
+ dc->writeback_rate_change = new_rate -
+ atomic_long_read(&dc->writeback_rate.rate);
+ atomic_long_set(&dc->writeback_rate.rate, new_rate);
dc->writeback_rate_target = target;
}
+static bool set_at_max_writeback_rate(struct cache_set *c,
+ struct cached_dev *dc)
+{
+ /*
+ * Idle_counter is increased every time update_writeback_rate() is
+ * called. If all backing devices attached to the same cache set have
+ * identical dc->writeback_rate_update_seconds values, it takes about 6
+ * rounds of update_writeback_rate() on each backing device before
+ * c->at_max_writeback_rate is set to 1, and then the max writeback
+ * rate is set for each dc->writeback_rate.rate.
+ * In order to avoid the extra locking cost of counting the exact
+ * number of dirty cached devices, c->attached_dev_nr is used to
+ * calculate the idle threshold. It might be bigger if not all cached
+ * devices are in write-back mode, but it still works well with a
+ * limited number of extra rounds of update_writeback_rate().
+ */
+ if (atomic_inc_return(&c->idle_counter) <
+ atomic_read(&c->attached_dev_nr) * 6)
+ return false;
+
+ if (atomic_read(&c->at_max_writeback_rate) != 1)
+ atomic_set(&c->at_max_writeback_rate, 1);
+
+ atomic_long_set(&dc->writeback_rate.rate, INT_MAX);
+
+ /* keep writeback_rate_target as existing value */
+ dc->writeback_rate_proportional = 0;
+ dc->writeback_rate_integral_scaled = 0;
+ dc->writeback_rate_change = 0;
+
+ /*
+ * Check c->idle_counter and c->at_max_writeback_rate again in case
+ * new I/O arrives before set_at_max_writeback_rate() returns.
+ * In that case the writeback rate is set to 1, and its new value
+ * should be decided via __update_writeback_rate().
+ */
+ if ((atomic_read(&c->idle_counter) <
+ atomic_read(&c->attached_dev_nr) * 6) ||
+ !atomic_read(&c->at_max_writeback_rate))
+ return false;
+
+ return true;
+}
+
static void update_writeback_rate(struct work_struct *work)
{
struct cached_dev *dc = container_of(to_delayed_work(work),
@@ -136,13 +181,20 @@ static void update_writeback_rate(struct work_struct *work)
return;
}
- down_read(&dc->writeback_lock);
+ if (atomic_read(&dc->has_dirty) && dc->writeback_percent) {
+ /*
+ * If the whole cache set is idle, set_at_max_writeback_rate()
+ * will set the writeback rate to a maximum number. Then it is
+ * unnecessary to update the writeback rate for an idle cache
+ * set that already runs at the maximum writeback rate.
+ */
+ if (!set_at_max_writeback_rate(c, dc)) {
+ down_read(&dc->writeback_lock);
+ __update_writeback_rate(dc);
+ up_read(&dc->writeback_lock);
+ }
+ }
- if (atomic_read(&dc->has_dirty) &&
- dc->writeback_percent)
- __update_writeback_rate(dc);
-
- up_read(&dc->writeback_lock);
/*
* CACHE_SET_IO_DISABLE might be set via sysfs interface,
@@ -422,27 +474,6 @@ static void read_dirty(struct cached_dev *dc)
delay = writeback_delay(dc, size);
- /* If the control system would wait for at least half a
- * second, and there's been no reqs hitting the backing disk
- * for awhile: use an alternate mode where we have at most
- * one contiguous set of writebacks in flight at a time. If
- * someone wants to do IO it will be quick, as it will only
- * have to contend with one operation in flight, and we'll
- * be round-tripping data to the backing disk as quickly as
- * it can accept it.
- */
- if (delay >= HZ / 2) {
- /* 3 means at least 1.5 seconds, up to 7.5 if we
- * have slowed way down.
- */
- if (atomic_inc_return(&dc->backing_idle) >= 3) {
- /* Wait for current I/Os to finish */
- closure_sync(&cl);
- /* And immediately launch a new set. */
- delay = 0;
- }
- }
-
while (!kthread_should_stop() &&
!test_bit(CACHE_SET_IO_DISABLE, &dc->disk.c->flags) &&
delay) {
@@ -715,7 +746,7 @@ void bch_cached_dev_writeback_init(struct cached_dev *dc)
dc->writeback_running = true;
dc->writeback_percent = 10;
dc->writeback_delay = 30;
- dc->writeback_rate.rate = 1024;
+ atomic_long_set(&dc->writeback_rate.rate, 1024);
dc->writeback_rate_minimum = 8;
dc->writeback_rate_update_seconds = WRITEBACK_RATE_UPDATE_SECS_DEFAULT;
--
2.16.4
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 834e772c8db0c6a275d75315d90aba4ebbb1e249 Mon Sep 17 00:00:00 2001
From: Stefan Hajnoczi <stefanha(a)redhat.com>
Date: Mon, 5 Nov 2018 10:35:47 +0000
Subject: [PATCH] vhost/vsock: fix use-after-free in network stack callers
If the network stack calls .send_pkt()/.cancel_pkt() during .release(),
a struct vhost_vsock use-after-free is possible. This occurs because
.release() does not wait for other CPUs to stop using struct
vhost_vsock.
Switch to an RCU-enabled hashtable (indexed by guest CID) so that
.release() can wait for other CPUs by calling synchronize_rcu(). This
also eliminates vhost_vsock_lock acquisition in the data path so it
could have a positive effect on performance.
This is CVE-2018-14625 "kernel: use-after-free Read in vhost_transport_send_pkt".
Cc: stable(a)vger.kernel.org
Reported-and-tested-by: syzbot+bd391451452fb0b93039(a)syzkaller.appspotmail.com
Reported-by: syzbot+e3e074963495f92a89ed(a)syzkaller.appspotmail.com
Reported-by: syzbot+d5a0a170c5069658b141(a)syzkaller.appspotmail.com
Signed-off-by: Stefan Hajnoczi <stefanha(a)redhat.com>
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
Acked-by: Jason Wang <jasowang(a)redhat.com>
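At its core, the fix below is the standard RCU "unpublish, then wait"
pattern. The following sketch shows that pattern in isolation, assuming
the struct layout from the diff; teardown() is a hypothetical stand-in
for the rest of .release(), not the actual driver code:
  /* Sketch of the unpublish-then-wait pattern; teardown() is hypothetical. */
  static void example_release(struct vhost_vsock *vsock)
  {
          spin_lock_bh(&vhost_vsock_lock);
          if (vsock->guest_cid)
                  hash_del_rcu(&vsock->hash);  /* unpublish from the table */
          spin_unlock_bh(&vhost_vsock_lock);
          /* Wait for every rcu_read_lock() reader that might still
           * see the old entry to finish. */
          synchronize_rcu();
          teardown(vsock);  /* now safe: no CPU can still hold a reference */
  }
Once synchronize_rcu() returns, the data-path callers that look the
device up under rcu_read_lock() can no longer observe the stale pointer,
which is what closes the use-after-free window.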
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 731e2ea2aeca..98ed5be132c6 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -15,6 +15,7 @@
#include <net/sock.h>
#include <linux/virtio_vsock.h>
#include <linux/vhost.h>
+#include <linux/hashtable.h>
#include <net/af_vsock.h>
#include "vhost.h"
@@ -27,14 +28,14 @@ enum {
/* Used to track all the vhost_vsock instances on the system. */
static DEFINE_SPINLOCK(vhost_vsock_lock);
-static LIST_HEAD(vhost_vsock_list);
+static DEFINE_READ_MOSTLY_HASHTABLE(vhost_vsock_hash, 8);
struct vhost_vsock {
struct vhost_dev dev;
struct vhost_virtqueue vqs[2];
- /* Link to global vhost_vsock_list, protected by vhost_vsock_lock */
- struct list_head list;
+ /* Link to global vhost_vsock_hash, writes use vhost_vsock_lock */
+ struct hlist_node hash;
struct vhost_work send_pkt_work;
spinlock_t send_pkt_list_lock;
@@ -50,11 +51,14 @@ static u32 vhost_transport_get_local_cid(void)
return VHOST_VSOCK_DEFAULT_HOST_CID;
}
-static struct vhost_vsock *__vhost_vsock_get(u32 guest_cid)
+/* Callers that dereference the return value must hold vhost_vsock_lock or the
+ * RCU read lock.
+ */
+static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
{
struct vhost_vsock *vsock;
- list_for_each_entry(vsock, &vhost_vsock_list, list) {
+ hash_for_each_possible_rcu(vhost_vsock_hash, vsock, hash, guest_cid) {
u32 other_cid = vsock->guest_cid;
/* Skip instances that have no CID yet */
@@ -69,17 +73,6 @@ static struct vhost_vsock *__vhost_vsock_get(u32 guest_cid)
return NULL;
}
-static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
-{
- struct vhost_vsock *vsock;
-
- spin_lock_bh(&vhost_vsock_lock);
- vsock = __vhost_vsock_get(guest_cid);
- spin_unlock_bh(&vhost_vsock_lock);
-
- return vsock;
-}
-
static void
vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
struct vhost_virtqueue *vq)
@@ -210,9 +203,12 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
struct vhost_vsock *vsock;
int len = pkt->len;
+ rcu_read_lock();
+
/* Find the vhost_vsock according to guest context id */
vsock = vhost_vsock_get(le64_to_cpu(pkt->hdr.dst_cid));
if (!vsock) {
+ rcu_read_unlock();
virtio_transport_free_pkt(pkt);
return -ENODEV;
}
@@ -225,6 +221,8 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
spin_unlock_bh(&vsock->send_pkt_list_lock);
vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
+
+ rcu_read_unlock();
return len;
}
@@ -234,12 +232,15 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk)
struct vhost_vsock *vsock;
struct virtio_vsock_pkt *pkt, *n;
int cnt = 0;
+ int ret = -ENODEV;
LIST_HEAD(freeme);
+ rcu_read_lock();
+
/* Find the vhost_vsock according to guest context id */
vsock = vhost_vsock_get(vsk->remote_addr.svm_cid);
if (!vsock)
- return -ENODEV;
+ goto out;
spin_lock_bh(&vsock->send_pkt_list_lock);
list_for_each_entry_safe(pkt, n, &vsock->send_pkt_list, list) {
@@ -265,7 +266,10 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk)
vhost_poll_queue(&tx_vq->poll);
}
- return 0;
+ ret = 0;
+out:
+ rcu_read_unlock();
+ return ret;
}
static struct virtio_vsock_pkt *
@@ -533,10 +537,6 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
spin_lock_init(&vsock->send_pkt_list_lock);
INIT_LIST_HEAD(&vsock->send_pkt_list);
vhost_work_init(&vsock->send_pkt_work, vhost_transport_send_pkt_work);
-
- spin_lock_bh(&vhost_vsock_lock);
- list_add_tail(&vsock->list, &vhost_vsock_list);
- spin_unlock_bh(&vhost_vsock_lock);
return 0;
out:
@@ -585,9 +585,13 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)
struct vhost_vsock *vsock = file->private_data;
spin_lock_bh(&vhost_vsock_lock);
- list_del(&vsock->list);
+ if (vsock->guest_cid)
+ hash_del_rcu(&vsock->hash);
spin_unlock_bh(&vhost_vsock_lock);
+ /* Wait for other CPUs to finish using vsock */
+ synchronize_rcu();
+
/* Iterating over all connections for all CIDs to find orphans is
* inefficient. Room for improvement here. */
vsock_for_each_connected_socket(vhost_vsock_reset_orphans);
@@ -628,12 +632,17 @@ static int vhost_vsock_set_cid(struct vhost_vsock *vsock, u64 guest_cid)
/* Refuse if CID is already in use */
spin_lock_bh(&vhost_vsock_lock);
- other = __vhost_vsock_get(guest_cid);
+ other = vhost_vsock_get(guest_cid);
if (other && other != vsock) {
spin_unlock_bh(&vhost_vsock_lock);
return -EADDRINUSE;
}
+
+ if (vsock->guest_cid)
+ hash_del_rcu(&vsock->hash);
+
vsock->guest_cid = guest_cid;
+ hash_add_rcu(vhost_vsock_hash, &vsock->hash, guest_cid);
spin_unlock_bh(&vhost_vsock_lock);
return 0;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c93db7bb6ef3251e0ea48ade311d3e9942748e1c Mon Sep 17 00:00:00 2001
From: Matthew Wilcox <willy(a)infradead.org>
Date: Tue, 27 Nov 2018 13:16:33 -0800
Subject: [PATCH] dax: Check page->mapping isn't NULL
If we race with inode destroy, it's possible for page->mapping to be
NULL before we even enter this routine, as well as after having slept
waiting for the dax entry to become unlocked.
Fixes: c2a7d2a11552 ("filesystem-dax: Introduce dax_lock_mapping_entry()")
Cc: <stable(a)vger.kernel.org>
Reported-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Matthew Wilcox <willy(a)infradead.org>
Reviewed-by: Johannes Thumshirn <jthumshirn(a)suse.de>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/fs/dax.c b/fs/dax.c
index 9bcce89ea18e..e69fc231833b 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -365,7 +365,7 @@ bool dax_lock_mapping_entry(struct page *page)
struct address_space *mapping = READ_ONCE(page->mapping);
locked = false;
- if (!dax_mapping(mapping))
+ if (!mapping || !dax_mapping(mapping))
break;
/*
Hi,
this is a backport of commit 7aa54be297655 ("locking/qspinlock, x86:
Provide liveness guarantee") for the v4.9 stable tree.
For the v4.4 tree the ARCH_USE_QUEUED_SPINLOCKS option got disabled on
x86.
For v4.9 it has been decided to do a minimal backport of the final fix
(including all its dependencies).
With this backport I can't reproduce the issue in the latest v4.9-RT
tree. I was able to boot (and use) an arm64 box with these patches, so it
is not broken in an obvious way.
Sebastian
The patch titled
Subject: mm: thp: fix flags for pmd migration when split
has been added to the -mm tree. Its filename is
mm-thp-fix-flags-for-pmd-migration-when-split.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-thp-fix-flags-for-pmd-migration…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-fix-flags-for-pmd-migration…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm: thp: fix flags for pmd migration when split
When splitting a huge migrating PMD, we'll transfer all the existing PMD
bits and apply them again onto the small PTEs. However, we are fetching
the bits unconditionally via pmd_soft_dirty(), pmd_write() or pmd_young(),
even though those accessors make no sense at all when the PMD is a
migration entry. Fix them up. While at it, drop the ifdef as well, since
it is no longer needed.
Note that, if my understanding of the problem is correct, without this
patch there is a chance of losing some of the dirty bits in the migrating
PMD pages (on x86_64 we're fetching bit 11, which is part of the swap
offset, instead of bit 2), and that could potentially corrupt the memory
of a userspace program which depends on the dirty bit.
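Distilled from the hunk below, the corrected logic picks the flag source
according to the entry type; this fragment is for illustration only (the
real code also looks up the page in each branch):
  if (unlikely(is_pmd_migration_entry(old_pmd))) {
          swp_entry_t entry = pmd_to_swp_entry(old_pmd);
          write = is_write_migration_entry(entry); /* swap-entry encoded */
          young = false;                    /* no accessed bit to fetch */
          soft_dirty = pmd_swp_soft_dirty(old_pmd);
  } else {
          write = pmd_write(old_pmd);       /* real hardware PMD bits */
          young = pmd_young(old_pmd);
          soft_dirty = pmd_soft_dirty(old_pmd);
  }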
Link: http://lkml.kernel.org/r/20181213051510.20306-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Reviewed-by: William Kucharski <william.kucharski(a)oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.vnet.ibm.com>
Cc: Souptick Joarder <jrdr.linux(a)gmail.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: Zi Yan <zi.yan(a)cs.rutgers.edu>
Cc: <stable(a)vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/mm/huge_memory.c~mm-thp-fix-flags-for-pmd-migration-when-split
+++ a/mm/huge_memory.c
@@ -2144,23 +2144,25 @@ static void __split_huge_pmd_locked(stru
*/
old_pmd = pmdp_invalidate(vma, haddr, pmd);
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
pmd_migration = is_pmd_migration_entry(old_pmd);
- if (pmd_migration) {
+ if (unlikely(pmd_migration)) {
swp_entry_t entry;
entry = pmd_to_swp_entry(old_pmd);
page = pfn_to_page(swp_offset(entry));
- } else
-#endif
+ write = is_write_migration_entry(entry);
+ young = false;
+ soft_dirty = pmd_swp_soft_dirty(old_pmd);
+ } else {
page = pmd_page(old_pmd);
+ if (pmd_dirty(old_pmd))
+ SetPageDirty(page);
+ write = pmd_write(old_pmd);
+ young = pmd_young(old_pmd);
+ soft_dirty = pmd_soft_dirty(old_pmd);
+ }
VM_BUG_ON_PAGE(!page_count(page), page);
page_ref_add(page, HPAGE_PMD_NR - 1);
- if (pmd_dirty(old_pmd))
- SetPageDirty(page);
- write = pmd_write(old_pmd);
- young = pmd_young(old_pmd);
- soft_dirty = pmd_soft_dirty(old_pmd);
/*
* Withdraw the table only after we mark the pmd entry invalid.
_
Patches currently in -mm which might be from peterx(a)redhat.com are
mm-thp-fix-flags-for-pmd-migration-when-split.patch
userfaultfd-clear-flag-if-remap-event-not-enabled.patch