This is a note to let you know that I've just added the patch titled
nvmem: core: add a missing of_node_put
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From 63879e2964bceee2aa5bbe8b99ea58bba28bb64f Mon Sep 17 00:00:00 2001
From: Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
Date: Fri, 11 Jun 2021 11:23:21 +0100
Subject: nvmem: core: add a missing of_node_put
'for_each_child_of_node' performs an of_node_get on each iteration, so a
return from the middle of the loop requires an of_node_put.
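For illustration, the pattern being fixed looks roughly like the sketch
below (a minimal, hypothetical example rather than the actual nvmem code;
the function and property names are made up):
```
#include <linux/of.h>

/*
 * for_each_child_of_node() takes a reference on "child" on each iteration
 * and drops the previous one, so any early return from inside the loop
 * must drop the current reference explicitly with of_node_put().
 */
static int example_scan_children(struct device_node *parent)
{
	struct device_node *child;

	for_each_child_of_node(parent, child) {
		if (!of_find_property(child, "reg", NULL)) {
			of_node_put(child); /* balance the implicit of_node_get() */
			return -EINVAL;
		}
	}

	return 0;
}
```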
Fixes: e888d445ac33 ("nvmem: resolve cells from DT at registration time")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Link: https://lore.kernel.org/r/20210611102321.11509-1-srinivas.kandagatla@linaro…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/nvmem/core.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/nvmem/core.c b/drivers/nvmem/core.c
index 4d1c4f83b22f..20799e622b5b 100644
--- a/drivers/nvmem/core.c
+++ b/drivers/nvmem/core.c
@@ -690,15 +690,17 @@ static int nvmem_add_cells_from_of(struct nvmem_device *nvmem)
continue;
if (len < 2 * sizeof(u32)) {
dev_err(dev, "nvmem: invalid reg on %pOF\n", child);
+ of_node_put(child);
return -EINVAL;
}
cell = kzalloc(sizeof(*cell), GFP_KERNEL);
- if (!cell)
+ if (!cell) {
+ of_node_put(child);
return -ENOMEM;
+ }
cell->nvmem = nvmem;
- cell->np = of_node_get(child);
cell->offset = be32_to_cpup(addr++);
cell->bytes = be32_to_cpup(addr);
cell->name = kasprintf(GFP_KERNEL, "%pOFn", child);
@@ -719,11 +721,12 @@ static int nvmem_add_cells_from_of(struct nvmem_device *nvmem)
cell->name, nvmem->stride);
/* Cells already added will be freed later. */
kfree_const(cell->name);
- of_node_put(cell->np);
kfree(cell);
+ of_node_put(child);
return -EINVAL;
}
+ cell->np = of_node_get(child);
nvmem_cell_add(cell);
}
--
2.32.0
We can deadlock when rmmod'ing the driver or going through firmware
reset, because cfg80211_unregister_wdev() has to bring down the link
for us, ... which then grabs the same wiphy lock.
nl80211_del_interface() already handles a very similar case, with a nice
description:
/*
* We hold RTNL, so this is safe, without RTNL opencount cannot
* reach 0, and thus the rdev cannot be deleted.
*
* We need to do it for the dev_close(), since that will call
* the netdev notifiers, and we need to acquire the mutex there
* but don't know if we get there from here or from some other
* place (e.g. "ip link set ... down").
*/
mutex_unlock(&rdev->wiphy.mtx);
...
Do similarly for mwifiex teardown, by ensuring we bring the link down
first.
Sample deadlock trace:
[ 247.103516] INFO: task rmmod:2119 blocked for more than 123 seconds.
[ 247.110630] Not tainted 5.12.4 #5
[ 247.115796] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 247.124557] task:rmmod state:D stack: 0 pid: 2119 ppid: 2114 flags:0x00400208
[ 247.133905] Call trace:
[ 247.136644] __switch_to+0x130/0x170
[ 247.140643] __schedule+0x714/0xa0c
[ 247.144548] schedule_preempt_disabled+0x88/0xf4
[ 247.149714] __mutex_lock_common+0x43c/0x750
[ 247.154496] mutex_lock_nested+0x5c/0x68
[ 247.158884] cfg80211_netdev_notifier_call+0x280/0x4e0 [cfg80211]
[ 247.165769] raw_notifier_call_chain+0x4c/0x78
[ 247.170742] call_netdevice_notifiers_info+0x68/0xa4
[ 247.176305] __dev_close_many+0x7c/0x138
[ 247.180693] dev_close_many+0x7c/0x10c
[ 247.184893] unregister_netdevice_many+0xfc/0x654
[ 247.190158] unregister_netdevice_queue+0xb4/0xe0
[ 247.195424] _cfg80211_unregister_wdev+0xa4/0x204 [cfg80211]
[ 247.201816] cfg80211_unregister_wdev+0x20/0x2c [cfg80211]
[ 247.208016] mwifiex_del_virtual_intf+0xc8/0x188 [mwifiex]
[ 247.214174] mwifiex_uninit_sw+0x158/0x1b0 [mwifiex]
[ 247.219747] mwifiex_remove_card+0x38/0xa0 [mwifiex]
[ 247.225316] mwifiex_pcie_remove+0xd0/0xe0 [mwifiex_pcie]
[ 247.231451] pci_device_remove+0x50/0xe0
[ 247.235849] device_release_driver_internal+0x110/0x1b0
[ 247.241701] driver_detach+0x5c/0x9c
[ 247.245704] bus_remove_driver+0x84/0xb8
[ 247.250095] driver_unregister+0x3c/0x60
[ 247.254486] pci_unregister_driver+0x2c/0x90
[ 247.259267] cleanup_module+0x18/0xcdc [mwifiex_pcie]
Fixes: a05829a7222e ("cfg80211: avoid holding the RTNL when calling the driver")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/linux-wireless/98392296-40ee-6300-369c-32e16cff3725…
Link: https://lore.kernel.org/linux-wireless/ab4d00ce52f32bd8e45ad0448a44737e@bew…
Reported-by: Maximilian Luz <luzmaximilian(a)gmail.com>
Reported-by: dave(a)bewaar.me
Cc: Johannes Berg <johannes(a)sipsolutions.net>
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
---
drivers/net/wireless/marvell/mwifiex/main.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
index 529dfd8b7ae8..17399d4aa129 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -1445,11 +1445,18 @@ static void mwifiex_uninit_sw(struct mwifiex_adapter *adapter)
if (!priv)
continue;
rtnl_lock();
- wiphy_lock(adapter->wiphy);
if (priv->netdev &&
- priv->wdev.iftype != NL80211_IFTYPE_UNSPECIFIED)
+ priv->wdev.iftype != NL80211_IFTYPE_UNSPECIFIED) {
+ /*
+ * Close the netdev now, because if we do it later, the
+ * netdev notifiers will need to acquire the wiphy lock
+ * again --> deadlock.
+ */
+ dev_close(priv->wdev.netdev);
+ wiphy_lock(adapter->wiphy);
mwifiex_del_virtual_intf(adapter->wiphy, &priv->wdev);
- wiphy_unlock(adapter->wiphy);
+ wiphy_unlock(adapter->wiphy);
+ }
rtnl_unlock();
}
--
This is an automatically generated email to let you know that the following patch was queued:
Subject: media: uvcvideo: Fix pixel format change for Elgato Cam Link 4K
Author: Benjamin Drung <bdrung(a)posteo.de>
Date: Sat Jun 5 22:15:36 2021 +0200
The Elgato Cam Link 4K HDMI video capture card reports support for three
different pixel formats, where the first format depends on the connected
HDMI device.
```
$ v4l2-ctl -d /dev/video0 --list-formats-ext
ioctl: VIDIOC_ENUM_FMT
Type: Video Capture
[0]: 'NV12' (Y/CbCr 4:2:0)
Size: Discrete 3840x2160
Interval: Discrete 0.033s (29.970 fps)
[1]: 'NV12' (Y/CbCr 4:2:0)
Size: Discrete 3840x2160
Interval: Discrete 0.033s (29.970 fps)
[2]: 'YU12' (Planar YUV 4:2:0)
Size: Discrete 3840x2160
Interval: Discrete 0.033s (29.970 fps)
```
Changing the pixel format to anything besides the first pixel format
does not work:
```
$ v4l2-ctl -d /dev/video0 --try-fmt-video pixelformat=YU12
Format Video Capture:
Width/Height : 3840/2160
Pixel Format : 'NV12' (Y/CbCr 4:2:0)
Field : None
Bytes per Line : 3840
Size Image : 12441600
Colorspace : sRGB
Transfer Function : Rec. 709
YCbCr/HSV Encoding: Rec. 709
Quantization : Default (maps to Limited Range)
Flags :
```
User space applications like VLC might show an error message on the
terminal in that case:
```
libv4l2: error set_fmt gave us a different result than try_fmt!
```
Depending on their error handling, user space applications might display
distorted video, because they use the wrong pixel format for decoding the
stream.
The Elgato Cam Link 4K responds to the USB video probe
VS_PROBE_CONTROL/VS_COMMIT_CONTROL with a malformed data structure: The
second byte contains bFormatIndex (instead of being the second byte of
bmHint). The first byte is always zero. The third byte is always 1.
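As a worked example (a hypothetical userspace-style sketch, not driver
code; it assumes the UVC layout with bmHint as a little-endian 16-bit
field in bytes 0-1 and bFormatIndex in byte 2):
```
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* Suppose the device means bFormatIndex = 3 but, due to the bug,
	 * places it in byte 1 of the response.
	 */
	uint8_t raw[3] = { 0x00, 0x03, 0x01 };

	uint16_t bmHint = raw[0] | (raw[1] << 8);	/* 0x0300 */
	uint8_t bFormatIndex = raw[2];			/* 1 (wrong) */

	/* Valid bmHint values only use the low five bits, so bmHint > 255
	 * reliably flags the malformed case; byte 1 is then recovered as
	 * the real format index, as the kernel fix below does.
	 */
	if (bmHint > 255) {
		bFormatIndex = bmHint >> 8;		/* 3 (correct) */
		bmHint = 1;
	}

	printf("bmHint=%u bFormatIndex=%u\n",
	       (unsigned)bmHint, (unsigned)bFormatIndex);
	return 0;
}
```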
The firmware bug was reported to Elgato on 2020-12-01 and it was
forwarded by the support team to the developers as a feature request.
There is no firmware update available since then. The latest firmware
for Elgato Cam Link 4K as of 2021-03-23 has MCU 20.02.19 and FPGA 67.
Therefore correct the malformed data structure for this device. The
change was successfully tested with VLC, OBS, and Chromium using
different pixel formats (YUYV, NV12, YU12), resolutions (3840x2160,
1920x1080), and frame rates (29.970 and 59.940 fps).
Cc: stable(a)vger.kernel.org
Signed-off-by: Benjamin Drung <bdrung(a)posteo.de>
Signed-off-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
drivers/media/usb/uvc/uvc_video.c | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
---
diff --git a/drivers/media/usb/uvc/uvc_video.c b/drivers/media/usb/uvc/uvc_video.c
index a777b389a66e..e16464606b14 100644
--- a/drivers/media/usb/uvc/uvc_video.c
+++ b/drivers/media/usb/uvc/uvc_video.c
@@ -127,10 +127,37 @@ int uvc_query_ctrl(struct uvc_device *dev, u8 query, u8 unit,
static void uvc_fixup_video_ctrl(struct uvc_streaming *stream,
struct uvc_streaming_control *ctrl)
{
+ static const struct usb_device_id elgato_cam_link_4k = {
+ USB_DEVICE(0x0fd9, 0x0066)
+ };
struct uvc_format *format = NULL;
struct uvc_frame *frame = NULL;
unsigned int i;
+ /*
+ * The response of the Elgato Cam Link 4K is incorrect: The second byte
+ * contains bFormatIndex (instead of being the second byte of bmHint).
+ * The first byte is always zero. The third byte is always 1.
+ *
+ * The UVC 1.5 class specification defines the first five bits in the
+ * bmHint bitfield. The remaining bits are reserved and should be zero.
+ * Therefore a valid bmHint will be less than 32.
+ *
+ * Latest Elgato Cam Link 4K firmware as of 2021-03-23 needs this fix.
+ * MCU: 20.02.19, FPGA: 67
+ */
+ if (usb_match_one_id(stream->dev->intf, &elgato_cam_link_4k) &&
+ ctrl->bmHint > 255) {
+ u8 corrected_format_index = ctrl->bmHint >> 8;
+
+ uvc_dbg(stream->dev, VIDEO,
+ "Correct USB video probe response from {bmHint: 0x%04x, bFormatIndex: %u} to {bmHint: 0x%04x, bFormatIndex: %u}\n",
+ ctrl->bmHint, ctrl->bFormatIndex,
+ 1, corrected_format_index);
+ ctrl->bmHint = 1;
+ ctrl->bFormatIndex = corrected_format_index;
+ }
+
for (i = 0; i < stream->nformats; ++i) {
if (stream->format[i].index == ctrl->bFormatIndex) {
format = &stream->format[i];
The redzone area for SLUB exists between s->object_size and s->inuse
(which is at least the word-aligned object_size). If a cache were created
with an object_size smaller than sizeof(void *), the freelist pointer
stored inside the object would overwrite the redzone (e.g. with the boot
param "slub_debug=ZF"):
BUG test (Tainted: G B ): Right Redzone overwritten
-----------------------------------------------------------------------------
INFO: 0xffff957ead1c05de-0xffff957ead1c05df @offset=1502. First byte 0x1a instead of 0xbb
INFO: Slab 0xffffef3950b47000 objects=170 used=170 fp=0x0000000000000000 flags=0x8000000000000200
INFO: Object 0xffff957ead1c05d8 @offset=1496 fp=0xffff957ead1c0620
Redzone (____ptrval____): bb bb bb bb bb bb bb bb ........
Object (____ptrval____): f6 f4 a5 40 1d e8 ...@..
Redzone (____ptrval____): 1a aa ..
Padding (____ptrval____): 00 00 00 00 00 00 00 00 ........
Store the freelist pointer out of line when object_size is smaller than
sizeof(void *) and redzoning is enabled.
Additionally remove the "smaller than sizeof(void *)" check under
CONFIG_DEBUG_VM in kmem_cache_sanity_check() as it is now redundant:
SLAB and SLOB both handle small sizes.
(Note that no caches within this size range are known to exist in the
kernel currently.)
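As an illustration of the triggering condition, here is a hypothetical
sketch of a cache that would have hit this (as noted above, no such cache
is known to exist in-tree; the names are made up, and "slub_debug=ZF" must
be on the kernel command line):
```
#include <linux/init.h>
#include <linux/slab.h>

static struct kmem_cache *tiny_cache;

static int __init tiny_cache_example(void)
{
	void *obj;

	/* object_size (2) < sizeof(void *), with redzoning requested:
	 * before this fix the inline freelist pointer overlapped the
	 * right redzone of each object.
	 */
	tiny_cache = kmem_cache_create("tiny-example", 2, 0,
				       SLAB_RED_ZONE, NULL);
	if (!tiny_cache)
		return -ENOMEM;

	obj = kmem_cache_alloc(tiny_cache, GFP_KERNEL);
	if (obj)
		kmem_cache_free(tiny_cache, obj);

	return 0;
}
```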
Fixes: 81819f0fc828 ("SLUB core")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kees Cook <keescook(a)chromium.org>
---
mm/slab_common.c | 3 +--
mm/slub.c | 8 +++++---
2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index a4a571428c51..7cab77655f11 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -97,8 +97,7 @@ EXPORT_SYMBOL(kmem_cache_size);
#ifdef CONFIG_DEBUG_VM
static int kmem_cache_sanity_check(const char *name, unsigned int size)
{
- if (!name || in_interrupt() || size < sizeof(void *) ||
- size > KMALLOC_MAX_SIZE) {
+ if (!name || in_interrupt() || size > KMALLOC_MAX_SIZE) {
pr_err("kmem_cache_create(%s) integrity check failed\n", name);
return -EINVAL;
}
diff --git a/mm/slub.c b/mm/slub.c
index f91d9fe7d0d8..f58cfd456548 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3734,15 +3734,17 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
*/
s->inuse = size;
- if (((flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) ||
- s->ctor)) {
+ if ((flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) ||
+ ((flags & SLAB_RED_ZONE) && s->object_size < sizeof(void *)) ||
+ s->ctor) {
/*
* Relocate free pointer after the object if it is not
* permitted to overwrite the first word of the object on
* kmem_cache_free.
*
* This is the case if we do RCU, have a constructor or
- * destructor or are poisoning the objects.
+ * destructor, are poisoning the objects, or are
+ * redzoning an object smaller than sizeof(void *).
*
* The assumption that s->offset >= s->inuse means free
* pointer is outside of the object is used in the
--
2.25.1
Since LLVM commit 3787ee4, the '-stack-alignment' flag has been dropped [1],
leading to the following error message when building an LTO kernel with
Clang-13 and LLD-13:
ld.lld: error: -plugin-opt=-: ld.lld: Unknown command line argument
'-stack-alignment=8'. Try 'ld.lld --help'
ld.lld: Did you mean '--stackrealign=8'?
It also appears that the '-code-model' flag is not necessary anymore starting
with LLVM-9 [2].
Drop '-code-model' and make '-stack-alignment' conditional on LLD < 13.0.0.
This is for linux-stable 5.12.
Another patch will be submitted for 5.13 shortly (unless there are objections).
Discussion: https://github.com/ClangBuiltLinux/linux/issues/1377
[1]: https://reviews.llvm.org/D103048
[2]: https://reviews.llvm.org/D52322
Signed-off-by: Tor Vic <torvic9(a)mailbox.org>
---
arch/x86/Makefile | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 1f2e5bf..2855a1a 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -192,8 +192,9 @@ endif
KBUILD_LDFLAGS += -m elf_$(UTS_MACHINE)
ifdef CONFIG_LTO_CLANG
-KBUILD_LDFLAGS += -plugin-opt=-code-model=kernel \
- -plugin-opt=-stack-alignment=$(if $(CONFIG_X86_32),4,8)
+ifeq ($(shell test $(CONFIG_LLD_VERSION) -lt 130000; echo $$?),0)
+KBUILD_LDFLAGS += -plugin-opt=-stack-alignment=$(if $(CONFIG_X86_32),4,8)
+endif
endif
ifdef CONFIG_X86_NEED_RELOCS
--
2.32.0
try_grab_compound_head() is used to grab a reference to a page from
get_user_pages_fast(), which is only protected against concurrent
freeing of page tables (via local_irq_save()), but not against
concurrent TLB flushes, freeing of data pages, or splitting of compound
pages.
Because no reference is held to the page when try_grab_compound_head()
is called, the page may have been freed and reallocated by the time its
refcount has been elevated; therefore, once we're holding a stable
reference to the page, the caller re-checks whether the PTE still points
to the same page (with the same access rights).
The problem is that try_grab_compound_head() has to grab a reference on
the head page; but between the time we look up what the head page is and
the time we actually grab a reference on the head page, the compound
page may have been split up (either explicitly through split_huge_page()
or by freeing the compound page to the buddy allocator and then
allocating its individual order-0 pages).
If that happens, get_user_pages_fast() may end up returning the right
page but lifting the refcount on a now-unrelated page, leading to
use-after-free of pages.
To fix it:
1. Re-check whether the pages still belong together after lifting the
   refcount on the head page.
2. Move anything else that checks compound_head(page) below the refcount
   increment.
This can't actually happen on bare-metal x86 (because there, disabling
IRQs locks out remote TLB flushes), but it can happen on virtualized x86
(e.g. under KVM) and probably also on arm64. The race window is pretty
narrow, and constantly allocating and shattering hugepages isn't exactly
fast; for now I've only managed to reproduce this in an x86 KVM guest with
an artificially widened timing window (by adding a loop that repeatedly
calls `inl(0x3f8 + 5)` in `try_get_compound_head()` to force VM exits,
so that PV TLB flushes are used instead of IPIs).
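For reference, a hypothetical sketch of that window-widening hack (a
debugging aid only, not part of the fix; the loop count is arbitrary):
```
#include <asm/io.h>

/* Force VM exits by reading the serial line-status register, stretching
 * the window between the compound_head() lookup and the refcount bump.
 */
static inline void widen_race_window(void)
{
	int i;

	for (i = 0; i < 10000; i++)
		(void)inl(0x3f8 + 5);
}
```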
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Kirill A. Shutemov <kirill(a)shutemov.name>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: stable(a)vger.kernel.org
Fixes: 7aef4172c795 ("mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton")
Signed-off-by: Jann Horn <jannh(a)google.com>
---
mm/gup.c | 54 +++++++++++++++++++++++++++++++++++++++---------------
1 file changed, 39 insertions(+), 15 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 3ded6a5f26b2..1f9c0ac15073 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -43,8 +43,21 @@ static void hpage_pincount_sub(struct page *page, int refs)
atomic_sub(refs, compound_pincount_ptr(page));
}
+/* Equivalent to calling put_page() @refs times. */
+static void put_page_refs(struct page *page, int refs)
+{
+ VM_BUG_ON_PAGE(page_ref_count(page) < refs, page);
+ /*
+ * Calling put_page() for each ref is unnecessarily slow. Only the last
+ * ref needs a put_page().
+ */
+ if (refs > 1)
+ page_ref_sub(page, refs - 1);
+ put_page(page);
+}
+
/*
* Return the compound head page with ref appropriately incremented,
* or NULL if that failed.
*/
@@ -55,8 +68,23 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
if (WARN_ON_ONCE(page_ref_count(head) < 0))
return NULL;
if (unlikely(!page_cache_add_speculative(head, refs)))
return NULL;
+
+ /*
+ * At this point we have a stable reference to the head page; but it
+ * could be that between the compound_head() lookup and the refcount
+ * increment, the compound page was split, in which case we'd end up
+ * holding a reference on a page that has nothing to do with the page
+ * we were given anymore.
+ * So now that the head page is stable, recheck that the pages still
+ * belong together.
+ */
+ if (unlikely(compound_head(page) != head)) {
+ put_page_refs(head, refs);
+ return NULL;
+ }
+
return head;
}
/*
@@ -94,25 +122,28 @@ __maybe_unused struct page *try_grab_compound_head(struct page *page,
if (unlikely((flags & FOLL_LONGTERM) &&
!is_pinnable_page(page)))
return NULL;
+ /*
+ * CAUTION: Don't use compound_head() on the page before this
+ * point, the result won't be stable.
+ */
+ page = try_get_compound_head(page, refs);
+ if (!page)
+ return NULL;
+
/*
* When pinning a compound page of order > 1 (which is what
* hpage_pincount_available() checks for), use an exact count to
* track it, via hpage_pincount_add/_sub().
*
* However, be sure to *also* increment the normal page refcount
* field at least once, so that the page really is pinned.
*/
- if (!hpage_pincount_available(page))
- refs *= GUP_PIN_COUNTING_BIAS;
-
- page = try_get_compound_head(page, refs);
- if (!page)
- return NULL;
-
if (hpage_pincount_available(page))
hpage_pincount_add(page, refs);
+ else
+ page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
orig_refs);
@@ -134,16 +165,9 @@ static void put_compound_head(struct page *page, int refs, unsigned int flags)
else
refs *= GUP_PIN_COUNTING_BIAS;
}
- VM_BUG_ON_PAGE(page_ref_count(page) < refs, page);
- /*
- * Calling put_page() for each ref is unnecessarily slow. Only the last
- * ref needs a put_page().
- */
- if (refs > 1)
- page_ref_sub(page, refs - 1);
- put_page(page);
+ put_page_refs(page, refs);
}
/**
* try_grab_page() - elevate a page's refcount by a flag-dependent amount
base-commit: 614124bea77e452aa6df7a8714e8bc820b489922
--
2.32.0.272.g935e593368-goog
This is the start of the stable review cycle for the 4.19.194 release.
There are 58 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu, 10 Jun 2021 17:59:18 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.194-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.194-rc1
Jan Beulich <jbeulich(a)suse.com>
xen-pciback: redo VF placement in the virtual topology
Cheng Jian <cj.chengjian(a)huawei.com>
sched/fair: Optimize select_idle_cpu
Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
ACPI: EC: Look for ECDT EC after calling acpi_load_tables()
Erik Schmauss <erik.schmauss(a)intel.com>
ACPI: probe ECDT before loading AML tables regardless of module-level code flag
Marc Zyngier <maz(a)kernel.org>
KVM: arm64: Fix debug register indexing
Sean Christopherson <seanjc(a)google.com>
KVM: SVM: Truncate GPR value for DR and CR accesses in !64-bit mode
Anand Jain <anand.jain(a)oracle.com>
btrfs: fix unmountable seed device after fstrim
Song Liu <songliubraving(a)fb.com>
perf/core: Fix corner case in perf_rotate_context()
Ian Rogers <irogers(a)google.com>
perf/cgroups: Don't rotate events for cgroups unnecessarily
Michael Chan <michael.chan(a)broadcom.com>
bnxt_en: Remove the setting of dev_port.
Björn Töpel <bjorn.topel(a)gmail.com>
selftests/bpf: Avoid running unprivileged tests with alignment requirements
Björn Töpel <bjorn.topel(a)gmail.com>
selftests/bpf: add "any alignment" annotation for some tests
David S. Miller <davem(a)davemloft.net>
bpf: Apply F_NEEDS_EFFICIENT_UNALIGNED_ACCESS to more ACCEPT test cases.
David S. Miller <davem(a)davemloft.net>
bpf: Make more use of 'any' alignment in test_verifier.c
David S. Miller <davem(a)davemloft.net>
bpf: Adjust F_NEEDS_EFFICIENT_UNALIGNED_ACCESS handling in test_verifier.c
David S. Miller <davem(a)davemloft.net>
bpf: Add BPF_F_ANY_ALIGNMENT.
Joe Stringer <joe(a)wand.net.nz>
selftests/bpf: Generalize dummy program types
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: test make sure to run unpriv test cases in test_verifier
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: fix test suite to enable all unpriv program types
Mina Almasry <almasrymina(a)google.com>
mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY
Josef Bacik <josef(a)toxicpanda.com>
btrfs: fixup error handling in fixup_inode_link_counts
Josef Bacik <josef(a)toxicpanda.com>
btrfs: return errors from btrfs_del_csums in cleanup_ref_head
Josef Bacik <josef(a)toxicpanda.com>
btrfs: fix error handling in btrfs_del_csums
Josef Bacik <josef(a)toxicpanda.com>
btrfs: mark ordered extent and inode with error if we fail to finish
Thomas Gleixner <tglx(a)linutronix.de>
x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com>
nfc: fix NULL ptr dereference in llcp_sock_getname() after failed connect
Junxiao Bi <junxiao.bi(a)oracle.com>
ocfs2: fix data corruption by fallocate
Mark Rutland <mark.rutland(a)arm.com>
pid: take a reference when initializing `cad_pid`
Phil Elwell <phil(a)raspberrypi.com>
usb: dwc2: Fix build in periphal-only mode
Ye Bin <yebin10(a)huawei.com>
ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
Marek Vasut <marex(a)denx.de>
ARM: dts: imx6q-dhcom: Add PU,VDD1P1,VDD2P5 regulators
Carlos M <carlos.marr.pz(a)gmail.com>
ALSA: hda: Fix for mute key LED for HP Pavilion 15-CK0xx
Takashi Iwai <tiwai(a)suse.de>
ALSA: timer: Fix master timer notification
Ahelenia Ziemiańska <nabijaczleweli(a)nabijaczleweli.xyz>
HID: multitouch: require Finger field to mark Win8 reports as MT
Pavel Skripkin <paskripkin(a)gmail.com>
net: caif: fix memory leak in cfusbl_device_notify
Pavel Skripkin <paskripkin(a)gmail.com>
net: caif: fix memory leak in caif_device_notify
Pavel Skripkin <paskripkin(a)gmail.com>
net: caif: add proper error handling
Pavel Skripkin <paskripkin(a)gmail.com>
net: caif: added cfserl_release function
Lin Ma <linma(a)zju.edu.cn>
Bluetooth: use correct lock to prevent UAF of hdev object
Lin Ma <linma(a)zju.edu.cn>
Bluetooth: fix the erroneous flush_work() order
Hoang Le <hoang.h.le(a)dektech.com.au>
tipc: fix unique bearer names sanity check
Hoang Le <hoang.h.le(a)dektech.com.au>
tipc: add extack messages for bearer/media failure
Magnus Karlsson <magnus.karlsson(a)intel.com>
ixgbevf: add correct exception tracing for XDP
Wei Yongjun <weiyongjun1(a)huawei.com>
ieee802154: fix error return code in ieee802154_llsec_getparams()
Zhen Lei <thunder.leizhen(a)huawei.com>
ieee802154: fix error return code in ieee802154_add_iface()
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nfnetlink_cthelper: hit EBUSY on updates if size mismatches
Arnd Bergmann <arnd(a)arndb.de>
HID: i2c-hid: fix format string mismatch
Zhen Lei <thunder.leizhen(a)huawei.com>
HID: pidff: fix error return code in hid_pidff_init()
Julian Anastasov <ja(a)ssi.bg>
ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service
Max Gurtovoy <mgurtovoy(a)nvidia.com>
vfio/platform: fix module_put call in error flow
Wei Yongjun <weiyongjun1(a)huawei.com>
samples: vfio-mdev: fix error handing in mdpy_fb_probe()
Randy Dunlap <rdunlap(a)infradead.org>
vfio/pci: zap_vma_ptes() needs MMU
Zhen Lei <thunder.leizhen(a)huawei.com>
vfio/pci: Fix error return code in vfio_ecap_init()
Rasmus Villemoes <linux(a)rasmusvillemoes.dk>
efi: cper: fix snprintf() use in cper_dimm_err_location()
Heiner Kallweit <hkallweit1(a)gmail.com>
efi: Allow EFI_MEMORY_XP and EFI_MEMORY_RO both to be cleared
Anant Thazhemadam <anant.thazhemadam(a)gmail.com>
nl80211: validate key indexes for cfg80211_registered_device
Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
ALSA: usb: update old-style static const declaration
Grant Grundler <grundler(a)chromium.org>
net: usb: cdc_ncm: don't spew notifications
-------------
Diffstat:
Makefile | 4 +-
arch/arm/boot/dts/imx6q-dhcom-som.dtsi | 12 ++
arch/arm64/kvm/sys_regs.c | 42 ++--
arch/x86/include/asm/apic.h | 1 +
arch/x86/kernel/apic/apic.c | 1 +
arch/x86/kernel/apic/vector.c | 20 ++
arch/x86/kvm/svm.c | 8 +-
drivers/acpi/bus.c | 42 ++--
drivers/firmware/efi/cper.c | 4 +-
drivers/firmware/efi/memattr.c | 5 -
drivers/hid/hid-multitouch.c | 10 +-
drivers/hid/i2c-hid/i2c-hid-core.c | 4 +-
drivers/hid/usbhid/hid-pidff.c | 1 +
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 1 -
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 3 +
drivers/net/usb/cdc_ncm.c | 12 +-
drivers/usb/dwc2/core_intr.c | 4 +
drivers/vfio/pci/Kconfig | 1 +
drivers/vfio/pci/vfio_pci_config.c | 2 +-
drivers/vfio/platform/vfio_platform_common.c | 2 +-
drivers/xen/xen-pciback/vpci.c | 14 +-
fs/btrfs/extent-tree.c | 12 +-
fs/btrfs/file-item.c | 10 +-
fs/btrfs/inode.c | 12 ++
fs/btrfs/tree-log.c | 13 +-
fs/ext4/extents.c | 43 +++--
fs/ocfs2/file.c | 55 +++++-
include/linux/perf_event.h | 5 +
include/linux/usb/usbnet.h | 2 +
include/net/caif/caif_dev.h | 2 +-
include/net/caif/cfcnfg.h | 2 +-
include/net/caif/cfserl.h | 1 +
include/uapi/linux/bpf.h | 14 ++
init/main.c | 2 +-
kernel/bpf/syscall.c | 7 +-
kernel/bpf/verifier.c | 3 +
kernel/events/core.c | 62 +++---
kernel/sched/fair.c | 7 +-
mm/hugetlb.c | 14 +-
net/bluetooth/hci_core.c | 7 +-
net/bluetooth/hci_sock.c | 4 +-
net/caif/caif_dev.c | 13 +-
net/caif/caif_usb.c | 14 +-
net/caif/cfcnfg.c | 16 +-
net/caif/cfserl.c | 5 +
net/ieee802154/nl-mac.c | 4 +-
net/ieee802154/nl-phy.c | 4 +-
net/netfilter/ipvs/ip_vs_ctl.c | 2 +-
net/netfilter/nfnetlink_cthelper.c | 8 +-
net/nfc/llcp_sock.c | 2 +
net/tipc/bearer.c | 94 ++++++---
net/wireless/core.h | 2 +
net/wireless/nl80211.c | 7 +-
net/wireless/util.c | 39 +++-
samples/vfio-mdev/mdpy-fb.c | 13 +-
sound/core/timer.c | 3 +-
sound/pci/hda/patch_realtek.c | 1 +
sound/usb/mixer_quirks.c | 2 +-
tools/include/uapi/linux/bpf.h | 14 ++
tools/lib/bpf/bpf.c | 8 +-
tools/lib/bpf/bpf.h | 2 +-
tools/testing/selftests/bpf/test_align.c | 4 +-
tools/testing/selftests/bpf/test_verifier.c | 224 ++++++++++++++++------
63 files changed, 676 insertions(+), 275 deletions(-)
From: Eric Biggers <ebiggers(a)google.com>
Typically, the cryptographic APIs that fscrypt uses take keys as byte
arrays, which avoids endianness issues. However, siphash_key_t is an
exception. It is defined as 'u64 key[2];', i.e. the 128-bit key is
expected to be given directly as two 64-bit words in CPU endianness.
fscrypt_derive_dirhash_key() and fscrypt_setup_iv_ino_lblk_32_key()
forgot to take this into account. Therefore, the SipHash keys used to
index encrypted+casefolded directories differ on big endian vs. little
endian platforms, as do the SipHash keys used to hash inode numbers for
IV_INO_LBLK_32-encrypted directories. This makes such directories
non-portable between these platforms.
Fix this by always using the little endian order. This is a breaking
change for big endian platforms, but this should be fine in practice
since these features (encrypt+casefold support, and the IV_INO_LBLK_32
flag) aren't known to actually be used on any big endian platforms yet.
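A minimal userspace illustration of the endianness issue (a hypothetical
sketch, not fs/crypto code): interpreting the same 16 KDF output bytes as
two native u64 words gives different key words on big endian hosts unless
the bytes are explicitly read as little endian, which is what the new
fscrypt_derive_siphash_key() below achieves with le64_to_cpus().
```
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint64_t get_le64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

int main(void)
{
	uint8_t kdf_out[16] = { 0, 1, 2, 3, 4, 5, 6, 7,
				8, 9, 10, 11, 12, 13, 14, 15 };
	uint64_t native[2], le[2];

	memcpy(native, kdf_out, sizeof(native)); /* CPU-endian view: old behaviour */
	le[0] = get_le64(kdf_out);               /* fixed: always little endian */
	le[1] = get_le64(kdf_out + 8);

	/* On big endian hosts native[] differs from le[], so the derived
	 * SipHash keys (and hence dirhash values) differed per arch.
	 */
	printf("native: %016llx %016llx\n",
	       (unsigned long long)native[0], (unsigned long long)native[1]);
	printf("le:     %016llx %016llx\n",
	       (unsigned long long)le[0], (unsigned long long)le[1]);
	return 0;
}
```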
Fixes: aa408f835d02 ("fscrypt: derive dirhash key for casefolded directories")
Fixes: e3b1078bedd3 ("fscrypt: add support for IV_INO_LBLK_32 policies")
Cc: <stable(a)vger.kernel.org> # v5.6+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
v2: Fixed fscrypt_setup_iv_ino_lblk_32_key() too, not just
fscrypt_derive_dirhash_key().
fs/crypto/keysetup.c | 40 ++++++++++++++++++++++++++++++++--------
1 file changed, 32 insertions(+), 8 deletions(-)
diff --git a/fs/crypto/keysetup.c b/fs/crypto/keysetup.c
index 261293fb70974..bca9c6658a7c5 100644
--- a/fs/crypto/keysetup.c
+++ b/fs/crypto/keysetup.c
@@ -210,15 +210,40 @@ static int setup_per_mode_enc_key(struct fscrypt_info *ci,
return err;
}
+/*
+ * Derive a SipHash key from the given fscrypt master key and the given
+ * application-specific information string.
+ *
+ * Note that the KDF produces a byte array, but the SipHash APIs expect the key
+ * as a pair of 64-bit words. Therefore, on big endian CPUs we have to do an
+ * endianness swap in order to get the same results as on little endian CPUs.
+ */
+static int fscrypt_derive_siphash_key(const struct fscrypt_master_key *mk,
+ u8 context, const u8 *info,
+ unsigned int infolen, siphash_key_t *key)
+{
+ int err;
+
+ err = fscrypt_hkdf_expand(&mk->mk_secret.hkdf, context, info, infolen,
+ (u8 *)key, sizeof(*key));
+ if (err)
+ return err;
+
+ BUILD_BUG_ON(sizeof(*key) != 16);
+ BUILD_BUG_ON(ARRAY_SIZE(key->key) != 2);
+ le64_to_cpus(&key->key[0]);
+ le64_to_cpus(&key->key[1]);
+ return 0;
+}
+
int fscrypt_derive_dirhash_key(struct fscrypt_info *ci,
const struct fscrypt_master_key *mk)
{
int err;
- err = fscrypt_hkdf_expand(&mk->mk_secret.hkdf, HKDF_CONTEXT_DIRHASH_KEY,
- ci->ci_nonce, FSCRYPT_FILE_NONCE_SIZE,
- (u8 *)&ci->ci_dirhash_key,
- sizeof(ci->ci_dirhash_key));
+ err = fscrypt_derive_siphash_key(mk, HKDF_CONTEXT_DIRHASH_KEY,
+ ci->ci_nonce, FSCRYPT_FILE_NONCE_SIZE,
+ &ci->ci_dirhash_key);
if (err)
return err;
ci->ci_dirhash_key_initialized = true;
@@ -253,10 +278,9 @@ static int fscrypt_setup_iv_ino_lblk_32_key(struct fscrypt_info *ci,
if (mk->mk_ino_hash_key_initialized)
goto unlock;
- err = fscrypt_hkdf_expand(&mk->mk_secret.hkdf,
- HKDF_CONTEXT_INODE_HASH_KEY, NULL, 0,
- (u8 *)&mk->mk_ino_hash_key,
- sizeof(mk->mk_ino_hash_key));
+ err = fscrypt_derive_siphash_key(mk,
+ HKDF_CONTEXT_INODE_HASH_KEY,
+ NULL, 0, &mk->mk_ino_hash_key);
if (err)
goto unlock;
/* pairs with smp_load_acquire() above */
base-commit: 77f30bfcfcf484da7208affd6a9e63406420bf91
--
2.31.1
We should not create an inode for a disabled namespace. A disabled
namespace sets its ns->ops to NULL, and the kernel could panic if we try
to create an inode for such a namespace.
Here is an example oops from the socket ioctl cmd SIOCGSKNS when NET_NS is
disabled. The kernel panicked when nsfs tried to access ns->ops, since the
corresponding proc_ns_operations is not implemented in this case.
[7.670023] Unable to handle kernel NULL pointer dereference at virtual address 00000010
[7.670268] pgd = 32b54000
[7.670544] [00000010] *pgd=00000000
[7.671861] Internal error: Oops: 5 [#1] SMP ARM
[7.672315] Modules linked in:
[7.672918] CPU: 0 PID: 1 Comm: systemd Not tainted 5.13.0-rc3-00375-g6799d4f2da49 #16
[7.673309] Hardware name: Generic DT based system
[7.673642] PC is at nsfs_evict+0x24/0x30
[7.674486] LR is at clear_inode+0x20/0x9c
So let's reject such requests for disabled namespaces.
Signed-off-by: Changbin Du <changbin.du(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Cc: Cong Wang <xiyou.wangcong(a)gmail.com>
Cc: Jakub Kicinski <kuba(a)kernel.org>
Cc: David Laight <David.Laight(a)ACULAB.COM>
---
fs/nsfs.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/nsfs.c b/fs/nsfs.c
index 800c1d0eb0d0..6c055eb7757b 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -62,6 +62,10 @@ static int __ns_get_path(struct path *path, struct ns_common *ns)
struct inode *inode;
unsigned long d;
+ /* In case the namespace is not actually enabled. */
+ if (!ns->ops)
+ return -EOPNOTSUPP;
+
rcu_read_lock();
d = atomic_long_read(&ns->stashed);
if (!d)
--
2.30.2
The patch titled
Subject: mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk()
has been added to the -mm tree. Its filename is
mm-thp-another-pvmw_sync-fix-in-page_vma_mapped_walk.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-thp-another-pvmw_sync-fix-in-p…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-another-pvmw_sync-fix-in-p…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk()
Aha! Shouldn't that quick scan over pte_none()s make sure that it holds
ptlock in the PVMW_SYNC case? That too might have been responsible for
BUGs or WARNs in split_huge_page_to_list() or its unmap_page(), though
I've never seen any.
Link: https://lkml.kernel.org/r/1bdf384c-8137-a149-2a1e-475a4791c3c@google.com
Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Wang Yugui <wangyugui(a)e16-tech.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_vma_mapped.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/mm/page_vma_mapped.c~mm-thp-another-pvmw_sync-fix-in-page_vma_mapped_walk
+++ a/mm/page_vma_mapped.c
@@ -277,6 +277,10 @@ next_pte:
goto restart;
}
pvmw->pte++;
+ if ((pvmw->flags & PVMW_SYNC) && !pvmw->ptl) {
+ pvmw->ptl = pte_lockptr(mm, pvmw->pmd);
+ spin_lock(pvmw->ptl);
+ }
} while (pte_none(*pvmw->pte));
if (!pvmw->ptl) {
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-thp-fix-__split_huge_pmd_locked-on-shmem-migration-entry.patch
mm-thp-make-is_huge_zero_pmd-safe-and-quicker.patch
mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting.patch
mm-thp-fix-vma_address-if-virtual-address-below-file-offset.patch
mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page.patch
mm-page_vma_mapped_walk-use-page-for-pvmw-page.patch
mm-page_vma_mapped_walk-settle-pagehuge-on-entry.patch
mm-page_vma_mapped_walk-use-pmd_read_atomic.patch
mm-page_vma_mapped_walk-use-pmde-for-pvmw-pmd.patch
mm-page_vma_mapped_walk-prettify-pvmw_migration-block.patch
mm-page_vma_mapped_walk-crossing-page-table-boundary.patch
mm-page_vma_mapped_walk-add-a-level-of-indentation.patch
mm-page_vma_mapped_walk-use-goto-instead-of-while-1.patch
mm-page_vma_mapped_walk-get-vma_address_end-earlier.patch
mm-thp-fix-page_vma_mapped_walk-if-thp-mapped-by-ptes.patch
mm-thp-another-pvmw_sync-fix-in-page_vma_mapped_walk.patch
mm-thp-remap_page-is-only-needed-on-anonymous-thp.patch
mm-hwpoison_user_mappings-try_to_unmap-with-ttu_sync.patch
The patch titled
Subject: mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes
has been added to the -mm tree. Its filename is
mm-thp-fix-page_vma_mapped_walk-if-thp-mapped-by-ptes.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-thp-fix-page_vma_mapped_walk-i…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-fix-page_vma_mapped_walk-i…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes
Running certain tests with a DEBUG_VM kernel would crash within hours, on
the total_mapcount BUG() in split_huge_page_to_list(), while trying to
free up some memory by punching a hole in a shmem huge page: split's
try_to_unmap() was unable to find all the mappings of the page (which, on
a !DEBUG_VM kernel, would then keep the huge page pinned in memory).
Crash dumps showed two tail pages of a shmem huge page remained mapped by
pte: ptes in a non-huge-aligned vma of a gVisor process, at the end of a
long unmapped range; and no page table had yet been allocated for the head
of the huge page to be mapped into.
Although designed to handle these odd misaligned huge-page-mapped-by-pte
cases, page_vma_mapped_walk() falls short by returning false prematurely
when !pmd_present or !pud_present or !p4d_present or !pgd_present: there
are cases when a huge page may span the boundary, with ptes present in the
next.
Restructure page_vma_mapped_walk() as a loop to continue in these cases,
while keeping its layout much as before. Add a step_forward() helper to
advance pvmw->address across those boundaries: originally I tried to use
mm's standard p?d_addr_end() macros, but hit the same crash 512 times less
often: because of the way redundant levels are folded together, but folded
differently in different configurations, it was just too difficult to use
them correctly; and step_forward() is simpler anyway.
Link: https://lkml.kernel.org/r/fedb8632-1798-de42-f39e-873551d5bc81@google.com
Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Wang Yugui <wangyugui(a)e16-tech.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_vma_mapped.c | 34 +++++++++++++++++++++++++---------
1 file changed, 25 insertions(+), 9 deletions(-)
--- a/mm/page_vma_mapped.c~mm-thp-fix-page_vma_mapped_walk-if-thp-mapped-by-ptes
+++ a/mm/page_vma_mapped.c
@@ -116,6 +116,13 @@ static bool check_pte(struct page_vma_ma
return pfn_is_match(pvmw->page, pfn);
}
+static void step_forward(struct page_vma_mapped_walk *pvmw, unsigned long size)
+{
+ pvmw->address = (pvmw->address + size) & ~(size - 1);
+ if (!pvmw->address)
+ pvmw->address = ULONG_MAX;
+}
+
/**
* page_vma_mapped_walk - check if @pvmw->page is mapped in @pvmw->vma at
* @pvmw->address
@@ -183,16 +190,22 @@ bool page_vma_mapped_walk(struct page_vm
if (pvmw->pte)
goto next_pte;
restart:
- {
+ do {
pgd = pgd_offset(mm, pvmw->address);
- if (!pgd_present(*pgd))
- return false;
+ if (!pgd_present(*pgd)) {
+ step_forward(pvmw, PGDIR_SIZE);
+ continue;
+ }
p4d = p4d_offset(pgd, pvmw->address);
- if (!p4d_present(*p4d))
- return false;
+ if (!p4d_present(*p4d)) {
+ step_forward(pvmw, P4D_SIZE);
+ continue;
+ }
pud = pud_offset(p4d, pvmw->address);
- if (!pud_present(*pud))
- return false;
+ if (!pud_present(*pud)) {
+ step_forward(pvmw, PUD_SIZE);
+ continue;
+ }
pvmw->pmd = pmd_offset(pud, pvmw->address);
/*
@@ -240,7 +253,8 @@ restart:
spin_unlock(ptl);
}
- return false;
+ step_forward(pvmw, PMD_SIZE);
+ continue;
}
if (!map_pte(pvmw))
goto next_pte;
@@ -270,7 +284,9 @@ next_pte:
spin_lock(pvmw->ptl);
}
goto this_pte;
- }
+ } while (pvmw->address < end);
+
+ return false;
}
/**
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-thp-fix-__split_huge_pmd_locked-on-shmem-migration-entry.patch
mm-thp-make-is_huge_zero_pmd-safe-and-quicker.patch
mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting.patch
mm-thp-fix-vma_address-if-virtual-address-below-file-offset.patch
mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page.patch
mm-page_vma_mapped_walk-use-page-for-pvmw-page.patch
mm-page_vma_mapped_walk-settle-pagehuge-on-entry.patch
mm-page_vma_mapped_walk-use-pmd_read_atomic.patch
mm-page_vma_mapped_walk-use-pmde-for-pvmw-pmd.patch
mm-page_vma_mapped_walk-prettify-pvmw_migration-block.patch
mm-page_vma_mapped_walk-crossing-page-table-boundary.patch
mm-page_vma_mapped_walk-add-a-level-of-indentation.patch
mm-page_vma_mapped_walk-use-goto-instead-of-while-1.patch
mm-page_vma_mapped_walk-get-vma_address_end-earlier.patch
mm-thp-fix-page_vma_mapped_walk-if-thp-mapped-by-ptes.patch
mm-thp-another-pvmw_sync-fix-in-page_vma_mapped_walk.patch
mm-thp-remap_page-is-only-needed-on-anonymous-thp.patch
mm-hwpoison_user_mappings-try_to_unmap-with-ttu_sync.patch
The patch titled
Subject: mm: page_vma_mapped_walk(): get vma_address_end() earlier
has been added to the -mm tree. Its filename is
mm-page_vma_mapped_walk-get-vma_address_end-earlier.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-page_vma_mapped_walk-get-vma_a…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-page_vma_mapped_walk-get-vma_a…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm: page_vma_mapped_walk(): get vma_address_end() earlier
page_vma_mapped_walk() cleanup: get THP's vma_address_end() at the start,
rather than later at next_pte. It's a little unnecessary overhead on the
first call, but makes for a simpler loop in the following commit.
Link: https://lkml.kernel.org/r/4542b34d-862f-7cb4-bb22-e0df6ce830a2@google.com
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Wang Yugui <wangyugui(a)e16-tech.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_vma_mapped.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
--- a/mm/page_vma_mapped.c~mm-page_vma_mapped_walk-get-vma_address_end-earlier
+++ a/mm/page_vma_mapped.c
@@ -171,6 +171,15 @@ bool page_vma_mapped_walk(struct page_vm
return true;
}
+ /*
+ * Seek to next pte only makes sense for THP.
+ * But more important than that optimization, is to filter out
+ * any PageKsm page: whose page->index misleads vma_address()
+ * and vma_address_end() to disaster.
+ */
+ end = PageTransCompound(page) ?
+ vma_address_end(page, pvmw->vma) :
+ pvmw->address + PAGE_SIZE;
if (pvmw->pte)
goto next_pte;
restart:
@@ -239,10 +248,6 @@ this_pte:
if (check_pte(pvmw))
return true;
next_pte:
- /* Seek to next pte only makes sense for THP */
- if (!PageTransHuge(page))
- return not_found(pvmw);
- end = vma_address_end(page, pvmw->vma);
do {
pvmw->address += PAGE_SIZE;
if (pvmw->address >= end)
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-thp-fix-__split_huge_pmd_locked-on-shmem-migration-entry.patch
mm-thp-make-is_huge_zero_pmd-safe-and-quicker.patch
mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting.patch
mm-thp-fix-vma_address-if-virtual-address-below-file-offset.patch
mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page.patch
mm-page_vma_mapped_walk-use-page-for-pvmw-page.patch
mm-page_vma_mapped_walk-settle-pagehuge-on-entry.patch
mm-page_vma_mapped_walk-use-pmd_read_atomic.patch
mm-page_vma_mapped_walk-use-pmde-for-pvmw-pmd.patch
mm-page_vma_mapped_walk-prettify-pvmw_migration-block.patch
mm-page_vma_mapped_walk-crossing-page-table-boundary.patch
mm-page_vma_mapped_walk-add-a-level-of-indentation.patch
mm-page_vma_mapped_walk-use-goto-instead-of-while-1.patch
mm-page_vma_mapped_walk-get-vma_address_end-earlier.patch
mm-thp-fix-page_vma_mapped_walk-if-thp-mapped-by-ptes.patch
mm-thp-another-pvmw_sync-fix-in-page_vma_mapped_walk.patch
mm-thp-remap_page-is-only-needed-on-anonymous-thp.patch
mm-hwpoison_user_mappings-try_to_unmap-with-ttu_sync.patch
The patch titled
Subject: mm: page_vma_mapped_walk(): use goto instead of while (1)
has been added to the -mm tree. Its filename is
mm-page_vma_mapped_walk-use-goto-instead-of-while-1.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-page_vma_mapped_walk-use-goto-…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-page_vma_mapped_walk-use-goto-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm: page_vma_mapped_walk(): use goto instead of while (1)
page_vma_mapped_walk() cleanup: add a label this_pte, matching next_pte,
and use "goto this_pte", in place of the "while (1)" loop at the end.
Link: https://lkml.kernel.org/r/a52b234a-851-3616-2525-f42736e8934@google.com
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Wang Yugui <wangyugui(a)e16-tech.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_vma_mapped.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
--- a/mm/page_vma_mapped.c~mm-page_vma_mapped_walk-use-goto-instead-of-while-1
+++ a/mm/page_vma_mapped.c
@@ -144,6 +144,7 @@ bool page_vma_mapped_walk(struct page_vm
{
struct mm_struct *mm = pvmw->vma->vm_mm;
struct page *page = pvmw->page;
+ unsigned long end;
pgd_t *pgd;
p4d_t *p4d;
pud_t *pud;
@@ -234,10 +235,7 @@ restart:
}
if (!map_pte(pvmw))
goto next_pte;
- }
- while (1) {
- unsigned long end;
-
+this_pte:
if (check_pte(pvmw))
return true;
next_pte:
@@ -266,6 +264,7 @@ next_pte:
pvmw->ptl = pte_lockptr(mm, pvmw->pmd);
spin_lock(pvmw->ptl);
}
+ goto this_pte;
}
}
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-thp-fix-__split_huge_pmd_locked-on-shmem-migration-entry.patch
mm-thp-make-is_huge_zero_pmd-safe-and-quicker.patch
mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting.patch
mm-thp-fix-vma_address-if-virtual-address-below-file-offset.patch
mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page.patch
mm-page_vma_mapped_walk-use-page-for-pvmw-page.patch
mm-page_vma_mapped_walk-settle-pagehuge-on-entry.patch
mm-page_vma_mapped_walk-use-pmd_read_atomic.patch
mm-page_vma_mapped_walk-use-pmde-for-pvmw-pmd.patch
mm-page_vma_mapped_walk-prettify-pvmw_migration-block.patch
mm-page_vma_mapped_walk-crossing-page-table-boundary.patch
mm-page_vma_mapped_walk-add-a-level-of-indentation.patch
mm-page_vma_mapped_walk-use-goto-instead-of-while-1.patch
mm-page_vma_mapped_walk-get-vma_address_end-earlier.patch
mm-thp-fix-page_vma_mapped_walk-if-thp-mapped-by-ptes.patch
mm-thp-another-pvmw_sync-fix-in-page_vma_mapped_walk.patch
mm-thp-remap_page-is-only-needed-on-anonymous-thp.patch
mm-hwpoison_user_mappings-try_to_unmap-with-ttu_sync.patch
The patch titled
Subject: mm: page_vma_mapped_walk(): add a level of indentation
has been added to the -mm tree. Its filename is
mm-page_vma_mapped_walk-add-a-level-of-indentation.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-page_vma_mapped_walk-add-a-lev…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-page_vma_mapped_walk-add-a-lev…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm: page_vma_mapped_walk(): add a level of indentation
page_vma_mapped_walk() cleanup: add a level of indentation to much of the
body, making no functional change in this commit, but reducing the later
diff when this is all converted to a loop.
Link: https://lkml.kernel.org/r/efde211-f3e2-fe54-977-ef481419e7f3@google.com
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Wang Yugui <wangyugui(a)e16-tech.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_vma_mapped.c | 109 +++++++++++++++++++++--------------------
1 file changed, 56 insertions(+), 53 deletions(-)
--- a/mm/page_vma_mapped.c~mm-page_vma_mapped_walk-add-a-level-of-indentation
+++ a/mm/page_vma_mapped.c
@@ -173,65 +173,68 @@ bool page_vma_mapped_walk(struct page_vm
if (pvmw->pte)
goto next_pte;
restart:
- pgd = pgd_offset(mm, pvmw->address);
- if (!pgd_present(*pgd))
- return false;
- p4d = p4d_offset(pgd, pvmw->address);
- if (!p4d_present(*p4d))
- return false;
- pud = pud_offset(p4d, pvmw->address);
- if (!pud_present(*pud))
- return false;
+ {
+ pgd = pgd_offset(mm, pvmw->address);
+ if (!pgd_present(*pgd))
+ return false;
+ p4d = p4d_offset(pgd, pvmw->address);
+ if (!p4d_present(*p4d))
+ return false;
+ pud = pud_offset(p4d, pvmw->address);
+ if (!pud_present(*pud))
+ return false;
- pvmw->pmd = pmd_offset(pud, pvmw->address);
- /*
- * Make sure the pmd value isn't cached in a register by the
- * compiler and used as a stale value after we've observed a
- * subsequent update.
- */
- pmde = pmd_read_atomic(pvmw->pmd);
- barrier();
-
- if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
- pvmw->ptl = pmd_lock(mm, pvmw->pmd);
- pmde = *pvmw->pmd;
- if (likely(pmd_trans_huge(pmde))) {
- if (pvmw->flags & PVMW_MIGRATION)
- return not_found(pvmw);
- if (pmd_page(pmde) != page)
- return not_found(pvmw);
- return true;
- }
- if (!pmd_present(pmde)) {
- swp_entry_t entry;
-
- if (!thp_migration_supported() ||
- !(pvmw->flags & PVMW_MIGRATION))
- return not_found(pvmw);
- entry = pmd_to_swp_entry(pmde);
- if (!is_migration_entry(entry) ||
- migration_entry_to_page(entry) != page)
- return not_found(pvmw);
- return true;
- }
- /* THP pmd was split under us: handle on pte level */
- spin_unlock(pvmw->ptl);
- pvmw->ptl = NULL;
- } else if (!pmd_present(pmde)) {
+ pvmw->pmd = pmd_offset(pud, pvmw->address);
/*
- * If PVMW_SYNC, take and drop THP pmd lock so that we
- * cannot return prematurely, while zap_huge_pmd() has
- * cleared *pmd but not decremented compound_mapcount().
+ * Make sure the pmd value isn't cached in a register by the
+ * compiler and used as a stale value after we've observed a
+ * subsequent update.
*/
- if ((pvmw->flags & PVMW_SYNC) && PageTransCompound(page)) {
- spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
+ pmde = pmd_read_atomic(pvmw->pmd);
+ barrier();
- spin_unlock(ptl);
+ if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
+ pvmw->ptl = pmd_lock(mm, pvmw->pmd);
+ pmde = *pvmw->pmd;
+ if (likely(pmd_trans_huge(pmde))) {
+ if (pvmw->flags & PVMW_MIGRATION)
+ return not_found(pvmw);
+ if (pmd_page(pmde) != page)
+ return not_found(pvmw);
+ return true;
+ }
+ if (!pmd_present(pmde)) {
+ swp_entry_t entry;
+
+ if (!thp_migration_supported() ||
+ !(pvmw->flags & PVMW_MIGRATION))
+ return not_found(pvmw);
+ entry = pmd_to_swp_entry(pmde);
+ if (!is_migration_entry(entry) ||
+ migration_entry_to_page(entry) != page)
+ return not_found(pvmw);
+ return true;
+ }
+ /* THP pmd was split under us: handle on pte level */
+ spin_unlock(pvmw->ptl);
+ pvmw->ptl = NULL;
+ } else if (!pmd_present(pmde)) {
+ /*
+ * If PVMW_SYNC, take and drop THP pmd lock so that we
+ * cannot return prematurely, while zap_huge_pmd() has
+ * cleared *pmd but not decremented compound_mapcount().
+ */
+ if ((pvmw->flags & PVMW_SYNC) &&
+ PageTransCompound(page)) {
+ spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
+
+ spin_unlock(ptl);
+ }
+ return false;
}
- return false;
+ if (!map_pte(pvmw))
+ goto next_pte;
}
- if (!map_pte(pvmw))
- goto next_pte;
while (1) {
unsigned long end;
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-thp-fix-__split_huge_pmd_locked-on-shmem-migration-entry.patch
mm-thp-make-is_huge_zero_pmd-safe-and-quicker.patch
mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting.patch
mm-thp-fix-vma_address-if-virtual-address-below-file-offset.patch
mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page.patch
mm-page_vma_mapped_walk-use-page-for-pvmw-page.patch
mm-page_vma_mapped_walk-settle-pagehuge-on-entry.patch
mm-page_vma_mapped_walk-use-pmd_read_atomic.patch
mm-page_vma_mapped_walk-use-pmde-for-pvmw-pmd.patch
mm-page_vma_mapped_walk-prettify-pvmw_migration-block.patch
mm-page_vma_mapped_walk-crossing-page-table-boundary.patch
mm-page_vma_mapped_walk-add-a-level-of-indentation.patch
mm-page_vma_mapped_walk-use-goto-instead-of-while-1.patch
mm-page_vma_mapped_walk-get-vma_address_end-earlier.patch
mm-thp-fix-page_vma_mapped_walk-if-thp-mapped-by-ptes.patch
mm-thp-another-pvmw_sync-fix-in-page_vma_mapped_walk.patch
mm-thp-remap_page-is-only-needed-on-anonymous-thp.patch
mm-hwpoison_user_mappings-try_to_unmap-with-ttu_sync.patch
The patch titled
Subject: mm: page_vma_mapped_walk(): crossing page table boundary
has been added to the -mm tree. Its filename is
mm-page_vma_mapped_walk-crossing-page-table-boundary.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-page_vma_mapped_walk-crossing-…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-page_vma_mapped_walk-crossing-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm: page_vma_mapped_walk(): crossing page table boundary
page_vma_mapped_walk() cleanup: adjust the test for crossing page table
boundary - I believe pvmw->address is always page-aligned, but nothing
else here assumed that; and remember to reset pvmw->pte to NULL after
unmapping the page table, though I never saw any bug from that.
Link: https://lkml.kernel.org/r/799b3f9c-2a9e-dfef-5d89-26e9f76fd97@google.com
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Wang Yugui <wangyugui(a)e16-tech.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_vma_mapped.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/mm/page_vma_mapped.c~mm-page_vma_mapped_walk-crossing-page-table-boundary
+++ a/mm/page_vma_mapped.c
@@ -247,16 +247,16 @@ next_pte:
if (pvmw->address >= end)
return not_found(pvmw);
/* Did we cross page table boundary? */
- if (pvmw->address % PMD_SIZE == 0) {
- pte_unmap(pvmw->pte);
+ if ((pvmw->address & (PMD_SIZE - PAGE_SIZE)) == 0) {
if (pvmw->ptl) {
spin_unlock(pvmw->ptl);
pvmw->ptl = NULL;
}
+ pte_unmap(pvmw->pte);
+ pvmw->pte = NULL;
goto restart;
- } else {
- pvmw->pte++;
}
+ pvmw->pte++;
} while (pte_none(*pvmw->pte));
if (!pvmw->ptl) {
_
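As a standalone illustration of the new boundary test above (plain userspace C, not kernel code; PMD_SIZE of 2MB and PAGE_SIZE of 4KB are assumed for the sake of the example): for page-aligned addresses the mask form agrees with the old modulo form, and it additionally tolerates an address that is not page-aligned but falls within the first page of a pmd range.
#include <assert.h>
#include <stdio.h>

#define PAGE_SIZE 0x1000UL	/* 4KB, assumed for illustration */
#define PMD_SIZE  0x200000UL	/* 2MB, assumed for illustration */

int main(void)
{
	unsigned long addr;

	/* Walk page-aligned addresses across a pmd boundary. */
	for (addr = PMD_SIZE - 2 * PAGE_SIZE;
	     addr <= PMD_SIZE + 2 * PAGE_SIZE; addr += PAGE_SIZE) {
		int old_test = (addr % PMD_SIZE == 0);
		int new_test = ((addr & (PMD_SIZE - PAGE_SIZE)) == 0);

		/* For page-aligned addresses the two tests agree. */
		assert(old_test == new_test);
		printf("addr=%#lx crosses pmd boundary: %d\n", addr, new_test);
	}
	return 0;
}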
Patches currently in -mm which might be from hughd(a)google.com are
mm-thp-fix-__split_huge_pmd_locked-on-shmem-migration-entry.patch
mm-thp-make-is_huge_zero_pmd-safe-and-quicker.patch
mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting.patch
mm-thp-fix-vma_address-if-virtual-address-below-file-offset.patch
mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page.patch
mm-page_vma_mapped_walk-use-page-for-pvmw-page.patch
mm-page_vma_mapped_walk-settle-pagehuge-on-entry.patch
mm-page_vma_mapped_walk-use-pmd_read_atomic.patch
mm-page_vma_mapped_walk-use-pmde-for-pvmw-pmd.patch
mm-page_vma_mapped_walk-prettify-pvmw_migration-block.patch
mm-page_vma_mapped_walk-crossing-page-table-boundary.patch
mm-page_vma_mapped_walk-add-a-level-of-indentation.patch
mm-page_vma_mapped_walk-use-goto-instead-of-while-1.patch
mm-page_vma_mapped_walk-get-vma_address_end-earlier.patch
mm-thp-fix-page_vma_mapped_walk-if-thp-mapped-by-ptes.patch
mm-thp-another-pvmw_sync-fix-in-page_vma_mapped_walk.patch
mm-thp-remap_page-is-only-needed-on-anonymous-thp.patch
mm-hwpoison_user_mappings-try_to_unmap-with-ttu_sync.patch
The patch titled
Subject: mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block
has been added to the -mm tree. Its filename is
mm-page_vma_mapped_walk-prettify-pvmw_migration-block.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-page_vma_mapped_walk-prettify-…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-page_vma_mapped_walk-prettify-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block
page_vma_mapped_walk() cleanup: rearrange the !pmd_present() block to
follow the same "return not_found, return not_found, return true" pattern
as the block above it (note: returning not_found there is never premature,
since existence or prior existence of huge pmd guarantees good alignment).
Link: https://lkml.kernel.org/r/378c8650-1488-2edf-9647-32a53cf2e21@google.com
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Wang Yugui <wangyugui(a)e16-tech.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_vma_mapped.c | 30 ++++++++++++++----------------
1 file changed, 14 insertions(+), 16 deletions(-)
--- a/mm/page_vma_mapped.c~mm-page_vma_mapped_walk-prettify-pvmw_migration-block
+++ a/mm/page_vma_mapped.c
@@ -201,24 +201,22 @@ restart:
if (pmd_page(pmde) != page)
return not_found(pvmw);
return true;
- } else if (!pmd_present(pmde)) {
- if (thp_migration_supported()) {
- if (!(pvmw->flags & PVMW_MIGRATION))
- return not_found(pvmw);
- if (is_migration_entry(pmd_to_swp_entry(pmde))) {
- swp_entry_t entry = pmd_to_swp_entry(pmde);
+ }
+ if (!pmd_present(pmde)) {
+ swp_entry_t entry;
- if (migration_entry_to_page(entry) != page)
- return not_found(pvmw);
- return true;
- }
- }
- return not_found(pvmw);
- } else {
- /* THP pmd was split under us: handle on pte level */
- spin_unlock(pvmw->ptl);
- pvmw->ptl = NULL;
+ if (!thp_migration_supported() ||
+ !(pvmw->flags & PVMW_MIGRATION))
+ return not_found(pvmw);
+ entry = pmd_to_swp_entry(pmde);
+ if (!is_migration_entry(entry) ||
+ migration_entry_to_page(entry) != page)
+ return not_found(pvmw);
+ return true;
}
+ /* THP pmd was split under us: handle on pte level */
+ spin_unlock(pvmw->ptl);
+ pvmw->ptl = NULL;
} else if (!pmd_present(pmde)) {
/*
* If PVMW_SYNC, take and drop THP pmd lock so that we
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-thp-fix-__split_huge_pmd_locked-on-shmem-migration-entry.patch
mm-thp-make-is_huge_zero_pmd-safe-and-quicker.patch
mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting.patch
mm-thp-fix-vma_address-if-virtual-address-below-file-offset.patch
mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page.patch
mm-page_vma_mapped_walk-use-page-for-pvmw-page.patch
mm-page_vma_mapped_walk-settle-pagehuge-on-entry.patch
mm-page_vma_mapped_walk-use-pmd_read_atomic.patch
mm-page_vma_mapped_walk-use-pmde-for-pvmw-pmd.patch
mm-page_vma_mapped_walk-prettify-pvmw_migration-block.patch
mm-page_vma_mapped_walk-crossing-page-table-boundary.patch
mm-page_vma_mapped_walk-add-a-level-of-indentation.patch
mm-page_vma_mapped_walk-use-goto-instead-of-while-1.patch
mm-page_vma_mapped_walk-get-vma_address_end-earlier.patch
mm-thp-fix-page_vma_mapped_walk-if-thp-mapped-by-ptes.patch
mm-thp-another-pvmw_sync-fix-in-page_vma_mapped_walk.patch
mm-thp-remap_page-is-only-needed-on-anonymous-thp.patch
mm-hwpoison_user_mappings-try_to_unmap-with-ttu_sync.patch
The patch titled
Subject: mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd
has been added to the -mm tree. Its filename is
mm-page_vma_mapped_walk-use-pmde-for-pvmw-pmd.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-page_vma_mapped_walk-use-pmde-…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-page_vma_mapped_walk-use-pmde-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd
page_vma_mapped_walk() cleanup: re-evaluate pmde after taking lock, then
use it in subsequent tests, instead of repeatedly dereferencing pointer.
Link: https://lkml.kernel.org/r/53fbc9d-891e-46b2-cb4b-468c3b19238e@google.com
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Wang Yugui <wangyugui(a)e16-tech.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_vma_mapped.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
--- a/mm/page_vma_mapped.c~mm-page_vma_mapped_walk-use-pmde-for-pvmw-pmd
+++ a/mm/page_vma_mapped.c
@@ -194,18 +194,19 @@ restart:
if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
pvmw->ptl = pmd_lock(mm, pvmw->pmd);
- if (likely(pmd_trans_huge(*pvmw->pmd))) {
+ pmde = *pvmw->pmd;
+ if (likely(pmd_trans_huge(pmde))) {
if (pvmw->flags & PVMW_MIGRATION)
return not_found(pvmw);
- if (pmd_page(*pvmw->pmd) != page)
+ if (pmd_page(pmde) != page)
return not_found(pvmw);
return true;
- } else if (!pmd_present(*pvmw->pmd)) {
+ } else if (!pmd_present(pmde)) {
if (thp_migration_supported()) {
if (!(pvmw->flags & PVMW_MIGRATION))
return not_found(pvmw);
- if (is_migration_entry(pmd_to_swp_entry(*pvmw->pmd))) {
- swp_entry_t entry = pmd_to_swp_entry(*pvmw->pmd);
+ if (is_migration_entry(pmd_to_swp_entry(pmde))) {
+ swp_entry_t entry = pmd_to_swp_entry(pmde);
if (migration_entry_to_page(entry) != page)
return not_found(pvmw);
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-thp-fix-__split_huge_pmd_locked-on-shmem-migration-entry.patch
mm-thp-make-is_huge_zero_pmd-safe-and-quicker.patch
mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting.patch
mm-thp-fix-vma_address-if-virtual-address-below-file-offset.patch
mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page.patch
mm-page_vma_mapped_walk-use-page-for-pvmw-page.patch
mm-page_vma_mapped_walk-settle-pagehuge-on-entry.patch
mm-page_vma_mapped_walk-use-pmd_read_atomic.patch
mm-page_vma_mapped_walk-use-pmde-for-pvmw-pmd.patch
mm-page_vma_mapped_walk-prettify-pvmw_migration-block.patch
mm-page_vma_mapped_walk-crossing-page-table-boundary.patch
mm-page_vma_mapped_walk-add-a-level-of-indentation.patch
mm-page_vma_mapped_walk-use-goto-instead-of-while-1.patch
mm-page_vma_mapped_walk-get-vma_address_end-earlier.patch
mm-thp-fix-page_vma_mapped_walk-if-thp-mapped-by-ptes.patch
mm-thp-another-pvmw_sync-fix-in-page_vma_mapped_walk.patch
mm-thp-remap_page-is-only-needed-on-anonymous-thp.patch
mm-hwpoison_user_mappings-try_to_unmap-with-ttu_sync.patch
The patch titled
Subject: mm: page_vma_mapped_walk(): use pmd_read_atomic()
has been added to the -mm tree. Its filename is
mm-page_vma_mapped_walk-use-pmd_read_atomic.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-page_vma_mapped_walk-use-pmd_r…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-page_vma_mapped_walk-use-pmd_r…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm: page_vma_mapped_walk(): use pmd_read_atomic()
page_vma_mapped_walk() cleanup: use pmd_read_atomic() with barrier()
instead of READ_ONCE() for pmde: some architectures (e.g. i386 with PAE)
have a multi-word pmd entry, for which READ_ONCE() is not good enough.
Link: https://lkml.kernel.org/r/594c1f0-d396-5346-1f36-606872cddb18@google.com
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Wang Yugui <wangyugui(a)e16-tech.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_vma_mapped.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/mm/page_vma_mapped.c~mm-page_vma_mapped_walk-use-pmd_read_atomic
+++ a/mm/page_vma_mapped.c
@@ -182,13 +182,16 @@ restart:
pud = pud_offset(p4d, pvmw->address);
if (!pud_present(*pud))
return false;
+
pvmw->pmd = pmd_offset(pud, pvmw->address);
/*
* Make sure the pmd value isn't cached in a register by the
* compiler and used as a stale value after we've observed a
* subsequent update.
*/
- pmde = READ_ONCE(*pvmw->pmd);
+ pmde = pmd_read_atomic(pvmw->pmd);
+ barrier();
+
if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
pvmw->ptl = pmd_lock(mm, pvmw->pmd);
if (likely(pmd_trans_huge(*pvmw->pmd))) {
_
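The hazard being avoided above is a torn read of a pmd wider than the machine word, as on i386 with PAE. A minimal userspace sketch of such tearing, using a made-up two-word entry rather than the real pmd_t (and not the actual pmd_read_atomic() implementation):
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 64-bit entry stored as two 32-bit words, as on i386 PAE. */
struct wide_entry {
	uint32_t low;	/* carries the present bit in the real layout */
	uint32_t high;	/* carries high physical-address bits */
};

/*
 * A plain copy reads the words one at a time.  If a writer updates the
 * entry between the two loads, the reader sees a value that never
 * existed: old low word paired with new high word, or vice versa.
 * That is why a single-word READ_ONCE() is not enough here.
 */
static struct wide_entry torn_read(volatile struct wide_entry *p)
{
	struct wide_entry v;

	v.low  = p->low;	/* a concurrent writer may run here ...   */
	v.high = p->high;	/* ... leaving the two halves inconsistent */
	return v;
}

int main(void)
{
	struct wide_entry e = { .low = 0x1, .high = 0x2 };
	struct wide_entry copy = torn_read(&e);

	printf("low=%#x high=%#x\n", (unsigned)copy.low, (unsigned)copy.high);
	return 0;
}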
Patches currently in -mm which might be from hughd(a)google.com are
mm-thp-fix-__split_huge_pmd_locked-on-shmem-migration-entry.patch
mm-thp-make-is_huge_zero_pmd-safe-and-quicker.patch
mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting.patch
mm-thp-fix-vma_address-if-virtual-address-below-file-offset.patch
mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page.patch
mm-page_vma_mapped_walk-use-page-for-pvmw-page.patch
mm-page_vma_mapped_walk-settle-pagehuge-on-entry.patch
mm-page_vma_mapped_walk-use-pmd_read_atomic.patch
mm-page_vma_mapped_walk-use-pmde-for-pvmw-pmd.patch
mm-page_vma_mapped_walk-prettify-pvmw_migration-block.patch
mm-page_vma_mapped_walk-crossing-page-table-boundary.patch
mm-page_vma_mapped_walk-add-a-level-of-indentation.patch
mm-page_vma_mapped_walk-use-goto-instead-of-while-1.patch
mm-page_vma_mapped_walk-get-vma_address_end-earlier.patch
mm-thp-fix-page_vma_mapped_walk-if-thp-mapped-by-ptes.patch
mm-thp-another-pvmw_sync-fix-in-page_vma_mapped_walk.patch
mm-thp-remap_page-is-only-needed-on-anonymous-thp.patch
mm-hwpoison_user_mappings-try_to_unmap-with-ttu_sync.patch
The patch titled
Subject: mm: page_vma_mapped_walk(): settle PageHuge on entry
has been added to the -mm tree. Its filename is
mm-page_vma_mapped_walk-settle-pagehuge-on-entry.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-page_vma_mapped_walk-settle-pa…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-page_vma_mapped_walk-settle-pa…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm: page_vma_mapped_walk(): settle PageHuge on entry
page_vma_mapped_walk() cleanup: get the hugetlbfs PageHuge case out of the
way at the start, so no need to worry about it later.
Link: https://lkml.kernel.org/r/e31a483c-6d73-a6bb-26c5-43c3b880a2@google.com
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Wang Yugui <wangyugui(a)e16-tech.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_vma_mapped.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
--- a/mm/page_vma_mapped.c~mm-page_vma_mapped_walk-settle-pagehuge-on-entry
+++ a/mm/page_vma_mapped.c
@@ -153,10 +153,11 @@ bool page_vma_mapped_walk(struct page_vm
if (pvmw->pmd && !pvmw->pte)
return not_found(pvmw);
- if (pvmw->pte)
- goto next_pte;
-
if (unlikely(PageHuge(page))) {
+ /* The only possible mapping was handled on last iteration */
+ if (pvmw->pte)
+ return not_found(pvmw);
+
/* when pud is not present, pte will be NULL */
pvmw->pte = huge_pte_offset(mm, pvmw->address, page_size(page));
if (!pvmw->pte)
@@ -168,6 +169,9 @@ bool page_vma_mapped_walk(struct page_vm
return not_found(pvmw);
return true;
}
+
+ if (pvmw->pte)
+ goto next_pte;
restart:
pgd = pgd_offset(mm, pvmw->address);
if (!pgd_present(*pgd))
@@ -233,7 +237,7 @@ restart:
return true;
next_pte:
/* Seek to next pte only makes sense for THP */
- if (!PageTransHuge(page) || PageHuge(page))
+ if (!PageTransHuge(page))
return not_found(pvmw);
end = vma_address_end(page, pvmw->vma);
do {
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-thp-fix-__split_huge_pmd_locked-on-shmem-migration-entry.patch
mm-thp-make-is_huge_zero_pmd-safe-and-quicker.patch
mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting.patch
mm-thp-fix-vma_address-if-virtual-address-below-file-offset.patch
mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page.patch
mm-page_vma_mapped_walk-use-page-for-pvmw-page.patch
mm-page_vma_mapped_walk-settle-pagehuge-on-entry.patch
mm-page_vma_mapped_walk-use-pmd_read_atomic.patch
mm-page_vma_mapped_walk-use-pmde-for-pvmw-pmd.patch
mm-page_vma_mapped_walk-prettify-pvmw_migration-block.patch
mm-page_vma_mapped_walk-crossing-page-table-boundary.patch
mm-page_vma_mapped_walk-add-a-level-of-indentation.patch
mm-page_vma_mapped_walk-use-goto-instead-of-while-1.patch
mm-page_vma_mapped_walk-get-vma_address_end-earlier.patch
mm-thp-fix-page_vma_mapped_walk-if-thp-mapped-by-ptes.patch
mm-thp-another-pvmw_sync-fix-in-page_vma_mapped_walk.patch
mm-thp-remap_page-is-only-needed-on-anonymous-thp.patch
mm-hwpoison_user_mappings-try_to_unmap-with-ttu_sync.patch
The patch titled
Subject: mm: page_vma_mapped_walk(): use page for pvmw->page
has been added to the -mm tree. Its filename is
mm-page_vma_mapped_walk-use-page-for-pvmw-page.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-page_vma_mapped_walk-use-page-…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-page_vma_mapped_walk-use-page-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm: page_vma_mapped_walk(): use page for pvmw->page
Patch series "mm: page_vma_mapped_walk() cleanup and THP fixes".
I've marked all of these for stable: many are merely cleanups,
but I think they are much better before the main fix than after.
This patch (of 11):
page_vma_mapped_walk() cleanup: sometimes the local copy of pvmw->page was
used, sometimes pvmw->page itself: use the local copy "page" throughout.
Link: https://lkml.kernel.org/r/589b358c-febc-c88e-d4c2-7834b37fa7bf@google.com
Link: https://lkml.kernel.org/r/88e67645-f467-c279-bf5e-af4b5c6b13eb@google.com
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Reviewed-by: Alistair Popple <apopple(a)nvidia.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Wang Yugui <wangyugui(a)e16-tech.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_vma_mapped.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
--- a/mm/page_vma_mapped.c~mm-page_vma_mapped_walk-use-page-for-pvmw-page
+++ a/mm/page_vma_mapped.c
@@ -156,7 +156,7 @@ bool page_vma_mapped_walk(struct page_vm
if (pvmw->pte)
goto next_pte;
- if (unlikely(PageHuge(pvmw->page))) {
+ if (unlikely(PageHuge(page))) {
/* when pud is not present, pte will be NULL */
pvmw->pte = huge_pte_offset(mm, pvmw->address, page_size(page));
if (!pvmw->pte)
@@ -217,8 +217,7 @@ restart:
* cannot return prematurely, while zap_huge_pmd() has
* cleared *pmd but not decremented compound_mapcount().
*/
- if ((pvmw->flags & PVMW_SYNC) &&
- PageTransCompound(pvmw->page)) {
+ if ((pvmw->flags & PVMW_SYNC) && PageTransCompound(page)) {
spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
spin_unlock(ptl);
@@ -234,9 +233,9 @@ restart:
return true;
next_pte:
/* Seek to next pte only makes sense for THP */
- if (!PageTransHuge(pvmw->page) || PageHuge(pvmw->page))
+ if (!PageTransHuge(page) || PageHuge(page))
return not_found(pvmw);
- end = vma_address_end(pvmw->page, pvmw->vma);
+ end = vma_address_end(page, pvmw->vma);
do {
pvmw->address += PAGE_SIZE;
if (pvmw->address >= end)
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-thp-fix-__split_huge_pmd_locked-on-shmem-migration-entry.patch
mm-thp-make-is_huge_zero_pmd-safe-and-quicker.patch
mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting.patch
mm-thp-fix-vma_address-if-virtual-address-below-file-offset.patch
mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page.patch
mm-page_vma_mapped_walk-use-page-for-pvmw-page.patch
mm-page_vma_mapped_walk-settle-pagehuge-on-entry.patch
mm-page_vma_mapped_walk-use-pmd_read_atomic.patch
mm-page_vma_mapped_walk-use-pmde-for-pvmw-pmd.patch
mm-page_vma_mapped_walk-prettify-pvmw_migration-block.patch
mm-page_vma_mapped_walk-crossing-page-table-boundary.patch
mm-page_vma_mapped_walk-add-a-level-of-indentation.patch
mm-page_vma_mapped_walk-use-goto-instead-of-while-1.patch
mm-page_vma_mapped_walk-get-vma_address_end-earlier.patch
mm-thp-fix-page_vma_mapped_walk-if-thp-mapped-by-ptes.patch
mm-thp-another-pvmw_sync-fix-in-page_vma_mapped_walk.patch
mm-thp-remap_page-is-only-needed-on-anonymous-thp.patch
mm-hwpoison_user_mappings-try_to_unmap-with-ttu_sync.patch
The patch titled
Subject: mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split
has been added to the -mm tree. Its filename is
mm-thp-replace-debug_vm-bug-with-vm_warn-when-unmap-fails-for-split.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-thp-replace-debug_vm-bug-with-…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-replace-debug_vm-bug-with-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Yang Shi <shy828301(a)gmail.com>
Subject: mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split
When debugging the bug reported by Wang Yugui [1], try_to_unmap() may
fail, but the first VM_BUG_ON_PAGE() only checks page_mapcount(), so it
may miss the failure when the head page is unmapped but another subpage is
still mapped.  The second DEBUG_VM BUG(), which checks total mapcount,
would then catch it.  This can cause some confusion; and since this is not
a fatal issue, consolidate the two DEBUG_VM checks into one
VM_WARN_ON_ONCE_PAGE().
[1] https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/
Link: https://lkml.kernel.org/r/d0f0db68-98b8-ebfb-16dc-f29df24cf012@google.com
Signed-off-by: Yang Shi <shy828301(a)gmail.com>
Reviewed-by: Zi Yan <ziy(a)nvidia.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Cc: <stable(a)vger.kernel.org>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Jue Wang <juew(a)google.com>
Cc: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Wang Yugui <wangyugui(a)e16-tech.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 24 +++++++-----------------
1 file changed, 7 insertions(+), 17 deletions(-)
--- a/mm/huge_memory.c~mm-thp-replace-debug_vm-bug-with-vm_warn-when-unmap-fails-for-split
+++ a/mm/huge_memory.c
@@ -2352,15 +2352,15 @@ static void unmap_page(struct page *page
{
enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_SYNC |
TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
- bool unmap_success;
VM_BUG_ON_PAGE(!PageHead(page), page);
if (PageAnon(page))
ttu_flags |= TTU_SPLIT_FREEZE;
- unmap_success = try_to_unmap(page, ttu_flags);
- VM_BUG_ON_PAGE(!unmap_success, page);
+ try_to_unmap(page, ttu_flags);
+
+ VM_WARN_ON_ONCE_PAGE(page_mapped(page), page);
}
static void remap_page(struct page *page, unsigned int nr)
@@ -2671,7 +2671,7 @@ int split_huge_page_to_list(struct page
struct deferred_split *ds_queue = get_deferred_split_queue(head);
struct anon_vma *anon_vma = NULL;
struct address_space *mapping = NULL;
- int count, mapcount, extra_pins, ret;
+ int extra_pins, ret;
pgoff_t end;
VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
@@ -2730,7 +2730,6 @@ int split_huge_page_to_list(struct page
}
unmap_page(head);
- VM_BUG_ON_PAGE(compound_mapcount(head), head);
/* block interrupt reentry in xa_lock and spinlock */
local_irq_disable();
@@ -2748,9 +2747,7 @@ int split_huge_page_to_list(struct page
/* Prevent deferred_split_scan() touching ->_refcount */
spin_lock(&ds_queue->split_queue_lock);
- count = page_count(head);
- mapcount = total_mapcount(head);
- if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
+ if (page_ref_freeze(head, 1 + extra_pins)) {
if (!list_empty(page_deferred_list(head))) {
ds_queue->split_queue_len--;
list_del(page_deferred_list(head));
@@ -2770,16 +2767,9 @@ int split_huge_page_to_list(struct page
__split_huge_page(page, list, end);
ret = 0;
} else {
- if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) {
- pr_alert("total_mapcount: %u, page_count(): %u\n",
- mapcount, count);
- if (PageTail(page))
- dump_page(head, NULL);
- dump_page(page, "total_mapcount(head) > 0");
- BUG();
- }
spin_unlock(&ds_queue->split_queue_lock);
-fail: if (mapping)
+fail:
+ if (mapping)
xa_unlock(&mapping->i_pages);
local_irq_enable();
remap_page(head, thp_nr_pages(head));
_
Patches currently in -mm which might be from shy828301(a)gmail.com are
mm-thp-replace-debug_vm-bug-with-vm_warn-when-unmap-fails-for-split.patch
mm-mempolicy-dont-have-to-split-pmd-for-huge-zero-page.patch
mm-memory-add-orig_pmd-to-struct-vm_fault.patch
mm-memory-make-numa_migrate_prep-non-static.patch
mm-thp-refactor-numa-fault-handling.patch
mm-migrate-account-thp-numa-migration-counters-correctly.patch
mm-migrate-dont-split-thp-for-misplaced-numa-page.patch
mm-migrate-check-mapcount-for-thp-instead-of-refcount.patch
mm-thp-skip-make-pmd-prot_none-if-thp-migration-is-not-supported.patch
mm-rmap-make-try_to_unmap-void-function.patch
The patch titled
Subject: mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()
has been added to the -mm tree. Its filename is
mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-thp-unmap_mapping_page-to-fix-…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-unmap_mapping_page-to-fix-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()
There is a race between THP unmapping and truncation, when truncate sees
pmd_none() and skips the entry, after munmap's zap_huge_pmd() cleared it,
but before its page_remove_rmap() gets to decrement compound_mapcount:
generating false "BUG: Bad page cache" reports that the page is still
mapped when deleted. This commit fixes that, but not in the way I hoped.
The first attempt used try_to_unmap(page, TTU_SYNC|TTU_IGNORE_MLOCK)
instead of unmap_mapping_range() in truncate_cleanup_page(): it has often
been an annoyance that we usually call unmap_mapping_range() with no pages
locked, yet here we would be applying it to a single locked page.
try_to_unmap() looks more suitable for a single locked page.
However, try_to_unmap_one() contains a VM_BUG_ON_PAGE(!pvmw.pte,page): it
is used to insert THP migration entries, but not used to unmap THPs. Copy
zap_huge_pmd() and add THP handling now? Perhaps, but their TLB needs are
different, I'm too ignorant of the DAX cases, and couldn't decide how far
to go for anon+swap. Set that aside.
The second attempt took a different tack: make no change in truncate.c,
but modify zap_huge_pmd() to insert an invalidated huge pmd instead of
clearing it initially, then pmd_clear() between page_remove_rmap() and
unlocking at the end. Nice. But powerpc blows that approach out of the
water, with its serialize_against_pte_lookup(), and interesting pgtable
usage. It would need serious help to get working on powerpc (with a minor
optimization issue on s390 too). Set that aside.
Just add an "if (page_mapped(page)) synchronize_rcu();" or other such
delay, after unmapping in truncate_cleanup_page()? Perhaps, but though
that's likely to reduce or eliminate the number of incidents, it would
give less assurance of whether we had identified the problem correctly.
This successful iteration introduces "unmap_mapping_page(page)" instead of
try_to_unmap(), and goes the usual unmap_mapping_range_tree() route, with
an addition to details. Then zap_pmd_range() watches for this case, and
does spin_unlock(pmd_lock) if so - just like page_vma_mapped_walk() now
does in the PVMW_SYNC case. Not pretty, but safe.
Note that unmap_mapping_page() is doing a VM_BUG_ON(!PageLocked) to assert
its interface; but currently that's only used to make sure that
page->mapping is stable, and zap_pmd_range() doesn't care if the page is
locked or not. Along these lines, in invalidate_inode_pages2_range() move
the initial unmap_mapping_range() out from under page lock, before then
calling unmap_mapping_page() under page lock if still mapped.
Link: https://lkml.kernel.org/r/a2a4a148-cdd8-942c-4ef8-51b77f643dbe@google.com
Fixes: fc127da085c2 ("truncate: handle file thp")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Jue Wang <juew(a)google.com>
Cc: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Wang Yugui <wangyugui(a)e16-tech.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 3 +++
mm/memory.c | 41 +++++++++++++++++++++++++++++++++++++++++
mm/truncate.c | 43 +++++++++++++++++++------------------------
3 files changed, 63 insertions(+), 24 deletions(-)
--- a/include/linux/mm.h~mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page
+++ a/include/linux/mm.h
@@ -1719,6 +1719,7 @@ struct zap_details {
struct address_space *check_mapping; /* Check page->mapping if set */
pgoff_t first_index; /* Lowest page->index to unmap */
pgoff_t last_index; /* Highest page->index to unmap */
+ struct page *single_page; /* Locked page to be unmapped */
};
struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
@@ -1766,6 +1767,7 @@ extern vm_fault_t handle_mm_fault(struct
extern int fixup_user_fault(struct mm_struct *mm,
unsigned long address, unsigned int fault_flags,
bool *unlocked);
+void unmap_mapping_page(struct page *page);
void unmap_mapping_pages(struct address_space *mapping,
pgoff_t start, pgoff_t nr, bool even_cows);
void unmap_mapping_range(struct address_space *mapping,
@@ -1786,6 +1788,7 @@ static inline int fixup_user_fault(struc
BUG();
return -EFAULT;
}
+static inline void unmap_mapping_page(struct page *page) { }
static inline void unmap_mapping_pages(struct address_space *mapping,
pgoff_t start, pgoff_t nr, bool even_cows) { }
static inline void unmap_mapping_range(struct address_space *mapping,
--- a/mm/memory.c~mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page
+++ a/mm/memory.c
@@ -1361,7 +1361,18 @@ static inline unsigned long zap_pmd_rang
else if (zap_huge_pmd(tlb, vma, pmd, addr))
goto next;
/* fall through */
+ } else if (details && details->single_page &&
+ PageTransCompound(details->single_page) &&
+ next - addr == HPAGE_PMD_SIZE && pmd_none(*pmd)) {
+ spinlock_t *ptl = pmd_lock(tlb->mm, pmd);
+ /*
+ * Take and drop THP pmd lock so that we cannot return
+ * prematurely, while zap_huge_pmd() has cleared *pmd,
+ * but not yet decremented compound_mapcount().
+ */
+ spin_unlock(ptl);
}
+
/*
* Here there can be other concurrent MADV_DONTNEED or
* trans huge page faults running, and if the pmd is
@@ -3237,6 +3248,36 @@ static inline void unmap_mapping_range_t
}
/**
+ * unmap_mapping_page() - Unmap single page from processes.
+ * @page: The locked page to be unmapped.
+ *
+ * Unmap this page from any userspace process which still has it mmaped.
+ * Typically, for efficiency, the range of nearby pages has already been
+ * unmapped by unmap_mapping_pages() or unmap_mapping_range(). But once
+ * truncation or invalidation holds the lock on a page, it may find that
+ * the page has been remapped again: and then uses unmap_mapping_page()
+ * to unmap it finally.
+ */
+void unmap_mapping_page(struct page *page)
+{
+ struct address_space *mapping = page->mapping;
+ struct zap_details details = { };
+
+ VM_BUG_ON(!PageLocked(page));
+ VM_BUG_ON(PageTail(page));
+
+ details.check_mapping = mapping;
+ details.first_index = page->index;
+ details.last_index = page->index + thp_nr_pages(page) - 1;
+ details.single_page = page;
+
+ i_mmap_lock_write(mapping);
+ if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
+ unmap_mapping_range_tree(&mapping->i_mmap, &details);
+ i_mmap_unlock_write(mapping);
+}
+
+/**
* unmap_mapping_pages() - Unmap pages from processes.
* @mapping: The address space containing pages to be unmapped.
* @start: Index of first page to be unmapped.
--- a/mm/truncate.c~mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page
+++ a/mm/truncate.c
@@ -167,13 +167,10 @@ void do_invalidatepage(struct page *page
* its lock, b) when a concurrent invalidate_mapping_pages got there first and
* c) when tmpfs swizzles a page between a tmpfs inode and swapper_space.
*/
-static void
-truncate_cleanup_page(struct address_space *mapping, struct page *page)
+static void truncate_cleanup_page(struct page *page)
{
- if (page_mapped(page)) {
- unsigned int nr = thp_nr_pages(page);
- unmap_mapping_pages(mapping, page->index, nr, false);
- }
+ if (page_mapped(page))
+ unmap_mapping_page(page);
if (page_has_private(page))
do_invalidatepage(page, 0, thp_size(page));
@@ -218,7 +215,7 @@ int truncate_inode_page(struct address_s
if (page->mapping != mapping)
return -EIO;
- truncate_cleanup_page(mapping, page);
+ truncate_cleanup_page(page);
delete_from_page_cache(page);
return 0;
}
@@ -325,7 +322,7 @@ void truncate_inode_pages_range(struct a
index = indices[pagevec_count(&pvec) - 1] + 1;
truncate_exceptional_pvec_entries(mapping, &pvec, indices);
for (i = 0; i < pagevec_count(&pvec); i++)
- truncate_cleanup_page(mapping, pvec.pages[i]);
+ truncate_cleanup_page(pvec.pages[i]);
delete_from_page_cache_batch(mapping, &pvec);
for (i = 0; i < pagevec_count(&pvec); i++)
unlock_page(pvec.pages[i]);
@@ -639,6 +636,16 @@ int invalidate_inode_pages2_range(struct
continue;
}
+ if (!did_range_unmap && page_mapped(page)) {
+ /*
+ * If page is mapped, before taking its lock,
+ * zap the rest of the file in one hit.
+ */
+ unmap_mapping_pages(mapping, index,
+ (1 + end - index), false);
+ did_range_unmap = 1;
+ }
+
lock_page(page);
WARN_ON(page_to_index(page) != index);
if (page->mapping != mapping) {
@@ -646,23 +653,11 @@ int invalidate_inode_pages2_range(struct
continue;
}
wait_on_page_writeback(page);
- if (page_mapped(page)) {
- if (!did_range_unmap) {
- /*
- * Zap the rest of the file in one hit.
- */
- unmap_mapping_pages(mapping, index,
- (1 + end - index), false);
- did_range_unmap = 1;
- } else {
- /*
- * Just zap this page
- */
- unmap_mapping_pages(mapping, index,
- 1, false);
- }
- }
+
+ if (page_mapped(page))
+ unmap_mapping_page(page);
BUG_ON(page_mapped(page));
+
ret2 = do_launder_page(mapping, page);
if (ret2 == 0) {
if (!invalidate_complete_page2(mapping, page))
_
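The take-and-drop-lock trick used in the zap_pmd_range() hunk above (and in page_vma_mapped_walk()'s PVMW_SYNC case) serializes against a concurrent zap without doing any work under the lock. A minimal userspace sketch of the idiom with pthreads, using invented names and flags to stand in for the pmd lock, pmd_none() and compound_mapcount (compile with -pthread):
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t ptl = PTHREAD_MUTEX_INITIALIZER; /* stands in for the pmd lock */
static volatile int pmd_cleared;	/* stands in for pmd_none(*pmd) becoming true  */
static volatile int rmap_done;		/* stands in for compound_mapcount decremented */

/* Models zap_huge_pmd(): both steps happen under the same lock. */
static void *zapper(void *arg)
{
	pthread_mutex_lock(&ptl);
	pmd_cleared = 1;	/* the entry is cleared ...                     */
	rmap_done = 1;		/* ... and the rmap count dropped, still locked */
	pthread_mutex_unlock(&ptl);
	return NULL;
}

/*
 * Models the single_page / PVMW_SYNC case: having observed the cleared
 * entry, take and drop the same lock; the zapper cannot still be part way
 * through its critical section afterwards, so we cannot return prematurely.
 */
static void *waiter(void *arg)
{
	while (!pmd_cleared)
		;			/* spin until the entry looks empty */
	pthread_mutex_lock(&ptl);
	pthread_mutex_unlock(&ptl);
	assert(rmap_done);
	printf("rmap_done=%d\n", rmap_done);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, zapper, NULL);
	pthread_create(&b, NULL, waiter, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}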
Patches currently in -mm which might be from hughd(a)google.com are
mm-thp-fix-__split_huge_pmd_locked-on-shmem-migration-entry.patch
mm-thp-make-is_huge_zero_pmd-safe-and-quicker.patch
mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting.patch
mm-thp-fix-vma_address-if-virtual-address-below-file-offset.patch
mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page.patch
mm-page_vma_mapped_walk-use-page-for-pvmw-page.patch
mm-page_vma_mapped_walk-settle-pagehuge-on-entry.patch
mm-page_vma_mapped_walk-use-pmd_read_atomic.patch
mm-page_vma_mapped_walk-use-pmde-for-pvmw-pmd.patch
mm-page_vma_mapped_walk-prettify-pvmw_migration-block.patch
mm-page_vma_mapped_walk-crossing-page-table-boundary.patch
mm-page_vma_mapped_walk-add-a-level-of-indentation.patch
mm-page_vma_mapped_walk-use-goto-instead-of-while-1.patch
mm-page_vma_mapped_walk-get-vma_address_end-earlier.patch
mm-thp-fix-page_vma_mapped_walk-if-thp-mapped-by-ptes.patch
mm-thp-another-pvmw_sync-fix-in-page_vma_mapped_walk.patch
mm-thp-remap_page-is-only-needed-on-anonymous-thp.patch
mm-hwpoison_user_mappings-try_to_unmap-with-ttu_sync.patch
The patch titled
Subject: mm/thp: fix page_address_in_vma() on file THP tails
has been added to the -mm tree. Its filename is
mm-thp-fix-page_address_in_vma-on-file-thp-tails.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-thp-fix-page_address_in_vma-on…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-fix-page_address_in_vma-on…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Jue Wang <juew(a)google.com>
Subject: mm/thp: fix page_address_in_vma() on file THP tails
Anon THP tails were already supported, but memory-failure may need to use
page_address_in_vma() on file THP tails, which its page->mapping check did
not permit: fix it.
hughd adds: no current usage is known to hit the issue, but this does fix
a subtle trap in a general helper: best fixed in stable sooner than later.
Link: https://lkml.kernel.org/r/a0d9b53-bf5d-8bab-ac5-759dc61819c1@google.com
Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
Signed-off-by: Jue Wang <juew(a)google.com>
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Wang Yugui <wangyugui(a)e16-tech.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/rmap.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/mm/rmap.c~mm-thp-fix-page_address_in_vma-on-file-thp-tails
+++ a/mm/rmap.c
@@ -716,11 +716,11 @@ unsigned long page_address_in_vma(struct
if (!vma->anon_vma || !page__anon_vma ||
vma->anon_vma->root != page__anon_vma->root)
return -EFAULT;
- } else if (page->mapping) {
- if (!vma->vm_file || vma->vm_file->f_mapping != page->mapping)
- return -EFAULT;
- } else
+ } else if (!vma->vm_file) {
+ return -EFAULT;
+ } else if (vma->vm_file->f_mapping != compound_head(page)->mapping) {
return -EFAULT;
+ }
return vma_address(page, vma);
}
_
Patches currently in -mm which might be from juew(a)google.com are
mm-thp-fix-page_address_in_vma-on-file-thp-tails.patch
The patch titled
Subject: mm/thp: fix vma_address() if virtual address below file offset
has been added to the -mm tree. Its filename is
mm-thp-fix-vma_address-if-virtual-address-below-file-offset.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-thp-fix-vma_address-if-virtual…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-fix-vma_address-if-virtual…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/thp: fix vma_address() if virtual address below file offset
Running certain tests with a DEBUG_VM kernel would crash within hours, on
the total_mapcount BUG() in split_huge_page_to_list(), while trying to
free up some memory by punching a hole in a shmem huge page: split's
try_to_unmap() was unable to find all the mappings of the page (which, on
a !DEBUG_VM kernel, would then keep the huge page pinned in memory).
When that BUG() was changed to a WARN(), it would later crash on the
VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma) in
mm/internal.h:vma_address(), used by rmap_walk_file() for try_to_unmap().
vma_address() is usually correct, but there's a wraparound case when the
vm_start address is unusually low, but vm_pgoff not so low: vma_address()
chooses max(start, vma->vm_start), but that decides on the wrong address,
because start has become almost ULONG_MAX.
Rewrite vma_address() to be more careful about vm_pgoff; move the
VM_BUG_ON_VMA() out of it, returning -EFAULT for errors, so that it can be
safely used from page_mapped_in_vma() and page_address_in_vma() too.
Add vma_address_end() to apply similar care to end address calculation, in
page_vma_mapped_walk() and page_mkclean_one() and try_to_unmap_one();
though it raises a question of whether callers would do better to supply
pvmw->end to page_vma_mapped_walk() - I chose not, for a smaller patch.
An irritation is that their apparent generality breaks down on KSM pages,
which cannot be located by the page->index that page_to_pgoff() uses: as
4b0ece6fa016 ("mm: migrate: fix remove_migration_pte() for ksm pages")
once discovered. I dithered over the best thing to do about that, and
have ended up with a VM_BUG_ON_PAGE(PageKsm) in both vma_address() and
vma_address_end(); though the only place in danger of using it on them was
try_to_unmap_one().
Sidenote: vma_address() and vma_address_end() now use compound_nr() on a
head page, instead of thp_size(): to make the right calculation on a
hugetlbfs page, whether or not THPs are configured. try_to_unmap() is
used on hugetlbfs pages, but perhaps the wrong calculation never mattered.
Link: https://lkml.kernel.org/r/caf1c1a3-7cfb-7f8f-1beb-ba816e932825@google.com
Fixes: a8fa41ad2f6f ("mm, rmap: check all VMAs that PTE-mapped THP can be part of")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Jue Wang <juew(a)google.com>
Cc: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Wang Yugui <wangyugui(a)e16-tech.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/internal.h | 51 ++++++++++++++++++++++++++++++-----------
mm/page_vma_mapped.c | 16 ++++--------
mm/rmap.c | 16 ++++++------
3 files changed, 52 insertions(+), 31 deletions(-)
--- a/mm/internal.h~mm-thp-fix-vma_address-if-virtual-address-below-file-offset
+++ a/mm/internal.h
@@ -384,27 +384,52 @@ static inline void mlock_migrate_page(st
extern pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma);
/*
- * At what user virtual address is page expected in @vma?
+ * At what user virtual address is page expected in vma?
+ * Returns -EFAULT if all of the page is outside the range of vma.
+ * If page is a compound head, the entire compound page is considered.
*/
static inline unsigned long
-__vma_address(struct page *page, struct vm_area_struct *vma)
+vma_address(struct page *page, struct vm_area_struct *vma)
{
- pgoff_t pgoff = page_to_pgoff(page);
- return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+ pgoff_t pgoff;
+ unsigned long address;
+
+ VM_BUG_ON_PAGE(PageKsm(page), page); /* KSM page->index unusable */
+ pgoff = page_to_pgoff(page);
+ if (pgoff >= vma->vm_pgoff) {
+ address = vma->vm_start +
+ ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+ /* Check for address beyond vma (or wrapped through 0?) */
+ if (address < vma->vm_start || address >= vma->vm_end)
+ address = -EFAULT;
+ } else if (PageHead(page) &&
+ pgoff + compound_nr(page) - 1 >= vma->vm_pgoff) {
+ /* Test above avoids possibility of wrap to 0 on 32-bit */
+ address = vma->vm_start;
+ } else {
+ address = -EFAULT;
+ }
+ return address;
}
+/*
+ * Then at what user virtual address will none of the page be found in vma?
+ * Assumes that vma_address() already returned a good starting address.
+ * If page is a compound head, the entire compound page is considered.
+ */
static inline unsigned long
-vma_address(struct page *page, struct vm_area_struct *vma)
+vma_address_end(struct page *page, struct vm_area_struct *vma)
{
- unsigned long start, end;
-
- start = __vma_address(page, vma);
- end = start + thp_size(page) - PAGE_SIZE;
-
- /* page should be within @vma mapping range */
- VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma);
+ pgoff_t pgoff;
+ unsigned long address;
- return max(start, vma->vm_start);
+ VM_BUG_ON_PAGE(PageKsm(page), page); /* KSM page->index unusable */
+ pgoff = page_to_pgoff(page) + compound_nr(page);
+ address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+ /* Check for address beyond vma (or wrapped through 0?) */
+ if (address < vma->vm_start || address > vma->vm_end)
+ address = vma->vm_end;
+ return address;
}
static inline struct file *maybe_unlock_mmap_for_io(struct vm_fault *vmf,
--- a/mm/page_vma_mapped.c~mm-thp-fix-vma_address-if-virtual-address-below-file-offset
+++ a/mm/page_vma_mapped.c
@@ -228,18 +228,18 @@ restart:
if (!map_pte(pvmw))
goto next_pte;
while (1) {
+ unsigned long end;
+
if (check_pte(pvmw))
return true;
next_pte:
/* Seek to next pte only makes sense for THP */
if (!PageTransHuge(pvmw->page) || PageHuge(pvmw->page))
return not_found(pvmw);
+ end = vma_address_end(pvmw->page, pvmw->vma);
do {
pvmw->address += PAGE_SIZE;
- if (pvmw->address >= pvmw->vma->vm_end ||
- pvmw->address >=
- __vma_address(pvmw->page, pvmw->vma) +
- thp_size(pvmw->page))
+ if (pvmw->address >= end)
return not_found(pvmw);
/* Did we cross page table boundary? */
if (pvmw->address % PMD_SIZE == 0) {
@@ -277,14 +277,10 @@ int page_mapped_in_vma(struct page *page
.vma = vma,
.flags = PVMW_SYNC,
};
- unsigned long start, end;
-
- start = __vma_address(page, vma);
- end = start + thp_size(page) - PAGE_SIZE;
- if (unlikely(end < vma->vm_start || start >= vma->vm_end))
+ pvmw.address = vma_address(page, vma);
+ if (pvmw.address == -EFAULT)
return 0;
- pvmw.address = max(start, vma->vm_start);
if (!page_vma_mapped_walk(&pvmw))
return 0;
page_vma_mapped_walk_done(&pvmw);
--- a/mm/rmap.c~mm-thp-fix-vma_address-if-virtual-address-below-file-offset
+++ a/mm/rmap.c
@@ -707,7 +707,6 @@ static bool should_defer_flush(struct mm
*/
unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
{
- unsigned long address;
if (PageAnon(page)) {
struct anon_vma *page__anon_vma = page_anon_vma(page);
/*
@@ -722,10 +721,8 @@ unsigned long page_address_in_vma(struct
return -EFAULT;
} else
return -EFAULT;
- address = __vma_address(page, vma);
- if (unlikely(address < vma->vm_start || address >= vma->vm_end))
- return -EFAULT;
- return address;
+
+ return vma_address(page, vma);
}
pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
@@ -919,7 +916,7 @@ static bool page_mkclean_one(struct page
*/
mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_PAGE,
0, vma, vma->vm_mm, address,
- min(vma->vm_end, address + page_size(page)));
+ vma_address_end(page, vma));
mmu_notifier_invalidate_range_start(&range);
while (page_vma_mapped_walk(&pvmw)) {
@@ -1435,9 +1432,10 @@ static bool try_to_unmap_one(struct page
* Note that the page can not be free in this function as call of
* try_to_unmap() must hold a reference on the page.
*/
+ range.end = PageKsm(page) ?
+ address + PAGE_SIZE : vma_address_end(page, vma);
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
- address,
- min(vma->vm_end, address + page_size(page)));
+ address, range.end);
if (PageHuge(page)) {
/*
* If sharing is possible, start and end will be adjusted
@@ -1889,6 +1887,7 @@ static void rmap_walk_anon(struct page *
struct vm_area_struct *vma = avc->vma;
unsigned long address = vma_address(page, vma);
+ VM_BUG_ON_VMA(address == -EFAULT, vma);
cond_resched();
if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg))
@@ -1943,6 +1942,7 @@ static void rmap_walk_file(struct page *
pgoff_start, pgoff_end) {
unsigned long address = vma_address(page, vma);
+ VM_BUG_ON_VMA(address == -EFAULT, vma);
cond_resched();
if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg))
_
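A worked example of the wraparound described in the changelog above, with invented numbers (standalone C, not kernel code; PAGE_SHIFT of 12 is assumed, and the new branch is condensed - the in-vma bounds check and PageHead test from the patch are omitted for brevity):
#include <stdio.h>

#define PAGE_SHIFT 12UL

int main(void)
{
	/*
	 * A vma mapped at an unusually low address but with a larger file
	 * offset: it covers file pages 0x10..0x1f at 0x1000..0x10fff.
	 * All values are invented for illustration.
	 */
	unsigned long vm_start = 0x1000;
	unsigned long vm_pgoff = 0x10;

	/* A compound head page whose index lies below the vma's file offset. */
	unsigned long pgoff = 0x8;	/* page_to_pgoff() of the head */
	unsigned long nr = 512;		/* compound_nr(): a 2MB THP */
	unsigned long old, fixed;

	/* Old __vma_address(): the unsigned subtraction wraps toward
	 * ULONG_MAX, so max(start, vm_start) later picks the wrong address. */
	old = vm_start + ((pgoff - vm_pgoff) << PAGE_SHIFT);
	printf("old __vma_address() = %#lx (wrapped)\n", old);

	/* New vma_address(): part of the compound page does overlap the vma,
	 * so the walk starts at vm_start; otherwise it returns -EFAULT. */
	if (pgoff >= vm_pgoff)
		fixed = vm_start + ((pgoff - vm_pgoff) << PAGE_SHIFT);
	else if (pgoff + nr - 1 >= vm_pgoff)
		fixed = vm_start;
	else
		fixed = -1UL;		/* -EFAULT in the kernel */
	printf("new vma_address()   = %#lx\n", fixed);
	return 0;
}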
Patches currently in -mm which might be from hughd(a)google.com are
mm-thp-fix-__split_huge_pmd_locked-on-shmem-migration-entry.patch
mm-thp-make-is_huge_zero_pmd-safe-and-quicker.patch
mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting.patch
mm-thp-fix-vma_address-if-virtual-address-below-file-offset.patch
mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page.patch
mm-page_vma_mapped_walk-use-page-for-pvmw-page.patch
mm-page_vma_mapped_walk-settle-pagehuge-on-entry.patch
mm-page_vma_mapped_walk-use-pmd_read_atomic.patch
mm-page_vma_mapped_walk-use-pmde-for-pvmw-pmd.patch
mm-page_vma_mapped_walk-prettify-pvmw_migration-block.patch
mm-page_vma_mapped_walk-crossing-page-table-boundary.patch
mm-page_vma_mapped_walk-add-a-level-of-indentation.patch
mm-page_vma_mapped_walk-use-goto-instead-of-while-1.patch
mm-page_vma_mapped_walk-get-vma_address_end-earlier.patch
mm-thp-fix-page_vma_mapped_walk-if-thp-mapped-by-ptes.patch
mm-thp-another-pvmw_sync-fix-in-page_vma_mapped_walk.patch
mm-thp-remap_page-is-only-needed-on-anonymous-thp.patch
mm-hwpoison_user_mappings-try_to_unmap-with-ttu_sync.patch
The patch titled
Subject: mm/thp: try_to_unmap() use TTU_SYNC for safe splitting
has been added to the -mm tree. Its filename is
mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-thp-try_to_unmap-use-ttu_sync-…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-try_to_unmap-use-ttu_sync-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/thp: try_to_unmap() use TTU_SYNC for safe splitting
Stressing huge tmpfs often crashed on unmap_page()'s VM_BUG_ON_PAGE
(!unmap_success): with dump_page() showing mapcount:1, but then its raw
struct page output showing _mapcount ffffffff i.e. mapcount 0.
And even if that particular VM_BUG_ON_PAGE(!unmap_success) is removed, it
is immediately followed by a VM_BUG_ON_PAGE(compound_mapcount(head)), and
further down an IS_ENABLED(CONFIG_DEBUG_VM) total_mapcount BUG(): all
indicative of some mapcount difficulty in development here perhaps. But
the !CONFIG_DEBUG_VM path handles the failures correctly and silently.
I believe the problem is that once a racing unmap has cleared pte or pmd,
try_to_unmap_one() may skip taking the page table lock, and emerge from
try_to_unmap() before the racing task has reached decrementing mapcount.
Instead of abandoning the unsafe VM_BUG_ON_PAGE(), and the ones that
follow, use PVMW_SYNC in try_to_unmap_one() in this case: adding TTU_SYNC
to the options, and passing that from unmap_page().
When CONFIG_DEBUG_VM, or for non-debug too? Consensus is to do the same
for both: the slight overhead added should rarely matter, except perhaps
if splitting sparsely-populated multiply-mapped shmem. Once confident
that bugs are fixed, TTU_SYNC here can be removed, and the race tolerated.
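A minimal sketch (not taken from the patch; names are illustrative) of the take-and-drop lock idiom the PVMW_SYNC path relies on: acquiring a lock that the racing path holds across its update blocks us until that update is complete, even though we have nothing of our own to modify.
#include <linux/spinlock.h>
/* Sketch: lock/unlock used purely as a wait for a racing critical section */
static void wait_for_racer(spinlock_t *lock)
{
	spin_lock(lock);	/* blocks until the racer drops the lock */
	spin_unlock(lock);	/* nothing to change; the lock only ordered us */
}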
Link: https://lkml.kernel.org/r/c1e95853-8bcd-d8fd-55fa-e7f2488e78f@google.com
Fixes: fec89c109f3a ("thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Cc: <stable(a)vger.kernel.org>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Jue Wang <juew(a)google.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Wang Yugui <wangyugui(a)e16-tech.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/rmap.h | 1 +
mm/huge_memory.c | 2 +-
mm/page_vma_mapped.c | 11 +++++++++++
mm/rmap.c | 17 ++++++++++++++++-
4 files changed, 29 insertions(+), 2 deletions(-)
--- a/include/linux/rmap.h~mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting
+++ a/include/linux/rmap.h
@@ -91,6 +91,7 @@ enum ttu_flags {
TTU_SPLIT_HUGE_PMD = 0x4, /* split huge PMD if any */
TTU_IGNORE_MLOCK = 0x8, /* ignore mlock */
+ TTU_SYNC = 0x10, /* avoid racy checks with PVMW_SYNC */
TTU_IGNORE_HWPOISON = 0x20, /* corrupted page is recoverable */
TTU_BATCH_FLUSH = 0x40, /* Batch TLB flushes where possible
* and caller guarantees they will
--- a/mm/huge_memory.c~mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting
+++ a/mm/huge_memory.c
@@ -2350,7 +2350,7 @@ void vma_adjust_trans_huge(struct vm_are
static void unmap_page(struct page *page)
{
- enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK |
+ enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_SYNC |
TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
bool unmap_success;
--- a/mm/page_vma_mapped.c~mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting
+++ a/mm/page_vma_mapped.c
@@ -212,6 +212,17 @@ restart:
pvmw->ptl = NULL;
}
} else if (!pmd_present(pmde)) {
+ /*
+ * If PVMW_SYNC, take and drop THP pmd lock so that we
+ * cannot return prematurely, while zap_huge_pmd() has
+ * cleared *pmd but not decremented compound_mapcount().
+ */
+ if ((pvmw->flags & PVMW_SYNC) &&
+ PageTransCompound(pvmw->page)) {
+ spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
+
+ spin_unlock(ptl);
+ }
return false;
}
if (!map_pte(pvmw))
--- a/mm/rmap.c~mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting
+++ a/mm/rmap.c
@@ -1405,6 +1405,15 @@ static bool try_to_unmap_one(struct page
struct mmu_notifier_range range;
enum ttu_flags flags = (enum ttu_flags)(long)arg;
+ /*
+ * When racing against e.g. zap_pte_range() on another cpu,
+ * in between its ptep_get_and_clear_full() and page_remove_rmap(),
+ * try_to_unmap() may return false when it is about to become true,
+ * if page table locking is skipped: use TTU_SYNC to wait for that.
+ */
+ if (flags & TTU_SYNC)
+ pvmw.flags = PVMW_SYNC;
+
/* munlock has nothing to gain from examining un-locked vmas */
if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
return true;
@@ -1777,7 +1786,13 @@ bool try_to_unmap(struct page *page, enu
else
rmap_walk(page, &rwc);
- return !page_mapcount(page) ? true : false;
+ /*
+ * When racing against e.g. zap_pte_range() on another cpu,
+ * in between its ptep_get_and_clear_full() and page_remove_rmap(),
+ * try_to_unmap() may return false when it is about to become true,
+ * if page table locking is skipped: use TTU_SYNC to wait for that.
+ */
+ return !page_mapcount(page);
}
/**
_

The patch titled
Subject: mm/thp: make is_huge_zero_pmd() safe and quicker
has been added to the -mm tree. Its filename is
mm-thp-make-is_huge_zero_pmd-safe-and-quicker.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-thp-make-is_huge_zero_pmd-safe…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-make-is_huge_zero_pmd-safe…
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/thp: make is_huge_zero_pmd() safe and quicker
Most callers of is_huge_zero_pmd() supply a pmd already verified present;
but a few (notably zap_huge_pmd()) do not - it might be a pmd migration
entry, in which the pfn is encoded differently from a present pmd: which
might pass the is_huge_zero_pmd() test (though not on x86, since L1TF
forced us to protect against that); or perhaps even crash in pmd_page()
applied to a swap-like entry.
Make it safe by adding pmd_present() check into is_huge_zero_pmd() itself;
and make it quicker by saving huge_zero_pfn, so that is_huge_zero_pmd()
will not need to do that pmd_page() lookup each time.
__split_huge_pmd_locked() checked pmd_trans_huge() before: that worked,
but is unnecessary now that is_huge_zero_pmd() checks present.
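A minimal sketch (not from the patch; names are illustrative) of the publish/compare pattern huge_zero_pfn follows: the writer publishes or invalidates the cached pfn with WRITE_ONCE(), and readers sample it exactly once with READ_ONCE() before comparing.
#include <linux/cache.h>
#include <linux/compiler.h>
static unsigned long cached_pfn __read_mostly = ~0UL;	/* ~0UL means "no page" */
static void publish_pfn(unsigned long pfn)
{
	WRITE_ONCE(cached_pfn, pfn);		/* page allocated */
}
static void invalidate_pfn(void)
{
	WRITE_ONCE(cached_pfn, ~0UL);		/* page freed by the shrinker */
}
static bool pfn_is_cached(unsigned long pfn)
{
	return READ_ONCE(cached_pfn) == pfn;	/* single consistent sample */
}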
Link: https://lkml.kernel.org/r/21ea9ca-a1f5-8b90-5e88-95fb1c49bbfa@google.com
Fixes: e71769ae5260 ("mm: enable thp migration for shmem thp")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Jue Wang <juew(a)google.com>
Cc: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Wang Yugui <wangyugui(a)e16-tech.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/huge_mm.h | 8 +++++++-
mm/huge_memory.c | 5 ++++-
2 files changed, 11 insertions(+), 2 deletions(-)
--- a/include/linux/huge_mm.h~mm-thp-make-is_huge_zero_pmd-safe-and-quicker
+++ a/include/linux/huge_mm.h
@@ -286,6 +286,7 @@ struct page *follow_devmap_pud(struct vm
vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t orig_pmd);
extern struct page *huge_zero_page;
+extern unsigned long huge_zero_pfn;
static inline bool is_huge_zero_page(struct page *page)
{
@@ -294,7 +295,7 @@ static inline bool is_huge_zero_page(str
static inline bool is_huge_zero_pmd(pmd_t pmd)
{
- return is_huge_zero_page(pmd_page(pmd));
+ return READ_ONCE(huge_zero_pfn) == pmd_pfn(pmd) && pmd_present(pmd);
}
static inline bool is_huge_zero_pud(pud_t pud)
@@ -439,6 +440,11 @@ static inline bool is_huge_zero_page(str
{
return false;
}
+
+static inline bool is_huge_zero_pmd(pmd_t pmd)
+{
+ return false;
+}
static inline bool is_huge_zero_pud(pud_t pud)
{
--- a/mm/huge_memory.c~mm-thp-make-is_huge_zero_pmd-safe-and-quicker
+++ a/mm/huge_memory.c
@@ -62,6 +62,7 @@ static struct shrinker deferred_split_sh
static atomic_t huge_zero_refcount;
struct page *huge_zero_page __read_mostly;
+unsigned long huge_zero_pfn __read_mostly = ~0UL;
bool transparent_hugepage_enabled(struct vm_area_struct *vma)
{
@@ -98,6 +99,7 @@ retry:
__free_pages(zero_page, compound_order(zero_page));
goto retry;
}
+ WRITE_ONCE(huge_zero_pfn, page_to_pfn(zero_page));
/* We take additional reference here. It will be put back by shrinker */
atomic_set(&huge_zero_refcount, 2);
@@ -147,6 +149,7 @@ static unsigned long shrink_huge_zero_pa
if (atomic_cmpxchg(&huge_zero_refcount, 1, 0) == 1) {
struct page *zero_page = xchg(&huge_zero_page, NULL);
BUG_ON(zero_page == NULL);
+ WRITE_ONCE(huge_zero_pfn, ~0UL);
__free_pages(zero_page, compound_order(zero_page));
return HPAGE_PMD_NR;
}
@@ -2071,7 +2074,7 @@ static void __split_huge_pmd_locked(stru
return;
}
- if (pmd_trans_huge(*pmd) && is_huge_zero_pmd(*pmd)) {
+ if (is_huge_zero_pmd(*pmd)) {
/*
* FIXME: Do we want to invalidate secondary mmu by calling
* mmu_notifier_invalidate_range() see comments below inside
_
The patch titled
Subject: mm/thp: fix __split_huge_pmd_locked() on shmem migration entry
has been added to the -mm tree. Its filename is
mm-thp-fix-__split_huge_pmd_locked-on-shmem-migration-entry.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-thp-fix-__split_huge_pmd_locke…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-fix-__split_huge_pmd_locke…
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/thp: fix __split_huge_pmd_locked() on shmem migration entry
Patch series "mm/thp: fix THP splitting unmap BUGs and related", v10.
Here is v2 batch of long-standing THP bug fixes that I had not got around
to sending before, but prompted now by Wang Yugui's report
https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/
Wang Yugui has tested a rollup of these fixes applied to 5.10.39, and they
have done no harm, but have *not* fixed that issue: something more is
needed and I have no idea of what.
This patch (of 7):
Stressing huge tmpfs page migration racing hole punch often crashed on the
VM_BUG_ON(!pmd_present) in pmdp_huge_clear_flush(), with DEBUG_VM=y
kernel; or shortly afterwards, on a bad dereference in
__split_huge_pmd_locked() when DEBUG_VM=n. They forgot to allow for pmd
migration entries in the non-anonymous case.
Full disclosure: those particular experiments were on a kernel with more
relaxed mmap_lock and i_mmap_rwsem locking, and were not repeated on the
vanilla kernel: it is conceivable that stricter locking happens to avoid
those cases, or makes them less likely; but __split_huge_pmd_locked()
already allowed for pmd migration entries when handling anonymous THPs, so
this commit brings the shmem and file THP handling into line.
And while there: use old_pmd rather than _pmd, as in the following blocks;
and make it clearer to the eye that the !vma_is_anonymous() block is
self-contained, making an early return after accounting for unmapping.
Link: https://lkml.kernel.org/r/af88612-1473-2eaa-903-8d1a448b26@google.com
Link: https://lkml.kernel.org/r/dd221a99-efb3-cd1d-6256-7e646af29314@google.com
Fixes: e71769ae5260 ("mm: enable thp migration for shmem thp")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Wang Yugui <wangyugui(a)e16-tech.com>
Cc: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Jue Wang <juew(a)google.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 27 ++++++++++++++++++---------
mm/pgtable-generic.c | 5 ++---
2 files changed, 20 insertions(+), 12 deletions(-)
--- a/mm/huge_memory.c~mm-thp-fix-__split_huge_pmd_locked-on-shmem-migration-entry
+++ a/mm/huge_memory.c
@@ -2044,7 +2044,7 @@ static void __split_huge_pmd_locked(stru
count_vm_event(THP_SPLIT_PMD);
if (!vma_is_anonymous(vma)) {
- _pmd = pmdp_huge_clear_flush_notify(vma, haddr, pmd);
+ old_pmd = pmdp_huge_clear_flush_notify(vma, haddr, pmd);
/*
* We are going to unmap this huge page. So
* just go ahead and zap it
@@ -2053,16 +2053,25 @@ static void __split_huge_pmd_locked(stru
zap_deposited_table(mm, pmd);
if (vma_is_special_huge(vma))
return;
- page = pmd_page(_pmd);
- if (!PageDirty(page) && pmd_dirty(_pmd))
- set_page_dirty(page);
- if (!PageReferenced(page) && pmd_young(_pmd))
- SetPageReferenced(page);
- page_remove_rmap(page, true);
- put_page(page);
+ if (unlikely(is_pmd_migration_entry(old_pmd))) {
+ swp_entry_t entry;
+
+ entry = pmd_to_swp_entry(old_pmd);
+ page = migration_entry_to_page(entry);
+ } else {
+ page = pmd_page(old_pmd);
+ if (!PageDirty(page) && pmd_dirty(old_pmd))
+ set_page_dirty(page);
+ if (!PageReferenced(page) && pmd_young(old_pmd))
+ SetPageReferenced(page);
+ page_remove_rmap(page, true);
+ put_page(page);
+ }
add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR);
return;
- } else if (pmd_trans_huge(*pmd) && is_huge_zero_pmd(*pmd)) {
+ }
+
+ if (pmd_trans_huge(*pmd) && is_huge_zero_pmd(*pmd)) {
/*
* FIXME: Do we want to invalidate secondary mmu by calling
* mmu_notifier_invalidate_range() see comments below inside
--- a/mm/pgtable-generic.c~mm-thp-fix-__split_huge_pmd_locked-on-shmem-migration-entry
+++ a/mm/pgtable-generic.c
@@ -135,9 +135,8 @@ pmd_t pmdp_huge_clear_flush(struct vm_ar
{
pmd_t pmd;
VM_BUG_ON(address & ~HPAGE_PMD_MASK);
- VM_BUG_ON(!pmd_present(*pmdp));
- /* Below assumes pmd_present() is true */
- VM_BUG_ON(!pmd_trans_huge(*pmdp) && !pmd_devmap(*pmdp));
+ VM_BUG_ON(pmd_present(*pmdp) && !pmd_trans_huge(*pmdp) &&
+ !pmd_devmap(*pmdp));
pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
return pmd;
_
From: Ian Rogers <irogers(a)google.com>
[ Upstream commit fd7d55172d1e2e501e6da0a5c1de25f06612dc2e ]
Currently perf_rotate_context assumes that if the context's nr_events !=
nr_active a rotation is necessary for perf event multiplexing. With
cgroups, nr_events is the total count of events for all cgroups and
nr_active will not include events in a cgroup other than the current
task's. This makes rotation appear necessary for cgroups when it is not.
Add a perf_event_context flag that is set when rotation is necessary.
Clear the flag during sched_out and set it when a flexible sched_in
fails due to resources.
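A minimal sketch (illustrative names, not the actual kernel code) of the flag's lifecycle described above:
struct ctx_state {
	int rotate_necessary;
};
static void flexible_sched_in_failed(struct ctx_state *ctx)
{
	ctx->rotate_necessary = 1;	/* multiplexing really is required */
}
static void ctx_scheduled_out(struct ctx_state *ctx)
{
	ctx->rotate_necessary = 0;	/* nothing active, nothing to rotate */
}
static bool should_rotate(const struct ctx_state *ctx)
{
	return ctx && ctx->rotate_necessary;
}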
Signed-off-by: Ian Rogers <irogers(a)google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme(a)kernel.org>
Cc: Arnaldo Carvalho de Melo <acme(a)redhat.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Jiri Olsa <jolsa(a)redhat.com>
Cc: Kan Liang <kan.liang(a)linux.intel.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Namhyung Kim <namhyung(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Stephane Eranian <eranian(a)google.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Vince Weaver <vincent.weaver(a)maine.edu>
Link: https://lkml.kernel.org/r/20190601082722.44543-1-irogers@google.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: stable(a)vger.kernel.org # 4.19+
Signed-off-by: Wen Yang <wenyang(a)linux.alibaba.com>
---
include/linux/perf_event.h | 5 +++++
kernel/events/core.c | 42 ++++++++++++++++++++++--------------------
2 files changed, 27 insertions(+), 20 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index d8b4d31..efe30b9 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -747,6 +747,11 @@ struct perf_event_context {
int nr_stat;
int nr_freq;
int rotate_disable;
+ /*
+ * Set when nr_events != nr_active, except tolerant to events not
+ * necessary to be active due to scheduling constraints, such as cgroups.
+ */
+ int rotate_necessary;
atomic_t refcount;
struct task_struct *task;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index b8b74a4..56e3789 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2952,6 +2952,12 @@ static void ctx_sched_out(struct perf_event_context *ctx,
if (!ctx->nr_active || !(is_active & EVENT_ALL))
return;
+ /*
+ * If we had been multiplexing, no rotations are necessary, now no events
+ * are active.
+ */
+ ctx->rotate_necessary = 0;
+
perf_pmu_disable(ctx->pmu);
if (is_active & EVENT_PINNED) {
list_for_each_entry_safe(event, tmp, &ctx->pinned_active, active_list)
@@ -3319,10 +3325,13 @@ static int flexible_sched_in(struct perf_event *event, void *data)
return 0;
if (group_can_go_on(event, sid->cpuctx, sid->can_add_hw)) {
- if (!group_sched_in(event, sid->cpuctx, sid->ctx))
- list_add_tail(&event->active_list, &sid->ctx->flexible_active);
- else
+ int ret = group_sched_in(event, sid->cpuctx, sid->ctx);
+ if (ret) {
sid->can_add_hw = 0;
+ sid->ctx->rotate_necessary = 1;
+ return 0;
+ }
+ list_add_tail(&event->active_list, &sid->ctx->flexible_active);
}
return 0;
@@ -3690,24 +3699,17 @@ static void rotate_ctx(struct perf_event_context *ctx, struct perf_event *event)
static bool perf_rotate_context(struct perf_cpu_context *cpuctx)
{
struct perf_event *cpu_event = NULL, *task_event = NULL;
- bool cpu_rotate = false, task_rotate = false;
- struct perf_event_context *ctx = NULL;
+ struct perf_event_context *task_ctx = NULL;
+ int cpu_rotate, task_rotate;
/*
* Since we run this from IRQ context, nobody can install new
* events, thus the event count values are stable.
*/
- if (cpuctx->ctx.nr_events) {
- if (cpuctx->ctx.nr_events != cpuctx->ctx.nr_active)
- cpu_rotate = true;
- }
-
- ctx = cpuctx->task_ctx;
- if (ctx && ctx->nr_events) {
- if (ctx->nr_events != ctx->nr_active)
- task_rotate = true;
- }
+ cpu_rotate = cpuctx->ctx.rotate_necessary;
+ task_ctx = cpuctx->task_ctx;
+ task_rotate = task_ctx ? task_ctx->rotate_necessary : 0;
if (!(cpu_rotate || task_rotate))
return false;
@@ -3716,7 +3718,7 @@ static bool perf_rotate_context(struct perf_cpu_context *cpuctx)
perf_pmu_disable(cpuctx->ctx.pmu);
if (task_rotate)
- task_event = ctx_first_active(ctx);
+ task_event = ctx_first_active(task_ctx);
if (cpu_rotate)
cpu_event = ctx_first_active(&cpuctx->ctx);
@@ -3724,17 +3726,17 @@ static bool perf_rotate_context(struct perf_cpu_context *cpuctx)
* As per the order given at ctx_resched() first 'pop' task flexible
* and then, if needed CPU flexible.
*/
- if (task_event || (ctx && cpu_event))
- ctx_sched_out(ctx, cpuctx, EVENT_FLEXIBLE);
+ if (task_event || (task_ctx && cpu_event))
+ ctx_sched_out(task_ctx, cpuctx, EVENT_FLEXIBLE);
if (cpu_event)
cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
if (task_event)
- rotate_ctx(ctx, task_event);
+ rotate_ctx(task_ctx, task_event);
if (cpu_event)
rotate_ctx(&cpuctx->ctx, cpu_event);
- perf_event_sched_in(cpuctx, ctx, current);
+ perf_event_sched_in(cpuctx, task_ctx, current);
perf_pmu_enable(cpuctx->ctx.pmu);
perf_ctx_unlock(cpuctx, cpuctx->task_ctx);
--
1.8.3.1
This is a note to let you know that I've just added the patch titled
usb: typec: wcove: Use LE to CPU conversion when accessing
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From d5ab95da2a41567440097c277c5771ad13928dad Mon Sep 17 00:00:00 2001
From: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Date: Wed, 9 Jun 2021 20:22:02 +0300
Subject: usb: typec: wcove: Use LE to CPU conversion when accessing
msg->header
As LKP noticed, Sparse is not happy about strict type handling:
.../typec/tcpm/wcove.c:380:50: sparse: expected unsigned short [usertype] header
.../typec/tcpm/wcove.c:380:50: sparse: got restricted __le16 const [usertype] header
Fix this by switching to use pd_header_cnt_le() instead of pd_header_cnt()
in the affected code.
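For context, a minimal sketch of the rule sparse enforces here (the field layout below is illustrative, not the real PD header definition; pd_header_cnt_le() is assumed to do this conversion internally): wire-endian fields are declared __le16 and must go through le16_to_cpu() before any arithmetic.
#include <linux/types.h>
#include <asm/byteorder.h>
struct wire_msg {
	__le16 header;				/* little-endian as received */
};
static unsigned int wire_msg_cnt(const struct wire_msg *msg)
{
	u16 header = le16_to_cpu(msg->header);	/* convert to CPU byte order */
	return (header >> 12) & 0x7;		/* illustrative 3-bit count field */
}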
Fixes: ae8a2ca8a221 ("usb: typec: Group all TCPCI/TCPM code together")
Fixes: 3c4fb9f16921 ("usb: typec: wcove: start using tcpm for USB PD support")
Reported-by: kernel test robot <lkp(a)intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Reviewed-by: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Link: https://lore.kernel.org/r/20210609172202.83377-1-andriy.shevchenko@linux.in…
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/typec/tcpm/wcove.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/typec/tcpm/wcove.c b/drivers/usb/typec/tcpm/wcove.c
index 79ae63950050..5d125339687a 100644
--- a/drivers/usb/typec/tcpm/wcove.c
+++ b/drivers/usb/typec/tcpm/wcove.c
@@ -378,7 +378,7 @@ static int wcove_pd_transmit(struct tcpc_dev *tcpc,
const u8 *data = (void *)msg;
int i;
- for (i = 0; i < pd_header_cnt(msg->header) * 4 + 2; i++) {
+ for (i = 0; i < pd_header_cnt_le(msg->header) * 4 + 2; i++) {
ret = regmap_write(wcove->regmap, USBC_TX_DATA + i,
data[i]);
if (ret)
--
2.32.0
This is a note to let you know that I've just added the patch titled
usb: gadget: fsl: Re-enable driver for ARM SoCs
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From e0e8b6abe8c862229ba00cdd806e8598cdef00bb Mon Sep 17 00:00:00 2001
From: Joel Stanley <joel(a)jms.id.au>
Date: Thu, 10 Jun 2021 13:19:57 +0930
Subject: usb: gadget: fsl: Re-enable driver for ARM SoCs
The commit a390bef7db1f ("usb: gadget: fsl_mxc_udc: Remove the driver")
dropped the ARCH_MXC dependency from USB_FSL_USB2, leaving it depending
solely on FSL_SOC.
FSL_SOC is powerpc only; it was briefly available on ARM in 2014 but was
removed by commit cfd074ad8600 ("ARM: imx: temporarily remove
CONFIG_SOC_FSL from LS1021A"). Therefore the driver can no longer be
enabled on ARM platforms.
This appears to be a mistake as arm64's ARCH_LAYERSCAPE and arm32
SOC_LS1021A SoCs use this symbol. It's enabled in these defconfigs:
arch/arm/configs/imx_v6_v7_defconfig:CONFIG_USB_FSL_USB2=y
arch/arm/configs/multi_v7_defconfig:CONFIG_USB_FSL_USB2=y
arch/powerpc/configs/mgcoge_defconfig:CONFIG_USB_FSL_USB2=y
arch/powerpc/configs/mpc512x_defconfig:CONFIG_USB_FSL_USB2=y
To fix, expand the dependencies so USB_FSL_USB2 can be enabled on the
ARM platforms, and with COMPILE_TEST.
Fixes: a390bef7db1f ("usb: gadget: fsl_mxc_udc: Remove the driver")
Signed-off-by: Joel Stanley <joel(a)jms.id.au>
Link: https://lore.kernel.org/r/20210610034957.93376-1-joel@jms.id.au
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/gadget/udc/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/udc/Kconfig b/drivers/usb/gadget/udc/Kconfig
index 8c614bb86c66..7348acbdc560 100644
--- a/drivers/usb/gadget/udc/Kconfig
+++ b/drivers/usb/gadget/udc/Kconfig
@@ -90,7 +90,7 @@ config USB_BCM63XX_UDC
config USB_FSL_USB2
tristate "Freescale Highspeed USB DR Peripheral Controller"
- depends on FSL_SOC
+ depends on FSL_SOC || ARCH_LAYERSCAPE || SOC_LS1021A || COMPILE_TEST
help
Some of Freescale PowerPC and i.MX processors have a High Speed
Dual-Role(DR) USB controller, which supports device mode.
--
2.32.0
Drivers that do not use the ctrl-framework use this function instead.
Fix the following issues:
- Do not check for multiple classes when getting the DEF_VAL.
- Return -EINVAL for request API calls.
- The default value cannot be changed, so return -EINVAL as soon as possible.
- Return the right error_idx.
[If an error is found when validating the list of controls passed with
VIDIOC_G_EXT_CTRLS, then error_idx shall be set to ctrls->count to
indicate to userspace that no actual hardware was touched.
It would have been much nicer of course if error_idx could point to the
control index that failed the validation, but sadly that's not how the
API was designed.]
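To make that contract concrete, a minimal userspace sketch (the device node and control ID are placeholders, not part of this patch) of how an application interprets error_idx after a failed VIDIOC_G_EXT_CTRLS:
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/videodev2.h>
int main(void)
{
	struct v4l2_ext_control ctrl = { .id = V4L2_CID_BRIGHTNESS };
	struct v4l2_ext_controls ctrls = {
		.which = V4L2_CTRL_WHICH_CUR_VAL,
		.count = 1,
		.controls = &ctrl,
	};
	int fd = open("/dev/video0", O_RDWR);	/* placeholder node */
	if (fd < 0)
		return 1;
	if (ioctl(fd, VIDIOC_G_EXT_CTRLS, &ctrls) < 0) {
		if (ctrls.error_idx == ctrls.count)
			printf("validation failed, no hardware was touched\n");
		else
			printf("control %u failed in the driver\n", ctrls.error_idx);
	} else {
		printf("brightness: %d\n", ctrl.value);
	}
	close(fd);
	return 0;
}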
Fixes v4l2-compliance:
Control ioctls (Input 0):
warn: v4l2-test-controls.cpp(834): error_idx should be equal to count
warn: v4l2-test-controls.cpp(855): error_idx should be equal to count
fail: v4l2-test-controls.cpp(813): doioctl(node, VIDIOC_G_EXT_CTRLS, &ctrls)
test VIDIOC_G/S/TRY_EXT_CTRLS: FAIL
Buffer ioctls (Input 0):
fail: v4l2-test-buffers.cpp(1994): ret != EINVAL && ret != EBADR && ret != ENOTTY
test Requests: FAIL
Cc: stable(a)vger.kernel.org
Fixes: 6fa6f831f095 ("media: v4l2-ctrls: add core request support")
Suggested-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Reviewed-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
---
drivers/media/v4l2-core/v4l2-ioctl.c | 60 ++++++++++++++++++----------
1 file changed, 39 insertions(+), 21 deletions(-)
diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
index 31d1342e61e8..7b5ebdd329e8 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -908,7 +908,7 @@ static void v4l_print_default(const void *arg, bool write_only)
pr_cont("driver-specific ioctl\n");
}
-static int check_ext_ctrls(struct v4l2_ext_controls *c, int allow_priv)
+static bool check_ext_ctrls(struct v4l2_ext_controls *c, unsigned long ioctl)
{
__u32 i;
@@ -917,23 +917,41 @@ static int check_ext_ctrls(struct v4l2_ext_controls *c, int allow_priv)
for (i = 0; i < c->count; i++)
c->controls[i].reserved2[0] = 0;
- /* V4L2_CID_PRIVATE_BASE cannot be used as control class
- when using extended controls.
- Only when passed in through VIDIOC_G_CTRL and VIDIOC_S_CTRL
- is it allowed for backwards compatibility.
- */
- if (!allow_priv && c->which == V4L2_CID_PRIVATE_BASE)
- return 0;
- if (!c->which)
- return 1;
+ switch (c->which) {
+ case V4L2_CID_PRIVATE_BASE:
+ /*
+ * V4L2_CID_PRIVATE_BASE cannot be used as control class
+ * when using extended controls.
+ * Only when passed in through VIDIOC_G_CTRL and VIDIOC_S_CTRL
+ * is it allowed for backwards compatibility.
+ */
+ if (ioctl == VIDIOC_G_CTRL || ioctl == VIDIOC_S_CTRL)
+ return false;
+ break;
+ case V4L2_CTRL_WHICH_DEF_VAL:
+ /* Default value cannot be changed */
+ if (ioctl == VIDIOC_S_EXT_CTRLS ||
+ ioctl == VIDIOC_TRY_EXT_CTRLS) {
+ c->error_idx = c->count;
+ return false;
+ }
+ return true;
+ case V4L2_CTRL_WHICH_CUR_VAL:
+ return true;
+ case V4L2_CTRL_WHICH_REQUEST_VAL:
+ c->error_idx = c->count;
+ return false;
+ }
+
/* Check that all controls are from the same control class. */
for (i = 0; i < c->count; i++) {
if (V4L2_CTRL_ID2WHICH(c->controls[i].id) != c->which) {
- c->error_idx = i;
- return 0;
+ c->error_idx = ioctl == VIDIOC_TRY_EXT_CTRLS ? i :
+ c->count;
+ return false;
}
}
- return 1;
+ return true;
}
static int check_fmt(struct file *file, enum v4l2_buf_type type)
@@ -2229,7 +2247,7 @@ static int v4l_g_ctrl(const struct v4l2_ioctl_ops *ops,
ctrls.controls = &ctrl;
ctrl.id = p->id;
ctrl.value = p->value;
- if (check_ext_ctrls(&ctrls, 1)) {
+ if (check_ext_ctrls(&ctrls, VIDIOC_G_CTRL)) {
int ret = ops->vidioc_g_ext_ctrls(file, fh, &ctrls);
if (ret == 0)
@@ -2263,7 +2281,7 @@ static int v4l_s_ctrl(const struct v4l2_ioctl_ops *ops,
ctrls.controls = &ctrl;
ctrl.id = p->id;
ctrl.value = p->value;
- if (check_ext_ctrls(&ctrls, 1))
+ if (check_ext_ctrls(&ctrls, VIDIOC_S_CTRL))
return ops->vidioc_s_ext_ctrls(file, fh, &ctrls);
return -EINVAL;
}
@@ -2285,8 +2303,8 @@ static int v4l_g_ext_ctrls(const struct v4l2_ioctl_ops *ops,
vfd, vfd->v4l2_dev->mdev, p);
if (ops->vidioc_g_ext_ctrls == NULL)
return -ENOTTY;
- return check_ext_ctrls(p, 0) ? ops->vidioc_g_ext_ctrls(file, fh, p) :
- -EINVAL;
+ return check_ext_ctrls(p, VIDIOC_G_EXT_CTRLS) ?
+ ops->vidioc_g_ext_ctrls(file, fh, p) : -EINVAL;
}
static int v4l_s_ext_ctrls(const struct v4l2_ioctl_ops *ops,
@@ -2306,8 +2324,8 @@ static int v4l_s_ext_ctrls(const struct v4l2_ioctl_ops *ops,
vfd, vfd->v4l2_dev->mdev, p);
if (ops->vidioc_s_ext_ctrls == NULL)
return -ENOTTY;
- return check_ext_ctrls(p, 0) ? ops->vidioc_s_ext_ctrls(file, fh, p) :
- -EINVAL;
+ return check_ext_ctrls(p, VIDIOC_S_EXT_CTRLS) ?
+ ops->vidioc_s_ext_ctrls(file, fh, p) : -EINVAL;
}
static int v4l_try_ext_ctrls(const struct v4l2_ioctl_ops *ops,
@@ -2327,8 +2345,8 @@ static int v4l_try_ext_ctrls(const struct v4l2_ioctl_ops *ops,
vfd, vfd->v4l2_dev->mdev, p);
if (ops->vidioc_try_ext_ctrls == NULL)
return -ENOTTY;
- return check_ext_ctrls(p, 0) ? ops->vidioc_try_ext_ctrls(file, fh, p) :
- -EINVAL;
+ return check_ext_ctrls(p, VIDIOC_TRY_EXT_CTRLS) ?
+ ops->vidioc_try_ext_ctrls(file, fh, p) : -EINVAL;
}
/*
--
2.31.0.291.g576ba9dcdaf-goog
This is the start of the stable review cycle for the 4.19.194 release.
There are 57 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 11 Jun 2021 06:28:32 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.194-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.194-rc2
Jan Beulich <jbeulich(a)suse.com>
xen-pciback: redo VF placement in the virtual topology
Cheng Jian <cj.chengjian(a)huawei.com>
sched/fair: Optimize select_idle_cpu
Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
ACPI: EC: Look for ECDT EC after calling acpi_load_tables()
Erik Schmauss <erik.schmauss(a)intel.com>
ACPI: probe ECDT before loading AML tables regardless of module-level code flag
Marc Zyngier <maz(a)kernel.org>
KVM: arm64: Fix debug register indexing
Sean Christopherson <seanjc(a)google.com>
KVM: SVM: Truncate GPR value for DR and CR accesses in !64-bit mode
Anand Jain <anand.jain(a)oracle.com>
btrfs: fix unmountable seed device after fstrim
Song Liu <songliubraving(a)fb.com>
perf/core: Fix corner case in perf_rotate_context()
Ian Rogers <irogers(a)google.com>
perf/cgroups: Don't rotate events for cgroups unnecessarily
Michael Chan <michael.chan(a)broadcom.com>
bnxt_en: Remove the setting of dev_port.
Björn Töpel <bjorn.topel(a)gmail.com>
selftests/bpf: Avoid running unprivileged tests with alignment requirements
Björn Töpel <bjorn.topel(a)gmail.com>
selftests/bpf: add "any alignment" annotation for some tests
David S. Miller <davem(a)davemloft.net>
bpf: Apply F_NEEDS_EFFICIENT_UNALIGNED_ACCESS to more ACCEPT test cases.
David S. Miller <davem(a)davemloft.net>
bpf: Make more use of 'any' alignment in test_verifier.c
David S. Miller <davem(a)davemloft.net>
bpf: Adjust F_NEEDS_EFFICIENT_UNALIGNED_ACCESS handling in test_verifier.c
David S. Miller <davem(a)davemloft.net>
bpf: Add BPF_F_ANY_ALIGNMENT.
Joe Stringer <joe(a)wand.net.nz>
selftests/bpf: Generalize dummy program types
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: test make sure to run unpriv test cases in test_verifier
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: fix test suite to enable all unpriv program types
Mina Almasry <almasrymina(a)google.com>
mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY
Josef Bacik <josef(a)toxicpanda.com>
btrfs: fixup error handling in fixup_inode_link_counts
Josef Bacik <josef(a)toxicpanda.com>
btrfs: return errors from btrfs_del_csums in cleanup_ref_head
Josef Bacik <josef(a)toxicpanda.com>
btrfs: fix error handling in btrfs_del_csums
Josef Bacik <josef(a)toxicpanda.com>
btrfs: mark ordered extent and inode with error if we fail to finish
Thomas Gleixner <tglx(a)linutronix.de>
x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com>
nfc: fix NULL ptr dereference in llcp_sock_getname() after failed connect
Junxiao Bi <junxiao.bi(a)oracle.com>
ocfs2: fix data corruption by fallocate
Mark Rutland <mark.rutland(a)arm.com>
pid: take a reference when initializing `cad_pid`
Phil Elwell <phil(a)raspberrypi.com>
usb: dwc2: Fix build in periphal-only mode
Ye Bin <yebin10(a)huawei.com>
ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
Carlos M <carlos.marr.pz(a)gmail.com>
ALSA: hda: Fix for mute key LED for HP Pavilion 15-CK0xx
Takashi Iwai <tiwai(a)suse.de>
ALSA: timer: Fix master timer notification
Ahelenia Ziemiańska <nabijaczleweli(a)nabijaczleweli.xyz>
HID: multitouch: require Finger field to mark Win8 reports as MT
Pavel Skripkin <paskripkin(a)gmail.com>
net: caif: fix memory leak in cfusbl_device_notify
Pavel Skripkin <paskripkin(a)gmail.com>
net: caif: fix memory leak in caif_device_notify
Pavel Skripkin <paskripkin(a)gmail.com>
net: caif: add proper error handling
Pavel Skripkin <paskripkin(a)gmail.com>
net: caif: added cfserl_release function
Lin Ma <linma(a)zju.edu.cn>
Bluetooth: use correct lock to prevent UAF of hdev object
Lin Ma <linma(a)zju.edu.cn>
Bluetooth: fix the erroneous flush_work() order
Hoang Le <hoang.h.le(a)dektech.com.au>
tipc: fix unique bearer names sanity check
Hoang Le <hoang.h.le(a)dektech.com.au>
tipc: add extack messages for bearer/media failure
Magnus Karlsson <magnus.karlsson(a)intel.com>
ixgbevf: add correct exception tracing for XDP
Wei Yongjun <weiyongjun1(a)huawei.com>
ieee802154: fix error return code in ieee802154_llsec_getparams()
Zhen Lei <thunder.leizhen(a)huawei.com>
ieee802154: fix error return code in ieee802154_add_iface()
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nfnetlink_cthelper: hit EBUSY on updates if size mismatches
Arnd Bergmann <arnd(a)arndb.de>
HID: i2c-hid: fix format string mismatch
Zhen Lei <thunder.leizhen(a)huawei.com>
HID: pidff: fix error return code in hid_pidff_init()
Julian Anastasov <ja(a)ssi.bg>
ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service
Max Gurtovoy <mgurtovoy(a)nvidia.com>
vfio/platform: fix module_put call in error flow
Wei Yongjun <weiyongjun1(a)huawei.com>
samples: vfio-mdev: fix error handing in mdpy_fb_probe()
Randy Dunlap <rdunlap(a)infradead.org>
vfio/pci: zap_vma_ptes() needs MMU
Zhen Lei <thunder.leizhen(a)huawei.com>
vfio/pci: Fix error return code in vfio_ecap_init()
Rasmus Villemoes <linux(a)rasmusvillemoes.dk>
efi: cper: fix snprintf() use in cper_dimm_err_location()
Heiner Kallweit <hkallweit1(a)gmail.com>
efi: Allow EFI_MEMORY_XP and EFI_MEMORY_RO both to be cleared
Anant Thazhemadam <anant.thazhemadam(a)gmail.com>
nl80211: validate key indexes for cfg80211_registered_device
Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
ALSA: usb: update old-style static const declaration
Grant Grundler <grundler(a)chromium.org>
net: usb: cdc_ncm: don't spew notifications
-------------
Diffstat:
Makefile | 4 +-
arch/arm64/kvm/sys_regs.c | 42 ++--
arch/x86/include/asm/apic.h | 1 +
arch/x86/kernel/apic/apic.c | 1 +
arch/x86/kernel/apic/vector.c | 20 ++
arch/x86/kvm/svm.c | 8 +-
drivers/acpi/bus.c | 42 ++--
drivers/firmware/efi/cper.c | 4 +-
drivers/firmware/efi/memattr.c | 5 -
drivers/hid/hid-multitouch.c | 10 +-
drivers/hid/i2c-hid/i2c-hid-core.c | 4 +-
drivers/hid/usbhid/hid-pidff.c | 1 +
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 1 -
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 3 +
drivers/net/usb/cdc_ncm.c | 12 +-
drivers/usb/dwc2/core_intr.c | 4 +
drivers/vfio/pci/Kconfig | 1 +
drivers/vfio/pci/vfio_pci_config.c | 2 +-
drivers/vfio/platform/vfio_platform_common.c | 2 +-
drivers/xen/xen-pciback/vpci.c | 14 +-
fs/btrfs/extent-tree.c | 12 +-
fs/btrfs/file-item.c | 10 +-
fs/btrfs/inode.c | 12 ++
fs/btrfs/tree-log.c | 13 +-
fs/ext4/extents.c | 43 +++--
fs/ocfs2/file.c | 55 +++++-
include/linux/perf_event.h | 5 +
include/linux/usb/usbnet.h | 2 +
include/net/caif/caif_dev.h | 2 +-
include/net/caif/cfcnfg.h | 2 +-
include/net/caif/cfserl.h | 1 +
include/uapi/linux/bpf.h | 14 ++
init/main.c | 2 +-
kernel/bpf/syscall.c | 7 +-
kernel/bpf/verifier.c | 3 +
kernel/events/core.c | 62 +++---
kernel/sched/fair.c | 7 +-
mm/hugetlb.c | 14 +-
net/bluetooth/hci_core.c | 7 +-
net/bluetooth/hci_sock.c | 4 +-
net/caif/caif_dev.c | 13 +-
net/caif/caif_usb.c | 14 +-
net/caif/cfcnfg.c | 16 +-
net/caif/cfserl.c | 5 +
net/ieee802154/nl-mac.c | 4 +-
net/ieee802154/nl-phy.c | 4 +-
net/netfilter/ipvs/ip_vs_ctl.c | 2 +-
net/netfilter/nfnetlink_cthelper.c | 8 +-
net/nfc/llcp_sock.c | 2 +
net/tipc/bearer.c | 94 ++++++---
net/wireless/core.h | 2 +
net/wireless/nl80211.c | 7 +-
net/wireless/util.c | 39 +++-
samples/vfio-mdev/mdpy-fb.c | 13 +-
sound/core/timer.c | 3 +-
sound/pci/hda/patch_realtek.c | 1 +
sound/usb/mixer_quirks.c | 2 +-
tools/include/uapi/linux/bpf.h | 14 ++
tools/lib/bpf/bpf.c | 8 +-
tools/lib/bpf/bpf.h | 2 +-
tools/testing/selftests/bpf/test_align.c | 4 +-
tools/testing/selftests/bpf/test_verifier.c | 224 ++++++++++++++++------
62 files changed, 664 insertions(+), 275 deletions(-)
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
It was reported that a bug on arm64 caused a bad ip address to be used for
updating into a nop in ftrace_init(), but the error path (rightfully)
returned -EINVAL and not -EFAULT, as the bug caused more than one error to
occur. But because -EINVAL was returned, the ftrace_bug() tried to report
what was at the location of the ip address, and read it directly. This
caused the machine to panic, as the ip was not pointing to a valid memory
address.
Instead, read the ip address with copy_from_kernel_nofault() to safely
access the memory, and if it faults, report that the address faulted,
otherwise report what was in that location.
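The same idiom in a generic form, as a minimal sketch (helper name and buffer size are illustrative): probe the bytes into a local buffer first, and only print them if the probe did not fault.
#include <linux/kernel.h>
#include <linux/printk.h>
#include <linux/uaccess.h>	/* copy_from_kernel_nofault() */
static void dump_bytes_safely(const unsigned char *p, size_t len)
{
	unsigned char buf[16];
	size_t i;
	if (len > sizeof(buf))
		len = sizeof(buf);
	if (copy_from_kernel_nofault(buf, p, len)) {
		pr_info("[FAULT] %px\n", p);
		return;
	}
	pr_info("ins:");
	for (i = 0; i < len; i++)
		pr_cont("%s%02x", i ? ":" : " ", buf[i]);
	pr_cont("\n");
}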
Link: https://lore.kernel.org/lkml/20210607032329.28671-1-mark-pk.tsai@mediatek.c…
Cc: stable(a)vger.kernel.org
Fixes: 05736a427f7e1 ("ftrace: warn on failure to disable mcount callers")
Reported-by: Mark-PK Tsai <mark-pk.tsai(a)mediatek.com>
Tested-by: Mark-PK Tsai <mark-pk.tsai(a)mediatek.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
kernel/trace/ftrace.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 2e8a3fde7104..72ef4dccbcc4 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1967,12 +1967,18 @@ static int ftrace_hash_ipmodify_update(struct ftrace_ops *ops,
static void print_ip_ins(const char *fmt, const unsigned char *p)
{
+ char ins[MCOUNT_INSN_SIZE];
int i;
+ if (copy_from_kernel_nofault(ins, p, MCOUNT_INSN_SIZE)) {
+ printk(KERN_CONT "%s[FAULT] %px\n", fmt, p);
+ return;
+ }
+
printk(KERN_CONT "%s", fmt);
for (i = 0; i < MCOUNT_INSN_SIZE; i++)
- printk(KERN_CONT "%s%02x", i ? ":" : "", p[i]);
+ printk(KERN_CONT "%s%02x", i ? ":" : "", ins[i]);
}
enum ftrace_bug_type ftrace_bug_type;
--
2.30.2
From: Masami Hiramatsu <mhiramat(a)kernel.org>
Since the "fallthrough" is defined only in the kernel, building
lib/bootconfig.c as a part of user-space tools causes a build
error.
Add a dummy fallthrough to avoid the build error.
Link: https://lkml.kernel.org/r/162087519356.442660.11385099982318160180.stgit@de…
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: stable(a)vger.kernel.org
Fixes: 4c1ca831adb1 ("Revert "lib: Revert use of fallthrough pseudo-keyword in lib/"")
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
tools/bootconfig/include/linux/bootconfig.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/bootconfig/include/linux/bootconfig.h b/tools/bootconfig/include/linux/bootconfig.h
index 078cbd2ba651..de7f30f99af3 100644
--- a/tools/bootconfig/include/linux/bootconfig.h
+++ b/tools/bootconfig/include/linux/bootconfig.h
@@ -4,4 +4,8 @@
#include "../../../../include/linux/bootconfig.h"
+#ifndef fallthrough
+# define fallthrough
+#endif
+
#endif
--
2.30.2
Do not tear down the system when getting invalid status from a TPM chip.
This can happen when panic-on-warn is used.
Instead, introduce a TPM_TIS_INVALID_STATUS bit flag and use it to report the
error only once per chip. In addition, print out the value of TPM_STS for
improved forensics.
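A minimal sketch (illustrative names) of the report-once idiom this uses: an atomic test_and_set_bit() on a per-device flag word guarantees the error path fires only the first time the bad status is seen.
#include <linux/bitops.h>
#include <linux/device.h>
#include <linux/printk.h>
#define EXAMPLE_INVALID_STATUS_BIT	1	/* illustrative bit number */
static void report_invalid_status_once(struct device *dev,
				       unsigned long *flags, u8 status)
{
	if (test_and_set_bit(EXAMPLE_INVALID_STATUS_BIT, flags))
		return;				/* already reported once */
	dev_err(dev, "invalid status 0x%02x, dumping stack for forensics\n",
		status);
	dump_stack();
}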
Link: https://lore.kernel.org/keyrings/YKzlTR1AzUigShtZ@kroah.com/
Fixes: 55707d531af6 ("tpm_tis: Add a check for invalid status")
Cc: stable(a)vger.kernel.org
Cc: Hans de Goede <hdegoede(a)redhat.com>
Cc: Greg KH <greg(a)kroah.com>
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
v3:
* torn -> tear
* A per chip flag TPM_TIS_INVALID_STATUS.
v2:
Dump also stack only once.
drivers/char/tpm/tpm_tis_core.c | 25 ++++++++++++++++++-------
drivers/char/tpm/tpm_tis_core.h | 3 ++-
2 files changed, 20 insertions(+), 8 deletions(-)
diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 55b9d3965ae1..69579efb247b 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -196,13 +196,24 @@ static u8 tpm_tis_status(struct tpm_chip *chip)
return 0;
if (unlikely((status & TPM_STS_READ_ZERO) != 0)) {
- /*
- * If this trips, the chances are the read is
- * returning 0xff because the locality hasn't been
- * acquired. Usually because tpm_try_get_ops() hasn't
- * been called before doing a TPM operation.
- */
- WARN_ONCE(1, "TPM returned invalid status\n");
+ if (!test_and_set_bit(TPM_TIS_INVALID_STATUS, &priv->flags)) {
+ /*
+ * If this trips, the chances are the read is
+ * returning 0xff because the locality hasn't been
+ * acquired. Usually because tpm_try_get_ops() hasn't
+ * been called before doing a TPM operation.
+ */
+ dev_err(&chip->dev, "invalid TPM_STS.x 0x%02x, dumping stack for forensics\n",
+ status);
+
+ /*
+ * Dump stack for forensics, as invalid TPM_STS.x could be
+ * potentially triggered by impaired tpm_try_get_ops() or
+ * tpm_find_get_ops().
+ */
+ dump_stack();
+ }
+
return 0;
}
diff --git a/drivers/char/tpm/tpm_tis_core.h b/drivers/char/tpm/tpm_tis_core.h
index 9b2d32a59f67..b2a3c6c72882 100644
--- a/drivers/char/tpm/tpm_tis_core.h
+++ b/drivers/char/tpm/tpm_tis_core.h
@@ -83,6 +83,7 @@ enum tis_defaults {
enum tpm_tis_flags {
TPM_TIS_ITPM_WORKAROUND = BIT(0),
+ TPM_TIS_INVALID_STATUS = BIT(1),
};
struct tpm_tis_data {
@@ -90,7 +91,7 @@ struct tpm_tis_data {
int locality;
int irq;
bool irq_tested;
- unsigned int flags;
+ unsigned long flags;
void __iomem *ilb_base_addr;
u16 clkrun_enabled;
wait_queue_head_t int_queue;
--
2.31.1
This is the start of the stable review cycle for the 4.14.236 release.
There are 47 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu, 10 Jun 2021 17:59:18 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.236-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.236-rc1
Jan Beulich <jbeulich(a)suse.com>
xen-pciback: redo VF placement in the virtual topology
Cheng Jian <cj.chengjian(a)huawei.com>
sched/fair: Optimize select_idle_cpu
Sean Christopherson <seanjc(a)google.com>
KVM: SVM: Truncate GPR value for DR and CR accesses in !64-bit mode
Michael Chan <michael.chan(a)broadcom.com>
bnxt_en: Remove the setting of dev_port.
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: No need to simulate speculative domain for immediates
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Fix mask direction swap upon off reg sign change
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Wrap aux data inside bpf_sanitize_info container
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Fix leakage of uninitialized bpf stack under speculation
Alexei Starovoitov <ast(a)kernel.org>
selftests/bpf: make 'dubious pointer arithmetic' test useful
Alexei Starovoitov <ast(a)fb.com>
selftests/bpf: fix test_align
Alexei Starovoitov <ast(a)kernel.org>
bpf/verifier: disallow pointer subtraction
Alexei Starovoitov <ast(a)kernel.org>
bpf: do not allow root to mangle valid pointers
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Update selftests to reflect new error states
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Tighten speculative pointer arithmetic mask
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Move sanitize_val_alu out of op switch
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Refactor and streamline bounds check into helper
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Improve verifier error messages for users
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Rework ptr_limit into alu_limit and add common error path
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Ensure off_reg has no mixed signed bounds for all types
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Move off_reg into sanitize_ptr_alu
Piotr Krysiuk <piotras(a)gmail.com>
bpf, selftests: Fix up some test_verifier cases for unprivileged
Mina Almasry <almasrymina(a)google.com>
mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY
Josef Bacik <josef(a)toxicpanda.com>
btrfs: fixup error handling in fixup_inode_link_counts
Josef Bacik <josef(a)toxicpanda.com>
btrfs: fix error handling in btrfs_del_csums
Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com>
nfc: fix NULL ptr dereference in llcp_sock_getname() after failed connect
Junxiao Bi <junxiao.bi(a)oracle.com>
ocfs2: fix data corruption by fallocate
Mark Rutland <mark.rutland(a)arm.com>
pid: take a reference when initializing `cad_pid`
Ye Bin <yebin10(a)huawei.com>
ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
Takashi Iwai <tiwai(a)suse.de>
ALSA: timer: Fix master timer notification
Pavel Skripkin <paskripkin(a)gmail.com>
net: caif: fix memory leak in cfusbl_device_notify
Pavel Skripkin <paskripkin(a)gmail.com>
net: caif: fix memory leak in caif_device_notify
Pavel Skripkin <paskripkin(a)gmail.com>
net: caif: add proper error handling
Pavel Skripkin <paskripkin(a)gmail.com>
net: caif: added cfserl_release function
Lin Ma <linma(a)zju.edu.cn>
Bluetooth: use correct lock to prevent UAF of hdev object
Lin Ma <linma(a)zju.edu.cn>
Bluetooth: fix the erroneous flush_work() order
Wei Yongjun <weiyongjun1(a)huawei.com>
ieee802154: fix error return code in ieee802154_llsec_getparams()
Zhen Lei <thunder.leizhen(a)huawei.com>
ieee802154: fix error return code in ieee802154_add_iface()
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nfnetlink_cthelper: hit EBUSY on updates if size mismatches
Arnd Bergmann <arnd(a)arndb.de>
HID: i2c-hid: fix format string mismatch
Zhen Lei <thunder.leizhen(a)huawei.com>
HID: pidff: fix error return code in hid_pidff_init()
Julian Anastasov <ja(a)ssi.bg>
ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service
Max Gurtovoy <mgurtovoy(a)nvidia.com>
vfio/platform: fix module_put call in error flow
Randy Dunlap <rdunlap(a)infradead.org>
vfio/pci: zap_vma_ptes() needs MMU
Zhen Lei <thunder.leizhen(a)huawei.com>
vfio/pci: Fix error return code in vfio_ecap_init()
Rasmus Villemoes <linux(a)rasmusvillemoes.dk>
efi: cper: fix snprintf() use in cper_dimm_err_location()
Heiner Kallweit <hkallweit1(a)gmail.com>
efi: Allow EFI_MEMORY_XP and EFI_MEMORY_RO both to be cleared
Grant Grundler <grundler(a)chromium.org>
net: usb: cdc_ncm: don't spew notifications
-------------
Diffstat:
Makefile | 4 +-
arch/x86/kvm/svm.c | 8 +-
drivers/firmware/efi/cper.c | 4 +-
drivers/firmware/efi/memattr.c | 5 -
drivers/hid/i2c-hid/i2c-hid-core.c | 4 +-
drivers/hid/usbhid/hid-pidff.c | 1 +
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 1 -
drivers/net/usb/cdc_ncm.c | 12 +-
drivers/vfio/pci/Kconfig | 1 +
drivers/vfio/pci/vfio_pci_config.c | 2 +-
drivers/vfio/platform/vfio_platform_common.c | 2 +-
drivers/xen/xen-pciback/vpci.c | 14 +-
fs/btrfs/file-item.c | 10 +-
fs/btrfs/tree-log.c | 13 +-
fs/ext4/extents.c | 43 ++--
fs/ocfs2/file.c | 55 +++-
include/linux/bpf_verifier.h | 5 +-
include/linux/usb/usbnet.h | 2 +
include/net/caif/caif_dev.h | 2 +-
include/net/caif/cfcnfg.h | 2 +-
include/net/caif/cfserl.h | 1 +
init/main.c | 2 +-
kernel/bpf/verifier.c | 369 ++++++++++++++++-----------
kernel/sched/fair.c | 7 +-
mm/hugetlb.c | 14 +-
net/bluetooth/hci_core.c | 7 +-
net/bluetooth/hci_sock.c | 4 +-
net/caif/caif_dev.c | 13 +-
net/caif/caif_usb.c | 14 +-
net/caif/cfcnfg.c | 16 +-
net/caif/cfserl.c | 5 +
net/ieee802154/nl-mac.c | 4 +-
net/ieee802154/nl-phy.c | 4 +-
net/netfilter/ipvs/ip_vs_ctl.c | 2 +-
net/netfilter/nfnetlink_cthelper.c | 8 +-
net/nfc/llcp_sock.c | 2 +
sound/core/timer.c | 3 +-
tools/testing/selftests/bpf/test_align.c | 26 +-
tools/testing/selftests/bpf/test_verifier.c | 114 +++++----
39 files changed, 501 insertions(+), 304 deletions(-)
This is the start of the stable review cycle for the 4.9.272 release.
There are 29 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu, 10 Jun 2021 17:59:18 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.272-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.272-rc1
Jan Beulich <jbeulich(a)suse.com>
xen-pciback: redo VF placement in the virtual topology
Michael Weiser <michael.weiser(a)gmx.de>
arm64: Remove unimplemented syscall log message
Sean Christopherson <seanjc(a)google.com>
KVM: SVM: Truncate GPR value for DR and CR accesses in !64-bit mode
Michael Chan <michael.chan(a)broadcom.com>
bnxt_en: Remove the setting of dev_port.
Josef Bacik <josef(a)toxicpanda.com>
btrfs: fixup error handling in fixup_inode_link_counts
Josef Bacik <josef(a)toxicpanda.com>
btrfs: fix error handling in btrfs_del_csums
Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com>
nfc: fix NULL ptr dereference in llcp_sock_getname() after failed connect
Junxiao Bi <junxiao.bi(a)oracle.com>
ocfs2: fix data corruption by fallocate
Mark Rutland <mark.rutland(a)arm.com>
pid: take a reference when initializing `cad_pid`
Ye Bin <yebin10(a)huawei.com>
ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
Takashi Iwai <tiwai(a)suse.de>
ALSA: timer: Fix master timer notification
Pavel Skripkin <paskripkin(a)gmail.com>
net: caif: fix memory leak in cfusbl_device_notify
Pavel Skripkin <paskripkin(a)gmail.com>
net: caif: fix memory leak in caif_device_notify
Pavel Skripkin <paskripkin(a)gmail.com>
net: caif: add proper error handling
Pavel Skripkin <paskripkin(a)gmail.com>
net: caif: added cfserl_release function
Lin Ma <linma(a)zju.edu.cn>
Bluetooth: use correct lock to prevent UAF of hdev object
Lin Ma <linma(a)zju.edu.cn>
Bluetooth: fix the erroneous flush_work() order
Wei Yongjun <weiyongjun1(a)huawei.com>
ieee802154: fix error return code in ieee802154_llsec_getparams()
Zhen Lei <thunder.leizhen(a)huawei.com>
ieee802154: fix error return code in ieee802154_add_iface()
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nfnetlink_cthelper: hit EBUSY on updates if size mismatches
Arnd Bergmann <arnd(a)arndb.de>
HID: i2c-hid: fix format string mismatch
Zhen Lei <thunder.leizhen(a)huawei.com>
HID: pidff: fix error return code in hid_pidff_init()
Julian Anastasov <ja(a)ssi.bg>
ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service
Max Gurtovoy <mgurtovoy(a)nvidia.com>
vfio/platform: fix module_put call in error flow
Randy Dunlap <rdunlap(a)infradead.org>
vfio/pci: zap_vma_ptes() needs MMU
Zhen Lei <thunder.leizhen(a)huawei.com>
vfio/pci: Fix error return code in vfio_ecap_init()
Rasmus Villemoes <linux(a)rasmusvillemoes.dk>
efi: cper: fix snprintf() use in cper_dimm_err_location()
Heiner Kallweit <hkallweit1(a)gmail.com>
efi: Allow EFI_MEMORY_XP and EFI_MEMORY_RO both to be cleared
Grant Grundler <grundler(a)chromium.org>
net: usb: cdc_ncm: don't spew notifications
-------------
Diffstat:
Makefile | 4 +-
arch/arm64/kernel/traps.c | 8 ----
arch/x86/kvm/svm.c | 8 ++--
drivers/firmware/efi/cper.c | 4 +-
drivers/firmware/efi/memattr.c | 5 ---
drivers/hid/i2c-hid/i2c-hid-core.c | 4 +-
drivers/hid/usbhid/hid-pidff.c | 1 +
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 1 -
drivers/net/usb/cdc_ncm.c | 12 +++++-
drivers/vfio/pci/Kconfig | 1 +
drivers/vfio/pci/vfio_pci_config.c | 2 +-
drivers/vfio/platform/vfio_platform_common.c | 2 +-
drivers/xen/xen-pciback/vpci.c | 14 ++++---
fs/btrfs/file-item.c | 10 ++---
fs/btrfs/tree-log.c | 13 ++++---
fs/ext4/extents.c | 43 ++++++++++++----------
fs/ocfs2/file.c | 55 +++++++++++++++++++++++++---
include/linux/usb/usbnet.h | 2 +
include/net/caif/caif_dev.h | 2 +-
include/net/caif/cfcnfg.h | 2 +-
include/net/caif/cfserl.h | 1 +
init/main.c | 2 +-
net/bluetooth/hci_core.c | 7 +++-
net/bluetooth/hci_sock.c | 4 +-
net/caif/caif_dev.c | 13 +++++--
net/caif/caif_usb.c | 14 ++++++-
net/caif/cfcnfg.c | 16 +++++---
net/caif/cfserl.c | 5 +++
net/ieee802154/nl-mac.c | 4 +-
net/ieee802154/nl-phy.c | 4 +-
net/netfilter/ipvs/ip_vs_ctl.c | 2 +-
net/netfilter/nfnetlink_cthelper.c | 8 +++-
net/nfc/llcp_sock.c | 2 +
sound/core/timer.c | 3 +-
34 files changed, 186 insertions(+), 92 deletions(-)
This is the start of the stable review cycle for the 4.4.272 release.
There are 23 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu, 10 Jun 2021 17:59:18 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.272-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.272-rc1
Jan Beulich <jbeulich(a)suse.com>
xen-pciback: redo VF placement in the virtual topology
Michael Weiser <michael.weiser(a)gmx.de>
arm64: Remove unimplemented syscall log message
Sean Christopherson <seanjc(a)google.com>
KVM: SVM: Truncate GPR value for DR and CR accesses in !64-bit mode
Josef Bacik <josef(a)toxicpanda.com>
btrfs: fixup error handling in fixup_inode_link_counts
Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com>
nfc: fix NULL ptr dereference in llcp_sock_getname() after failed connect
Junxiao Bi <junxiao.bi(a)oracle.com>
ocfs2: fix data corruption by fallocate
Mark Rutland <mark.rutland(a)arm.com>
pid: take a reference when initializing `cad_pid`
Ye Bin <yebin10(a)huawei.com>
ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
Takashi Iwai <tiwai(a)suse.de>
ALSA: timer: Fix master timer notification
Pavel Skripkin <paskripkin(a)gmail.com>
net: caif: fix memory leak in cfusbl_device_notify
Pavel Skripkin <paskripkin(a)gmail.com>
net: caif: fix memory leak in caif_device_notify
Pavel Skripkin <paskripkin(a)gmail.com>
net: caif: add proper error handling
Pavel Skripkin <paskripkin(a)gmail.com>
net: caif: added cfserl_release function
Lin Ma <linma(a)zju.edu.cn>
Bluetooth: use correct lock to prevent UAF of hdev object
Lin Ma <linma(a)zju.edu.cn>
Bluetooth: fix the erroneous flush_work() order
Wei Yongjun <weiyongjun1(a)huawei.com>
ieee802154: fix error return code in ieee802154_llsec_getparams()
Zhen Lei <thunder.leizhen(a)huawei.com>
ieee802154: fix error return code in ieee802154_add_iface()
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nfnetlink_cthelper: hit EBUSY on updates if size mismatches
Zhen Lei <thunder.leizhen(a)huawei.com>
HID: pidff: fix error return code in hid_pidff_init()
Julian Anastasov <ja(a)ssi.bg>
ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service
Max Gurtovoy <mgurtovoy(a)nvidia.com>
vfio/platform: fix module_put call in error flow
Zhen Lei <thunder.leizhen(a)huawei.com>
vfio/pci: Fix error return code in vfio_ecap_init()
Rasmus Villemoes <linux(a)rasmusvillemoes.dk>
efi: cper: fix snprintf() use in cper_dimm_err_location()
-------------
Diffstat:
Makefile | 4 +-
arch/arm64/kernel/traps.c | 8 ----
arch/x86/kvm/svm.c | 8 ++--
drivers/firmware/efi/cper.c | 4 +-
drivers/hid/usbhid/hid-pidff.c | 1 +
drivers/vfio/pci/vfio_pci_config.c | 2 +-
drivers/vfio/platform/vfio_platform_common.c | 2 +-
drivers/xen/xen-pciback/vpci.c | 14 ++++---
fs/btrfs/tree-log.c | 13 ++++---
fs/ext4/extents.c | 43 ++++++++++++----------
fs/ocfs2/file.c | 55 +++++++++++++++++++++++++---
include/net/caif/caif_dev.h | 2 +-
include/net/caif/cfcnfg.h | 2 +-
include/net/caif/cfserl.h | 1 +
init/main.c | 2 +-
net/bluetooth/hci_core.c | 7 +++-
net/bluetooth/hci_sock.c | 4 +-
net/caif/caif_dev.c | 13 +++++--
net/caif/caif_usb.c | 14 ++++++-
net/caif/cfcnfg.c | 16 +++++---
net/caif/cfserl.c | 5 +++
net/ieee802154/nl-mac.c | 4 +-
net/ieee802154/nl-phy.c | 4 +-
net/netfilter/ipvs/ip_vs_ctl.c | 2 +-
net/netfilter/nfnetlink_cthelper.c | 8 +++-
net/nfc/llcp_sock.c | 2 +
sound/core/timer.c | 3 +-
27 files changed, 165 insertions(+), 78 deletions(-)
This is a note to let you know that I've just added the patch titled
bus: mhi: pci-generic: Fix hibernation
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 5f0c2ee1fe8de700dd0d1cdc63e1a7338e2d3a3d Mon Sep 17 00:00:00 2001
From: Loic Poulain <loic.poulain(a)linaro.org>
Date: Sun, 6 Jun 2021 21:07:41 +0530
Subject: bus: mhi: pci-generic: Fix hibernation
This patch fixes a crash after resuming from hibernation. The issue
occurs when the mhi stack is built in and is therefore part of the
'restore kernel', causing the device to be resumed from the 'restored
kernel' with a context (memory mappings etc.) that is no longer valid,
leading to spurious crashes.
This patch fixes the issue by implementing proper freeze/restore
callbacks.
Link: https://lore.kernel.org/r/1622571445-4505-1-git-send-email-loic.poulain@lin…
Reported-by: Shujun Wang <wsj20369(a)163.com>
Cc: stable <stable(a)vger.kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Signed-off-by: Loic Poulain <loic.poulain(a)linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Link: https://lore.kernel.org/r/20210606153741.20725-4-manivannan.sadhasivam@lina…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/bus/mhi/pci_generic.c | 36 ++++++++++++++++++++++++++++++++++-
1 file changed, 35 insertions(+), 1 deletion(-)
diff --git a/drivers/bus/mhi/pci_generic.c b/drivers/bus/mhi/pci_generic.c
index 0a6619ad292c..b3357a8a2fdb 100644
--- a/drivers/bus/mhi/pci_generic.c
+++ b/drivers/bus/mhi/pci_generic.c
@@ -935,9 +935,43 @@ static int __maybe_unused mhi_pci_resume(struct device *dev)
return ret;
}
+static int __maybe_unused mhi_pci_freeze(struct device *dev)
+{
+ struct mhi_pci_device *mhi_pdev = dev_get_drvdata(dev);
+ struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
+
+ /* We want to stop all operations, hibernation does not guarantee that
+ * device will be in the same state as before freezing, especially if
+ * the intermediate restore kernel reinitializes MHI device with new
+ * context.
+ */
+ if (test_and_clear_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status)) {
+ mhi_power_down(mhi_cntrl, false);
+ mhi_unprepare_after_power_down(mhi_cntrl);
+ }
+
+ return 0;
+}
+
+static int __maybe_unused mhi_pci_restore(struct device *dev)
+{
+ struct mhi_pci_device *mhi_pdev = dev_get_drvdata(dev);
+
+ /* Reinitialize the device */
+ queue_work(system_long_wq, &mhi_pdev->recovery_work);
+
+ return 0;
+}
+
static const struct dev_pm_ops mhi_pci_pm_ops = {
SET_RUNTIME_PM_OPS(mhi_pci_runtime_suspend, mhi_pci_runtime_resume, NULL)
- SET_SYSTEM_SLEEP_PM_OPS(mhi_pci_suspend, mhi_pci_resume)
+#ifdef CONFIG_PM_SLEEP
+ .suspend = mhi_pci_suspend,
+ .resume = mhi_pci_resume,
+ .freeze = mhi_pci_freeze,
+ .thaw = mhi_pci_restore,
+ .restore = mhi_pci_restore,
+#endif
};
static struct pci_driver mhi_pci_driver = {
--
2.32.0
This is a note to let you know that I've just added the patch titled
bus: mhi: pci_generic: Fix possible use-after-free in
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 0b67808ade8893a1b3608ddd74fac7854786c919 Mon Sep 17 00:00:00 2001
From: Wei Yongjun <weiyongjun1(a)huawei.com>
Date: Sun, 6 Jun 2021 21:07:40 +0530
Subject: bus: mhi: pci_generic: Fix possible use-after-free in
mhi_pci_remove()
This driver's remove path calls del_timer(). However, that function
does not wait until the timer handler finishes. This means that the
timer handler may still be running after the driver's remove function
has finished, which would result in a use-after-free.
Fix by calling del_timer_sync(), which makes sure the timer handler
has finished and is unable to re-schedule itself.
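As a rough sketch of why del_timer() is insufficient here (hypothetical names,
not taken from this driver): a handler that re-arms itself may still be running,
and may re-queue the timer, after del_timer() returns on the remove path.
static void health_check_cb(struct timer_list *t)
{
	struct example_dev *edev = from_timer(edev, t, health_check_timer);

	schedule_work(&edev->recovery_work);
	/* re-arms itself; only del_timer_sync() on teardown closes the race */
	mod_timer(&edev->health_check_timer, jiffies + HZ);
}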
Link: https://lore.kernel.org/r/20210413160318.2003699-1-weiyongjun1@huawei.com
Fixes: 8562d4fe34a3 ("mhi: pci_generic: Add health-check")
Cc: stable <stable(a)vger.kernel.org>
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Reviewed-by: Hemant kumar <hemantk(a)codeaurora.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Reviewed-by: Loic Poulain <loic.poulain(a)linaro.org>
Signed-off-by: Wei Yongjun <weiyongjun1(a)huawei.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Link: https://lore.kernel.org/r/20210606153741.20725-3-manivannan.sadhasivam@lina…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/bus/mhi/pci_generic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/bus/mhi/pci_generic.c b/drivers/bus/mhi/pci_generic.c
index 8c7f6576e421..0a6619ad292c 100644
--- a/drivers/bus/mhi/pci_generic.c
+++ b/drivers/bus/mhi/pci_generic.c
@@ -708,7 +708,7 @@ static void mhi_pci_remove(struct pci_dev *pdev)
struct mhi_pci_device *mhi_pdev = pci_get_drvdata(pdev);
struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
- del_timer(&mhi_pdev->health_check_timer);
+ del_timer_sync(&mhi_pdev->health_check_timer);
cancel_work_sync(&mhi_pdev->recovery_work);
if (test_and_clear_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status)) {
--
2.32.0
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 484cea4f362e1eeb5c869abbfb5f90eae6421b38
Gitweb: https://git.kernel.org/tip/484cea4f362e1eeb5c869abbfb5f90eae6421b38
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Tue, 08 Jun 2021 16:36:18 +02:00
Committer: Borislav Petkov <bp(a)suse.de>
CommitterDate: Wed, 09 Jun 2021 09:28:21 +02:00
x86/fpu: Prevent state corruption in __fpu__restore_sig()
The non-compacted slowpath uses __copy_from_user() and copies the entire
user buffer into the kernel buffer, verbatim. This means that the kernel
buffer may now contain entirely invalid state on which XRSTOR will #GP.
validate_user_xstate_header() can detect some of that corruption, but that
leaves the onus on callers to clear the buffer.
Prior to XSAVES support, it was possible just to reinitialize the buffer
completely, but with supervisor states that is no longer possible as the
buffer clearing code split got it backwards. Fixing that is possible, but
not corrupting the state in the first place is more robust.
Avoid corruption of the kernel XSAVE buffer by using copy_user_to_xstate()
which validates the XSAVE header contents before copying the actual states
to the kernel. copy_user_to_xstate() was previously only called for
compacted-format kernel buffers, but it works for both compacted and
non-compacted forms.
Using it for the non-compacted form is slower because of multiple
__copy_from_user() operations, but that cost is less important than robust
code in an already slow path.
[ Changelog polished by Dave Hansen ]
Fixes: b860eb8dce59 ("x86/fpu/xstate: Define new functions for clearing fpregs and xstates")
Reported-by: syzbot+2067e764dbcd10721e2e(a)syzkaller.appspotmail.com
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Acked-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Acked-by: Rik van Riel <riel(a)surriel.com>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/20210608144345.611833074@linutronix.de
---
arch/x86/kernel/fpu/signal.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index a4ec653..d5bc96a 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -405,14 +405,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
if (use_xsave() && !fx_only) {
u64 init_bv = xfeatures_mask_user() & ~user_xfeatures;
- if (using_compacted_format()) {
- ret = copy_user_to_xstate(&fpu->state.xsave, buf_fx);
- } else {
- ret = __copy_from_user(&fpu->state.xsave, buf_fx, state_size);
-
- if (!ret && state_size > offsetof(struct xregs_state, header))
- ret = validate_user_xstate_header(&fpu->state.xsave.header);
- }
+ ret = copy_user_to_xstate(&fpu->state.xsave, buf_fx);
if (ret)
goto err_out;
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 12f7764ac61200e32c916f038bdc08f884b0b604
Gitweb: https://git.kernel.org/tip/12f7764ac61200e32c916f038bdc08f884b0b604
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Tue, 08 Jun 2021 16:36:20 +02:00
Committer: Borislav Petkov <bp(a)suse.de>
CommitterDate: Wed, 09 Jun 2021 10:39:04 +02:00
x86/process: Check PF_KTHREAD and not current->mm for kernel threads
switch_fpu_finish() checks current->mm as an indicator for kernel threads.
That's wrong because kernel threads can temporarily use the mm of a user
process via kthread_use_mm().
Check the task flags for PF_KTHREAD instead.
Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state")
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Acked-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Acked-by: Rik van Riel <riel(a)surriel.com>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/20210608144345.912645927@linutronix.de
---
arch/x86/include/asm/fpu/internal.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index ceeba9f..18382ac 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -578,7 +578,7 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
* PKRU state is switched eagerly because it needs to be valid before we
* return to userland e.g. for a copy_to_user() operation.
*/
- if (current->mm) {
+ if (!(current->flags & PF_KTHREAD)) {
pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
if (pk)
pkru_val = pk->pkru;
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: d8778e393afa421f1f117471144f8ce6deb6953a
Gitweb: https://git.kernel.org/tip/d8778e393afa421f1f117471144f8ce6deb6953a
Author: Andy Lutomirski <luto(a)kernel.org>
AuthorDate: Tue, 08 Jun 2021 16:36:19 +02:00
Committer: Borislav Petkov <bp(a)suse.de>
CommitterDate: Wed, 09 Jun 2021 09:49:38 +02:00
x86/fpu: Invalidate FPU state after a failed XRSTOR from a user buffer
Both Intel and AMD consider it to be architecturally valid for XRSTOR to
fail with #PF but nonetheless change the register state. The actual
conditions under which this might occur are unclear [1], but it seems
plausible that this might be triggered if one sibling thread unmaps a page
and invalidates the shared TLB while another sibling thread is executing
XRSTOR on the page in question.
__fpu__restore_sig() can execute XRSTOR while the hardware registers
are preserved on behalf of a different victim task (using the
fpu_fpregs_owner_ctx mechanism), and, in theory, XRSTOR could fail but
modify the registers.
If this happens, then there is a window in which __fpu__restore_sig()
could schedule out and the victim task could schedule back in without
reloading its own FPU registers. This would result in part of the FPU
state that __fpu__restore_sig() was attempting to load leaking into the
victim task's user-visible state.
Invalidate preserved FPU registers on XRSTOR failure to prevent this
situation from corrupting any state.
[1] Frequent readers of the errata lists might imagine "complex
microarchitectural conditions".
Fixes: 1d731e731c4c ("x86/fpu: Add a fastpath to __fpu__restore_sig()")
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Acked-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Acked-by: Rik van Riel <riel(a)surriel.com>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/20210608144345.758116583@linutronix.de
---
arch/x86/kernel/fpu/signal.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index d5bc96a..4ab9aeb 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -369,6 +369,25 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
fpregs_unlock();
return 0;
}
+
+ /*
+ * The above did an FPU restore operation, restricted to
+ * the user portion of the registers, and failed, but the
+ * microcode might have modified the FPU registers
+ * nevertheless.
+ *
+ * If the FPU registers do not belong to current, then
+ * invalidate the FPU register state otherwise the task might
+ * preempt current and return to user space with corrupted
+ * FPU registers.
+ *
+ * In case current owns the FPU registers then no further
+ * action is required. The fixup below will handle it
+ * correctly.
+ */
+ if (test_thread_flag(TIF_NEED_FPU_LOAD))
+ __cpu_invalidate_fpregs_state();
+
fpregs_unlock();
} else {
/*
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 510b80a6a0f1a0d114c6e33bcea64747d127973c
Gitweb: https://git.kernel.org/tip/510b80a6a0f1a0d114c6e33bcea64747d127973c
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Tue, 08 Jun 2021 16:36:21 +02:00
Committer: Borislav Petkov <bp(a)suse.de>
CommitterDate: Wed, 09 Jun 2021 12:12:45 +02:00
x86/pkru: Write hardware init value to PKRU when xstate is init
When user space brings PKRU into init state, then the kernel handling is
broken:
T1 user space
xsave(state)
state.header.xfeatures &= ~XFEATURE_MASK_PKRU;
xrstor(state)
T1 -> kernel
schedule()
XSAVE(S) -> T1->xsave.header.xfeatures[PKRU] == 0
T1->flags |= TIF_NEED_FPU_LOAD;
wrpkru();
schedule()
...
pk = get_xsave_addr(&T1->fpu->state.xsave, XFEATURE_PKRU);
if (pk)
wrpkru(pk->pkru);
else
wrpkru(DEFAULT_PKRU);
Because the xfeatures bit is 0 and therefore the value in the xsave
storage is not valid, get_xsave_addr() returns NULL and switch_to()
writes the default PKRU. -> FAIL #1!
So that wrecks any copy_to/from_user() on the way back to user space
which hits memory which is protected by the default PKRU value.
Assumed that this does not fail (pure luck) then T1 goes back to user
space and because TIF_NEED_FPU_LOAD is set it ends up in
switch_fpu_return()
__fpregs_load_activate()
if (!fpregs_state_valid()) {
load_XSTATE_from_task();
}
But if nothing touched the FPU between T1 scheduling out and back in,
then the fpregs_state is still valid which means switch_fpu_return()
does nothing and just clears TIF_NEED_FPU_LOAD. Back to user space with
DEFAULT_PKRU loaded. -> FAIL #2!
The fix is simple: if get_xsave_addr() returns NULL then set the
PKRU value to 0 instead of the restrictive default PKRU value in
init_pkru_value.
[ bp: Massage in minor nitpicks from folks. ]
Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state")
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Acked-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Acked-by: Rik van Riel <riel(a)surriel.com>
Tested-by: Babu Moger <babu.moger(a)amd.com>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/20210608144346.045616965@linutronix.de
---
arch/x86/include/asm/fpu/internal.h | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 18382ac..fdee23e 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -579,9 +579,16 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
* return to userland e.g. for a copy_to_user() operation.
*/
if (!(current->flags & PF_KTHREAD)) {
+ /*
+ * If the PKRU bit in xsave.header.xfeatures is not set,
+ * then the PKRU component was in init state, which means
+ * XRSTOR will set PKRU to 0. If the bit is not set then
+ * get_xsave_addr() will return NULL because the PKRU value
+ * in memory is not valid. This means pkru_val has to be
+ * set to 0 and not to init_pkru_value.
+ */
pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
- if (pk)
- pkru_val = pk->pkru;
+ pkru_val = pk ? pk->pkru : 0;
}
__write_pkru(pkru_val);
}
From: Mayank Rana <mrana(a)codeaurora.org>
If ucsi_init() fails for some reason (e.g. ucsi_register_port()
fails or general communication failure to the PPM), particularly at
any point after the GET_CAPABILITY command had been issued, this
results in unwinding the initialization and returning an error.
However the ucsi structure's ucsi_capability member retains its
current value, including likely a non-zero num_connectors.
And because ucsi_init() itself is done in a workqueue, a UCSI
interface driver will be unaware that it failed and may think the
ucsi_register() call was completely successful. Later, if
ucsi_unregister() is called, this stale ucsi->cap value would make it
try to access the items in the ucsi->connector array, which might not
be in a proper state or even allocated at all, resulting in a NULL or
invalid pointer dereference.
Fix this by clearing the ucsi->cap value to 0 during the error
path of ucsi_init() in order to prevent a later ucsi_unregister()
from entering the connector cleanup loop.
Fixes: c1b0bc2dabfa ("usb: typec: Add support for UCSI interface")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mayank Rana <mrana(a)codeaurora.org>
Signed-off-by: Jack Pham <jackp(a)codeaurora.org>
---
drivers/usb/typec/ucsi/ucsi.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c
index b433169ef6fa..b7d104c80d85 100644
--- a/drivers/usb/typec/ucsi/ucsi.c
+++ b/drivers/usb/typec/ucsi/ucsi.c
@@ -1253,6 +1253,7 @@ static int ucsi_init(struct ucsi *ucsi)
}
err_reset:
+ memset(&ucsi->cap, 0, sizeof(ucsi->cap));
ucsi_reset_ppm(ucsi);
err:
return ret;
--
2.24.0
This is a note to let you know that I've just added the patch titled
usb: misc: brcmstb-usb-pinmap: check return value after calling
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From fbf649cd6d64d40c03c5397ecd6b1ae922ba7afc Mon Sep 17 00:00:00 2001
From: Yang Yingliang <yangyingliang(a)huawei.com>
Date: Sat, 5 Jun 2021 16:09:14 +0800
Subject: usb: misc: brcmstb-usb-pinmap: check return value after calling
platform_get_resource()
It will cause a null-ptr-deref if platform_get_resource() returns NULL,
so we need to check the return value.
Fixes: 517c4c44b323 ("usb: Add driver to allow any GPIO to be used for 7211 USB signals")
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Link: https://lore.kernel.org/r/20210605080914.2057758-1-yangyingliang@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/misc/brcmstb-usb-pinmap.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/usb/misc/brcmstb-usb-pinmap.c b/drivers/usb/misc/brcmstb-usb-pinmap.c
index b3cfe8666ea7..336653091e3b 100644
--- a/drivers/usb/misc/brcmstb-usb-pinmap.c
+++ b/drivers/usb/misc/brcmstb-usb-pinmap.c
@@ -263,6 +263,8 @@ static int __init brcmstb_usb_pinmap_probe(struct platform_device *pdev)
return -EINVAL;
r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ if (!r)
+ return -EINVAL;
pdata = devm_kzalloc(&pdev->dev,
sizeof(*pdata) +
--
2.32.0
This is a note to let you know that I've just added the patch titled
usb: dwc3: ep0: fix NULL pointer exception
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From d00889080ab60051627dab1d85831cd9db750e2a Mon Sep 17 00:00:00 2001
From: Marian-Cristian Rotariu <marian.c.rotariu(a)gmail.com>
Date: Tue, 8 Jun 2021 19:26:50 +0300
Subject: usb: dwc3: ep0: fix NULL pointer exception
There is no validation of the index in dwc3_wIndex_to_dep(), so we might
reference a non-existent ep and trigger a NULL pointer exception. In
certain configurations we use fewer eps, and the index might wrongly
indicate a larger ep index than exists.
By adding this validation, we can report a wrong index back to the
caller instead of crashing.
In our use case we are using a composite device on an older kernel, but
upstream might benefit from this fix as well. Unfortunately, I cannot describe the
hardware for others to reproduce the issue as it is a proprietary
implementation.
[ 82.958261] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a4
[ 82.966891] Mem abort info:
[ 82.969663] ESR = 0x96000006
[ 82.972703] Exception class = DABT (current EL), IL = 32 bits
[ 82.978603] SET = 0, FnV = 0
[ 82.981642] EA = 0, S1PTW = 0
[ 82.984765] Data abort info:
[ 82.987631] ISV = 0, ISS = 0x00000006
[ 82.991449] CM = 0, WnR = 0
[ 82.994409] user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000c6210ccc
[ 83.000999] [00000000000000a4] pgd=0000000053aa5003, pud=0000000053aa5003, pmd=0000000000000000
[ 83.009685] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[ 83.026433] Process irq/62-dwc3 (pid: 303, stack limit = 0x000000003985154c)
[ 83.033470] CPU: 0 PID: 303 Comm: irq/62-dwc3 Not tainted 4.19.124 #1
[ 83.044836] pstate: 60000085 (nZCv daIf -PAN -UAO)
[ 83.049628] pc : dwc3_ep0_handle_feature+0x414/0x43c
[ 83.054558] lr : dwc3_ep0_interrupt+0x3b4/0xc94
...
[ 83.141788] Call trace:
[ 83.144227] dwc3_ep0_handle_feature+0x414/0x43c
[ 83.148823] dwc3_ep0_interrupt+0x3b4/0xc94
[ 83.181546] ---[ end trace aac6b5267d84c32f ]---
Signed-off-by: Marian-Cristian Rotariu <marian.c.rotariu(a)gmail.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20210608162650.58426-1-marian.c.rotariu@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/dwc3/ep0.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
index 8b668ef46f7f..3cd294264372 100644
--- a/drivers/usb/dwc3/ep0.c
+++ b/drivers/usb/dwc3/ep0.c
@@ -292,6 +292,9 @@ static struct dwc3_ep *dwc3_wIndex_to_dep(struct dwc3 *dwc, __le16 wIndex_le)
epnum |= 1;
dep = dwc->eps[epnum];
+ if (dep == NULL)
+ return NULL;
+
if (dep->flags & DWC3_EP_ENABLED)
return dep;
--
2.32.0
This is a note to let you know that I've just added the patch titled
usb: gadget: eem: fix wrong eem header operation
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 305f670846a31a261462577dd0b967c4fa796871 Mon Sep 17 00:00:00 2001
From: Linyu Yuan <linyyuan(a)codeaurora.com>
Date: Wed, 9 Jun 2021 07:35:47 +0800
Subject: usb: gadget: eem: fix wrong eem header operation
When skb_clone() or skb_copy_expand() fails,
the skb should still be pulled by the length indicated in the EEM header;
otherwise the next pass will read network data and treat it as a header.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Linyu Yuan <linyyuan(a)codeaurora.com>
Link: https://lore.kernel.org/r/20210608233547.3767-1-linyyuan@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/gadget/function/f_eem.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/gadget/function/f_eem.c b/drivers/usb/gadget/function/f_eem.c
index e6cb38439c41..2cd9942707b4 100644
--- a/drivers/usb/gadget/function/f_eem.c
+++ b/drivers/usb/gadget/function/f_eem.c
@@ -495,7 +495,7 @@ static int eem_unwrap(struct gether *port,
skb2 = skb_clone(skb, GFP_ATOMIC);
if (unlikely(!skb2)) {
DBG(cdev, "unable to unframe EEM packet\n");
- continue;
+ goto next;
}
skb_trim(skb2, len - ETH_FCS_LEN);
@@ -505,7 +505,7 @@ static int eem_unwrap(struct gether *port,
GFP_ATOMIC);
if (unlikely(!skb3)) {
dev_kfree_skb_any(skb2);
- continue;
+ goto next;
}
dev_kfree_skb_any(skb2);
skb_queue_tail(list, skb3);
--
2.32.0
This is a note to let you know that I've just added the patch titled
usb: typec: intel_pmc_mux: Put fwnode in error case during ->probe()
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 1a85b350a7741776a406005b943e3dec02c424ed Mon Sep 17 00:00:00 2001
From: Andy Shevchenko <andy.shevchenko(a)gmail.com>
Date: Mon, 7 Jun 2021 23:50:05 +0300
Subject: usb: typec: intel_pmc_mux: Put fwnode in error case during ->probe()
device_get_next_child_node() bumps the reference count of the node it returns.
We have to balance it whenever we return early to the caller.
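As a minimal sketch of the usual pattern (do_something() is a placeholder, not
part of this driver): every node handed out by device_get_next_child_node()
must be put again on any early exit from the iteration.
struct fwnode_handle *fwnode = NULL;
int ret;

while ((fwnode = device_get_next_child_node(dev, fwnode))) {
	ret = do_something(fwnode);
	if (ret) {
		fwnode_handle_put(fwnode);	/* balance the implicit get */
		return ret;
	}
}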
Fixes: 6701adfa9693 ("usb: typec: driver for Intel PMC mux control")
Cc: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Signed-off-by: Andy Shevchenko <andy.shevchenko(a)gmail.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20210607205007.71458-1-andy.shevchenko@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/typec/mux/intel_pmc_mux.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/typec/mux/intel_pmc_mux.c b/drivers/usb/typec/mux/intel_pmc_mux.c
index 46a25b8db72e..96d8c5a04680 100644
--- a/drivers/usb/typec/mux/intel_pmc_mux.c
+++ b/drivers/usb/typec/mux/intel_pmc_mux.c
@@ -636,8 +636,10 @@ static int pmc_usb_probe(struct platform_device *pdev)
break;
ret = pmc_usb_register_port(pmc, i, fwnode);
- if (ret)
+ if (ret) {
+ fwnode_handle_put(fwnode);
goto err_remove_ports;
+ }
}
platform_set_drvdata(pdev, pmc);
--
2.32.0
This is a note to let you know that I've just added the patch titled
usb: typec: intel_pmc_mux: Add missed error check for
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 843fabdd7623271330af07f1b7fbd7fabe33c8de Mon Sep 17 00:00:00 2001
From: Andy Shevchenko <andy.shevchenko(a)gmail.com>
Date: Mon, 7 Jun 2021 23:50:06 +0300
Subject: usb: typec: intel_pmc_mux: Add missed error check for
devm_ioremap_resource()
devm_ioremap_resource() can return an error, add missed check for it.
Fixes: 43d596e32276 ("usb: typec: intel_pmc_mux: Check the port status before connect")
Reviewed-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Signed-off-by: Andy Shevchenko <andy.shevchenko(a)gmail.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20210607205007.71458-2-andy.shevchenko@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/typec/mux/intel_pmc_mux.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/usb/typec/mux/intel_pmc_mux.c b/drivers/usb/typec/mux/intel_pmc_mux.c
index 96d8c5a04680..31b1b3b555c7 100644
--- a/drivers/usb/typec/mux/intel_pmc_mux.c
+++ b/drivers/usb/typec/mux/intel_pmc_mux.c
@@ -586,6 +586,11 @@ static int pmc_usb_probe_iom(struct pmc_usb *pmc)
return -ENOMEM;
}
+ if (IS_ERR(pmc->iom_base)) {
+ put_device(&adev->dev);
+ return PTR_ERR(pmc->iom_base);
+ }
+
pmc->iom_adev = adev;
return 0;
--
2.32.0
This is a note to let you know that I've just added the patch titled
usb: typec: tcpm: Do not finish VDM AMS for retrying Responses
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 5ab14ab1f2db24ffae6c5c39a689660486962e6e Mon Sep 17 00:00:00 2001
From: Kyle Tso <kyletso(a)google.com>
Date: Sun, 6 Jun 2021 16:14:52 +0800
Subject: usb: typec: tcpm: Do not finish VDM AMS for retrying Responses
If a VDM response couldn't be sent successfully, there is no need to
finish the AMS until the retry count reaches the limit.
Fixes: 0908c5aca31e ("usb: typec: tcpm: AMS and Collision Avoidance")
Reviewed-by: Guenter Roeck <linux(a)roeck-us.net>
Cc: stable <stable(a)vger.kernel.org>
Acked-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Signed-off-by: Kyle Tso <kyletso(a)google.com>
Link: https://lore.kernel.org/r/20210606081452.764032-1-kyletso@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/typec/tcpm/tcpm.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index a7c336f56849..63470cf7f4cd 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -1942,6 +1942,9 @@ static void vdm_run_state_machine(struct tcpm_port *port)
tcpm_log(port, "VDM Tx error, retry");
port->vdm_retries++;
port->vdm_state = VDM_STATE_READY;
+ if (PD_VDO_SVDM(vdo_hdr) && PD_VDO_CMDT(vdo_hdr) == CMDT_INIT)
+ tcpm_ams_finish(port);
+ } else {
tcpm_ams_finish(port);
}
break;
--
2.32.0
This is a note to let you know that I've just added the patch titled
usb: fix various gadget panics on 10gbps cabling
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 032e288097a553db5653af552dd8035cd2a0ba96 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Maciej=20=C5=BBenczykowski?= <maze(a)google.com>
Date: Tue, 8 Jun 2021 19:44:59 -0700
Subject: usb: fix various gadget panics on 10gbps cabling
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
usb_assign_descriptors() is called with 5 parameters,
the last 4 of which are the usb_descriptor_header for:
full-speed (USB1.1 - 12Mbps [including USB1.0 low-speed @ 1.5Mbps]),
high-speed (USB2.0 - 480Mbps),
super-speed (USB3.0 - 5Gbps),
super-speed-plus (USB3.1 - 10Gbps).
The differences between full/high/super-speed descriptors are usually
substantial (due to changes in the maximum usb block size from 64 to 512
to 1024 bytes and other differences in the specs), while the difference
between 5 and 10Gbps descriptors may be as little as nothing
(in many cases the same tuning is simply good enough).
However if a gadget driver calls usb_assign_descriptors() with
a NULL descriptor for super-speed-plus and is then used on a max 10gbps
configuration, the kernel will crash with a null pointer dereference,
when a 10gbps capable device port + cable + host port combination shows up.
(This wouldn't happen if the gadget max-speed was set to 5gbps, but
it of course defaults to the maximum, and there's no real reason to
artificially limit it)
The fix is to simply use the 5gbps descriptor as the 10gbps descriptor,
if a 10gbps descriptor wasn't provided.
Obviously this won't fix the problem if the 5gbps descriptor is also
NULL, but such cases can't be so trivially solved (and any such gadgets
are unlikely to be used with USB3 ports anyway).
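As a sketch of the call-site pattern this guards against (the descriptor table
names below are placeholders): a function binding that never provides a
super-speed-plus table, which previously crashed once a 10gbps link came up.
	status = usb_assign_descriptors(f, my_fs_descs, my_hs_descs,
					my_ss_descs, /* ssp */ NULL);
	if (status)
		return status;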
Cc: Felipe Balbi <balbi(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Maciej Żenczykowski <maze(a)google.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20210609024459.1126080-1-zenczykowski@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/gadget/config.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/usb/gadget/config.c b/drivers/usb/gadget/config.c
index 8bb25773b61e..05507606b2b4 100644
--- a/drivers/usb/gadget/config.c
+++ b/drivers/usb/gadget/config.c
@@ -164,6 +164,14 @@ int usb_assign_descriptors(struct usb_function *f,
{
struct usb_gadget *g = f->config->cdev->gadget;
+ /* super-speed-plus descriptor falls back to super-speed one,
+ * if such a descriptor was provided, thus avoiding a NULL
+ * pointer dereference if a 5gbps capable gadget is used with
+ * a 10gbps capable config (device port + cable + host port)
+ */
+ if (!ssp)
+ ssp = ss;
+
if (fs) {
f->fs_descriptors = usb_copy_descriptors(fs);
if (!f->fs_descriptors)
--
2.32.0
This is a note to let you know that I've just added the patch titled
usb: pci-quirks: disable D3cold on xhci suspend for s2idle on AMD
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From d1658268e43980c071dbffc3d894f6f6c4b6732a Mon Sep 17 00:00:00 2001
From: Mario Limonciello <mario.limonciello(a)amd.com>
Date: Thu, 27 May 2021 10:45:34 -0500
Subject: usb: pci-quirks: disable D3cold on xhci suspend for s2idle on AMD
Renoir
The XHCI controller is required to enter D3hot rather than D3cold for AMD
s2idle on this hardware generation.
Otherwise, the 'Controller Not Ready' (CNR) bit is not cleared by the
host on resume, and this eventually results in xhci resume failures during
s2idle wakeup.
Link: https://lore.kernel.org/linux-usb/1612527609-7053-1-git-send-email-Prike.Li…
Suggested-by: Prike Liang <Prike.Liang(a)amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: stable <stable(a)vger.kernel.org> # 5.11+
Link: https://lore.kernel.org/r/20210527154534.8900-1-mario.limonciello@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/host/xhci-pci.c | 7 ++++++-
drivers/usb/host/xhci.h | 1 +
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 7bc18cf8042c..18c2bbddf080 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -59,6 +59,7 @@
#define PCI_DEVICE_ID_INTEL_MAPLE_RIDGE_XHCI 0x1138
#define PCI_DEVICE_ID_INTEL_ALDER_LAKE_XHCI 0x461e
+#define PCI_DEVICE_ID_AMD_RENOIR_XHCI 0x1639
#define PCI_DEVICE_ID_AMD_PROMONTORYA_4 0x43b9
#define PCI_DEVICE_ID_AMD_PROMONTORYA_3 0x43ba
#define PCI_DEVICE_ID_AMD_PROMONTORYA_2 0x43bb
@@ -182,6 +183,10 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
(pdev->device == PCI_DEVICE_ID_AMD_PROMONTORYA_1)))
xhci->quirks |= XHCI_U2_DISABLE_WAKE;
+ if (pdev->vendor == PCI_VENDOR_ID_AMD &&
+ pdev->device == PCI_DEVICE_ID_AMD_RENOIR_XHCI)
+ xhci->quirks |= XHCI_BROKEN_D3COLD;
+
if (pdev->vendor == PCI_VENDOR_ID_INTEL) {
xhci->quirks |= XHCI_LPM_SUPPORT;
xhci->quirks |= XHCI_INTEL_HOST;
@@ -539,7 +544,7 @@ static int xhci_pci_suspend(struct usb_hcd *hcd, bool do_wakeup)
* Systems with the TI redriver that loses port status change events
* need to have the registers polled during D3, so avoid D3cold.
*/
- if (xhci->quirks & XHCI_COMP_MODE_QUIRK)
+ if (xhci->quirks & (XHCI_COMP_MODE_QUIRK | XHCI_BROKEN_D3COLD))
pci_d3cold_disable(pdev);
if (xhci->quirks & XHCI_PME_STUCK_QUIRK)
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 2595a8f057c4..e417f5ce13d1 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1892,6 +1892,7 @@ struct xhci_hcd {
#define XHCI_DISABLE_SPARSE BIT_ULL(38)
#define XHCI_SG_TRB_CACHE_SIZE_QUIRK BIT_ULL(39)
#define XHCI_NO_SOFT_RETRY BIT_ULL(40)
+#define XHCI_BROKEN_D3COLD BIT_ULL(41)
unsigned int num_active_eps;
unsigned int limit_active_eps;
--
2.32.0
This is a note to let you know that I've just added the patch titled
usb: f_ncm: only first packet of aggregate needs to start timer
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 1958ff5ad2d4908b44a72bcf564dfe67c981e7fe Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Maciej=20=C5=BBenczykowski?= <maze(a)google.com>
Date: Tue, 8 Jun 2021 01:54:38 -0700
Subject: usb: f_ncm: only first packet of aggregate needs to start timer
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The reasoning for this change is that if we already had
a packet pending, then we also already had a pending timer,
and as such there is no need to reschedule it.
This also prevents packets getting delayed 60 ms worst case
under a tiny packet every 290us transmit load, by keeping the
timeout always relative to the first queued up packet.
(300us delay * 16KB max aggregation / 80 byte packet =~ 60 ms)
As such the first packet is now at most delayed by 300us.
Under low transmit load, this will simply result in us sending
a shorter aggregate, as originally intended.
This patch has the benefit of greatly reducing (by ~10 factor
with 1500 byte frames aggregated into 16 kiB) the number of
(potentially pretty costly) updates to the hrtimer.
Cc: Brooke Basile <brookebasile(a)gmail.com>
Cc: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
Cc: Felipe Balbi <balbi(a)kernel.org>
Cc: Lorenzo Colitti <lorenzo(a)google.com>
Signed-off-by: Maciej Żenczykowski <maze(a)google.com>
Link: https://lore.kernel.org/r/20210608085438.813960-1-zenczykowski@gmail.com
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/gadget/function/f_ncm.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/gadget/function/f_ncm.c b/drivers/usb/gadget/function/f_ncm.c
index 0d23c6c11a13..855127249f24 100644
--- a/drivers/usb/gadget/function/f_ncm.c
+++ b/drivers/usb/gadget/function/f_ncm.c
@@ -1101,11 +1101,11 @@ static struct sk_buff *ncm_wrap_ntb(struct gether *port,
ncm->ndp_dgram_count = 1;
/* Note: we skip opts->next_ndp_index */
- }
- /* Delay the timer. */
- hrtimer_start(&ncm->task_timer, TX_TIMEOUT_NSECS,
- HRTIMER_MODE_REL_SOFT);
+ /* Start the timer. */
+ hrtimer_start(&ncm->task_timer, TX_TIMEOUT_NSECS,
+ HRTIMER_MODE_REL_SOFT);
+ }
/* Add the datagram position entries */
ntb_ndp = skb_put_zero(ncm->skb_tx_ndp, dgram_idx_len);
--
2.32.0
This is a note to let you know that I've just added the patch titled
USB: f_ncm: ncm_bitrate (speed) is unsigned
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 3370139745853f7826895293e8ac3aec1430508e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Maciej=20=C5=BBenczykowski?= <maze(a)google.com>
Date: Mon, 7 Jun 2021 17:53:44 -0700
Subject: USB: f_ncm: ncm_bitrate (speed) is unsigned
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
[ 190.544755] configfs-gadget gadget: notify speed -44967296
This is because 4250000000 - 2**32 is -44967296.
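A small userspace sketch (not kernel code) of the same reinterpretation;
printing the unsigned bitrate with %d wraps it to the negative value seen in
the log above:
#include <stdio.h>

int main(void)
{
	unsigned int speed = 4250000000U;	/* larger than INT_MAX */

	printf("%d\n", (int)speed);	/* prints -44967296, as in the log */
	printf("%u\n", speed);		/* prints 4250000000 */
	return 0;
}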
Fixes: 9f6ce4240a2b ("usb: gadget: f_ncm.c added")
Cc: Brooke Basile <brookebasile(a)gmail.com>
Cc: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
Cc: Felipe Balbi <balbi(a)kernel.org>
Cc: Lorenzo Colitti <lorenzo(a)google.com>
Cc: Yauheni Kaliuta <yauheni.kaliuta(a)nokia.com>
Cc: Linux USB Mailing List <linux-usb(a)vger.kernel.org>
Acked-By: Lorenzo Colitti <lorenzo(a)google.com>
Signed-off-by: Maciej Żenczykowski <maze(a)google.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20210608005344.3762668-1-zenczykowski@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/gadget/function/f_ncm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/function/f_ncm.c b/drivers/usb/gadget/function/f_ncm.c
index 019bea8e09cc..0d23c6c11a13 100644
--- a/drivers/usb/gadget/function/f_ncm.c
+++ b/drivers/usb/gadget/function/f_ncm.c
@@ -583,7 +583,7 @@ static void ncm_do_notify(struct f_ncm *ncm)
data[0] = cpu_to_le32(ncm_bitrate(cdev->gadget));
data[1] = data[0];
- DBG(cdev, "notify speed %d\n", ncm_bitrate(cdev->gadget));
+ DBG(cdev, "notify speed %u\n", ncm_bitrate(cdev->gadget));
ncm->notify_state = NCM_NOTIFY_CONNECT;
break;
}
--
2.32.0
On Wed, Jun 09, 2021 at 03:51:20PM +0800, Xuan Zhuo wrote:
> On Wed, 9 Jun 2021 08:24:20 +0200, Greg KH <gregkh(a)linuxfoundation.org> wrote:
> > On Wed, Jun 09, 2021 at 02:08:17PM +0800, Xuan Zhuo wrote:
> > > On Wed, 9 Jun 2021 06:50:10 +0200, Greg KH <gregkh(a)linuxfoundation.org> wrote:
> > > > On Wed, Jun 09, 2021 at 09:48:33AM +0800, Xuan Zhuo wrote:
> > > > > > > With this patch and the latest net branch I no longer get crashes.
> > > > > >
> > > > > > Did this ever get properly submitted to the networking tree to get into
> > > > > > 5.13-final?
> > > > >
> > > > > The patch has been submitted.
> > > > >
> > > > > [PATCH net] virtio-net: fix for skb_over_panic inside big mode
> > > >
> > > > Submitted where? Do you have a lore.kernel.org link somewhere?
> > >
> > >
> > > https://lore.kernel.org/netdev/20210603170901.66504-1-xuanzhuo@linux.alibab…
> >
> > So this is commit 1a8024239dac ("virtio-net: fix for skb_over_panic
> > inside big mode") in Linus's tree, right?
>
> YES.
>
> >
> > But why is that referencing:
> > Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
>
> This problem was indeed introduced in fb32856b16ad.
>
> I confirmed that this commit fb32856b16ad was first entered in 5.13-rc1, and the
> previous 5.12 did not have this commit fb32856b16ad.
>
> I'm not sure if it helped you.
Hm, then what resolves the reported problem that people were having with
the 5.12.y kernel release? Is that a separate issue?
thanks,
greg k-h
On Wed, Jun 09, 2021 at 02:08:17PM +0800, Xuan Zhuo wrote:
> On Wed, 9 Jun 2021 06:50:10 +0200, Greg KH <gregkh(a)linuxfoundation.org> wrote:
> > On Wed, Jun 09, 2021 at 09:48:33AM +0800, Xuan Zhuo wrote:
> > > > > With this patch and the latest net branch I no longer get crashes.
> > > >
> > > > Did this ever get properly submitted to the networking tree to get into
> > > > 5.13-final?
> > >
> > > The patch has been submitted.
> > >
> > > [PATCH net] virtio-net: fix for skb_over_panic inside big mode
> >
> > Submitted where? Do you have a lore.kernel.org link somewhere?
>
>
> https://lore.kernel.org/netdev/20210603170901.66504-1-xuanzhuo@linux.alibab…
So this is commit 1a8024239dac ("virtio-net: fix for skb_over_panic
inside big mode") in Linus's tree, right?
But why is that referencing:
Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
when this problem was seen in stable kernels that had a different commit
backported to it?
Is there nothing needed to be done for the stable kernel trees?
confused,
greg k-h
On Wed, Jun 09, 2021 at 09:48:33AM +0800, Xuan Zhuo wrote:
> > > With this patch and the latest net branch I no longer get crashes.
> >
> > Did this ever get properly submitted to the networking tree to get into
> > 5.13-final?
>
> The patch has been submitted.
>
> [PATCH net] virtio-net: fix for skb_over_panic inside big mode
Submitted where? Do you have a lore.kernel.org link somewhere?
thanks,
greg k-h
The patch titled
Subject: mm, thp: use head page in __migration_entry_wait
has been added to the -mm tree. Its filename is
mm-thp-use-head-page-in-__migration_entry_wait.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-thp-use-head-page-in-__migrati…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-use-head-page-in-__migrati…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Xu Yu <xuyu(a)linux.alibaba.com>
Subject: mm, thp: use head page in __migration_entry_wait
We noticed that a hung task can happen in a corner-case but practical scenario
when CONFIG_PREEMPT_NONE is enabled, as follows.
Process 0 Process 1 Process 2..Inf
split_huge_page_to_list
unmap_page
split_huge_pmd_address
__migration_entry_wait(head)
__migration_entry_wait(tail)
remap_page (roll back)
remove_migration_ptes
rmap_walk_anon
cond_resched
Here, __migration_entry_wait(tail) occurs in kernel space, e.g.,
copy_to_user in fstat, which will immediately fault again without
rescheduling and thus occupy the CPU fully.
When there are too many processes performing __migration_entry_wait on
tail page, remap_page will never be done after cond_resched.
This makes __migration_entry_wait operate on the compound head page, and thus
wait for remap_page to complete, whether the THP is split successfully or
rolled back.
Note that put_and_wait_on_page_locked helps to drop the page reference
acquired with get_page_unless_zero, as soon as the page is on the wait
queue, before actually waiting. So splitting the THP is only prevented
for a brief interval.
Link: https://lkml.kernel.org/r/b9836c1dd522e903891760af9f0c86a2cce987eb.16231440…
Fixes: ba98828088ad ("thp: add option to setup migration entries during PMD split")
Suggested-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Gang Deng <gavin.dg(a)linux.alibaba.com>
Signed-off-by: Xu Yu <xuyu(a)linux.alibaba.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Acked-by: Hugh Dickins <hughd(a)google.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/migrate.c~mm-thp-use-head-page-in-__migration_entry_wait
+++ a/mm/migrate.c
@@ -295,6 +295,7 @@ void __migration_entry_wait(struct mm_st
goto out;
page = migration_entry_to_page(entry);
+ page = compound_head(page);
/*
* Once page cache replacement of page migration started, page_count
_
Patches currently in -mm which might be from xuyu(a)linux.alibaba.com are
mm-thp-use-head-page-in-__migration_entry_wait.patch
It turns out that SLUB redzoning ("slub_debug=Z") checks from
s->object_size rather than from s->inuse (which is normally bumped
to make room for the freelist pointer), so a cache created with an
object size less than 24 would have the freelist pointer written beyond
s->object_size, causing the redzone to be corrupted by the freelist
pointer. This was very visible with "slub_debug=ZF":
BUG test (Tainted: G B ): Right Redzone overwritten
-----------------------------------------------------------------------------
INFO: 0xffff957ead1c05de-0xffff957ead1c05df @offset=1502. First byte 0x1a instead of 0xbb
INFO: Slab 0xffffef3950b47000 objects=170 used=170 fp=0x0000000000000000 flags=0x8000000000000200
INFO: Object 0xffff957ead1c05d8 @offset=1496 fp=0xffff957ead1c0620
Redzone (____ptrval____): bb bb bb bb bb bb bb bb ........
Object (____ptrval____): 00 00 00 00 00 f6 f4 a5 ........
Redzone (____ptrval____): 40 1d e8 1a aa @....
Padding (____ptrval____): 00 00 00 00 00 00 00 00 ........
Adjust the offset to stay within s->object_size.
(Note that no caches in this size range are known to exist in the
kernel currently.)
Reported-by: Marco Elver <elver(a)google.com>
Reported-by: "Lin, Zhenpeng" <zplin(a)psu.edu>
Link: https://lore.kernel.org/linux-mm/20200807160627.GA1420741@elver.google.com/
Fixes: 89b83f282d8b (slub: avoid redzone when choosing freepointer location)
Cc: stable(a)vger.kernel.org
Tested-by: Marco Elver <elver(a)google.com>
Link: https://lore.kernel.org/lkml/CANpmjNOwZ5VpKQn+SYWovTkFB4VsT-RPwyENBmaK0dLcp…
Signed-off-by: Kees Cook <keescook(a)chromium.org>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Link: https://lore.kernel.org/lkml/0f7dd7b2-7496-5e2d-9488-2ec9f8e90441@suse.cz/
---
mm/slub.c | 14 +++-----------
1 file changed, 3 insertions(+), 11 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index f58cfd456548..fe30df460fad 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3689,7 +3689,6 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
{
slab_flags_t flags = s->flags;
unsigned int size = s->object_size;
- unsigned int freepointer_area;
unsigned int order;
/*
@@ -3698,13 +3697,6 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
* the possible location of the free pointer.
*/
size = ALIGN(size, sizeof(void *));
- /*
- * This is the area of the object where a freepointer can be
- * safely written. If redzoning adds more to the inuse size, we
- * can't use that portion for writing the freepointer, so
- * s->offset must be limited within this for the general case.
- */
- freepointer_area = size;
#ifdef CONFIG_SLUB_DEBUG
/*
@@ -3730,7 +3722,7 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
/*
* With that we have determined the number of bytes in actual use
- * by the object. This is the potential offset to the free pointer.
+ * by the object and redzoning.
*/
s->inuse = size;
@@ -3753,13 +3745,13 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
*/
s->offset = size;
size += sizeof(void *);
- } else if (freepointer_area > sizeof(void *)) {
+ } else {
/*
* Store freelist pointer near middle of object to keep
* it away from the edges of the object to avoid small
* sized over/underflows from neighboring allocations.
*/
- s->offset = ALIGN(freepointer_area / 2, sizeof(void *));
+ s->offset = ALIGN_DOWN(s->object_size / 2, sizeof(void *));
}
#ifdef CONFIG_SLUB_DEBUG
--
2.25.1
The patch titled
Subject: crash_core, vmcoreinfo: append 'SECTION_SIZE_BITS' to vmcoreinfo
has been added to the -mm tree. Its filename is
crash_core-vmcoreinfo-append-section_size_bits-to-vmcoreinfo.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/crash_core-vmcoreinfo-append-sect…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/crash_core-vmcoreinfo-append-sect…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Pingfan Liu <kernelfans(a)gmail.com>
Subject: crash_core, vmcoreinfo: append 'SECTION_SIZE_BITS' to vmcoreinfo
As mentioned in kernel commit 1d50e5d0c505 ("crash_core, vmcoreinfo:
Append 'MAX_PHYSMEM_BITS' to vmcoreinfo"), SECTION_SIZE_BITS is needed in
the formula:
    #define SECTIONS_SHIFT (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
Besides SECTIONS_SHIFT, SECTION_SIZE_BITS is also used to calculate
PAGES_PER_SECTION in makedumpfile, just as in the kernel.
Unfortunately, this arch-dependent macro SECTION_SIZE_BITS can change, as it
did recently in kernel commit f0b13ee23241 ("arm64/sparsemem: reduce
SECTION_SIZE_BITS"), but user space wants a stable interface to get this
info, and it cannot be deduced from a crashdump vmcore. Hence, append
SECTION_SIZE_BITS to vmcoreinfo.
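With the value exported, user space can compute PAGES_PER_SECTION exactly as
the kernel does. A minimal stand-alone sketch follows; the numbers are
hard-coded purely for illustration (real tools such as makedumpfile parse
them from the vmcoreinfo note):

	#include <stdio.h>

	int main(void)
	{
		/* values a tool would parse from vmcoreinfo */
		unsigned long section_size_bits = 27;	/* e.g. arm64 4K pages after f0b13ee23241 */
		unsigned long page_shift = 12;		/* 4K pages */

		/* PAGES_PER_SECTION = 1UL << (SECTION_SIZE_BITS - PAGE_SHIFT) */
		unsigned long pages_per_section =
			1UL << (section_size_bits - page_shift);

		printf("PAGES_PER_SECTION = %lu\n", pages_per_section);
		return 0;
	}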
Link: https://lkml.kernel.org/r/20210608103359.84907-1-kernelfans@gmail.com
Link: http://lists.infradead.org/pipermail/kexec/2021-June/022676.html
Signed-off-by: Pingfan Liu <kernelfans(a)gmail.com>
Acked-by: Baoquan He <bhe(a)redhat.com>
Cc: Bhupesh Sharma <bhupesh.sharma(a)linaro.org>
Cc: Kazuhito Hagio <k-hagio(a)ab.jp.nec.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Boris Petkov <bp(a)alien8.de>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: James Morse <james.morse(a)arm.com>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Paul Mackerras <paulus(a)samba.org>
Cc: Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
Cc: Dave Anderson <anderson(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/crash_core.c | 1 +
1 file changed, 1 insertion(+)
--- a/kernel/crash_core.c~crash_core-vmcoreinfo-append-section_size_bits-to-vmcoreinfo
+++ a/kernel/crash_core.c
@@ -464,6 +464,7 @@ static int __init crash_save_vmcoreinfo_
VMCOREINFO_LENGTH(mem_section, NR_SECTION_ROOTS);
VMCOREINFO_STRUCT_SIZE(mem_section);
VMCOREINFO_OFFSET(mem_section, section_mem_map);
+ VMCOREINFO_NUMBER(SECTION_SIZE_BITS);
VMCOREINFO_NUMBER(MAX_PHYSMEM_BITS);
#endif
VMCOREINFO_STRUCT_SIZE(page);
_
Patches currently in -mm which might be from kernelfans(a)gmail.com are
crash_core-vmcoreinfo-append-section_size_bits-to-vmcoreinfo.patch
Commit 545fbd0775ba ("rq-qos: fix missed wake-ups in rq_qos_throttle")
tried to fix a problem that a process could be sleeping in rq_qos_wait()
without anyone to wake it up. However the fix is not complete and the
following can still happen:
CPU1 (waiter1)                 CPU2 (waiter2)                 CPU3 (waker)
rq_qos_wait()                  rq_qos_wait()
  acquire_inflight_cb() -> fails
                               acquire_inflight_cb() -> fails
                                                              completes IOs, inflight
                                                              decreased
  prepare_to_wait_exclusive()
                               prepare_to_wait_exclusive()
  has_sleeper = !wq_has_single_sleeper() -> true as there are two sleepers
                               has_sleeper = !wq_has_single_sleeper() -> true
  io_schedule()                io_schedule()
We deadlock, as there is now nobody to wake up the two waiters. The logic of
automatically blocking when there are already sleepers is really subtle, and
the only way to make it work reliably is to check whether there are any
waiters in the queue at the moment we add ourselves to it. That way, we are
guaranteed that at least the first process to enter the wait queue will
recheck the waiting condition before going to sleep and thus guarantee
forward progress.
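A condensed sketch of the waiter side this relies on; it is simplified from
rq_qos_wait() as changed by the diff below (locals and the token-race cleanup
path are omitted), so read it as an aid, not the literal function body:

	/* non-zero return means the queue was empty when we were added */
	has_sleeper = !prepare_to_wait_exclusive(&rqw->wait, &data.wq,
						 TASK_UNINTERRUPTIBLE);
	do {
		if (data.got_token)	/* waker already handed us the budget */
			break;
		if (!has_sleeper && acquire_inflight_cb(rqw, private_data))
			break;		/* we were first: safe to recheck and proceed */
		io_schedule();
		/* after sleeping, only the waker can grant us the budget */
		has_sleeper = true;
		set_current_state(TASK_UNINTERRUPTIBLE);
	} while (1);
	finish_wait(&rqw->wait, &data.wq);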
Fixes: 545fbd0775ba ("rq-qos: fix missed wake-ups in rq_qos_throttle")
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
block/blk-rq-qos.c | 4 ++--
include/linux/wait.h | 2 +-
kernel/sched/wait.c | 9 +++++++--
3 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index 656460636ad3..e83af7bc7591 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -266,8 +266,8 @@ void rq_qos_wait(struct rq_wait *rqw, void *private_data,
if (!has_sleeper && acquire_inflight_cb(rqw, private_data))
return;
- prepare_to_wait_exclusive(&rqw->wait, &data.wq, TASK_UNINTERRUPTIBLE);
- has_sleeper = !wq_has_single_sleeper(&rqw->wait);
+ has_sleeper = !prepare_to_wait_exclusive(&rqw->wait, &data.wq,
+ TASK_UNINTERRUPTIBLE);
do {
/* The memory barrier in set_task_state saves us here. */
if (data.got_token)
diff --git a/include/linux/wait.h b/include/linux/wait.h
index fe10e8570a52..6598ae35e1b5 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -1136,7 +1136,7 @@ do { \
* Waitqueues which are removed from the waitqueue_head at wakeup time
*/
void prepare_to_wait(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry, int state);
-void prepare_to_wait_exclusive(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry, int state);
+bool prepare_to_wait_exclusive(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry, int state);
long prepare_to_wait_event(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry, int state);
void finish_wait(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry);
long wait_woken(struct wait_queue_entry *wq_entry, unsigned mode, long timeout);
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 183cc6ae68a6..76577d1642a5 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -264,17 +264,22 @@ prepare_to_wait(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_ent
}
EXPORT_SYMBOL(prepare_to_wait);
-void
+/* Returns true if we are the first waiter in the queue, false otherwise. */
+bool
prepare_to_wait_exclusive(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry, int state)
{
unsigned long flags;
+ bool was_empty = false;
wq_entry->flags |= WQ_FLAG_EXCLUSIVE;
spin_lock_irqsave(&wq_head->lock, flags);
- if (list_empty(&wq_entry->entry))
+ if (list_empty(&wq_entry->entry)) {
+ was_empty = list_empty(&wq_head->head);
__add_wait_queue_entry_tail(wq_head, wq_entry);
+ }
set_current_state(state);
spin_unlock_irqrestore(&wq_head->lock, flags);
+ return was_empty;
}
EXPORT_SYMBOL(prepare_to_wait_exclusive);
--
2.26.2
From: Long Li <longli(a)microsoft.com>
After commit 07173c3ec276 ("block: enable multipage bvecs"), a bvec can
have multiple pages. But bio_will_gap() still assumes a one-page bvec when
checking for merging. If the pages in the bvec cross the
seg_boundary_mask, this merge check can succeed when only the first page is
tested, yet fail when all the pages are tested.
Later, when SCSI builds the SG list, the same check is done in
__blk_segment_map_sg_merge() with all the pages in the bvec tested. This
time the check may fail because the pages in the bvec cross the
seg_boundary_mask (although they tested okay in bio_will_gap() earlier, so
those BIOs were merged). If this check fails, we end up with a broken SG
list for drivers that assume the SG list has no offsets in intermediate
pages. This results in incorrect data being written to the disk.
Fix this by returning the full multi-page bvec when testing gaps for merging.
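The failure mode is easiest to see with concrete numbers. The following
stand-alone sketch uses illustrative addresses and a simplified
"same segment" test in the style of the block layer's boundary checks (not
the kernel's exact helpers): a two-page bvec whose first page fits inside a
4K segment boundary while the full bvec does not.

	#include <stdio.h>

	int main(void)
	{
		unsigned long seg_boundary_mask = 0xfff;	/* 4K boundary */
		unsigned long addr = 0x10e00;			/* start of the bvec */
		unsigned long first_page_len = 0x200;		/* bytes in page 0 */
		unsigned long full_len = 0x1200;		/* whole multi-page bvec */

		/* does [addr, addr + len) stay within one segment boundary? */
		printf("first page only crosses boundary: %d\n",
		       ((addr ^ (addr + first_page_len - 1)) & ~seg_boundary_mask) != 0);
		printf("full bvec crosses boundary:       %d\n",
		       ((addr ^ (addr + full_len - 1)) & ~seg_boundary_mask) != 0);
		return 0;
	}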
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Cc: Pavel Begunkov <asml.silence(a)gmail.com>
Cc: Ming Lei <ming.lei(a)redhat.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Cc: Jeffle Xu <jefflexu(a)linux.alibaba.com>
Cc: linux-kernel(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Fixes: 07173c3ec276 ("block: enable multipage bvecs")
Signed-off-by: Long Li <longli(a)microsoft.com>
Reviewed-by: Ming Lei <ming.lei(a)redhat.com>
---
Changes
v2: added commit details on how data corruption happens
v3: reorganized the code/comments in bio_get_last_bvec
include/linux/bio.h | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/include/linux/bio.h b/include/linux/bio.h
index a0b4cfdf62a4..d2b98efb5cc5 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -44,9 +44,6 @@ static inline unsigned int bio_max_segs(unsigned int nr_segs)
#define bio_offset(bio) bio_iter_offset((bio), (bio)->bi_iter)
#define bio_iovec(bio) bio_iter_iovec((bio), (bio)->bi_iter)
-#define bio_multiple_segments(bio) \
- ((bio)->bi_iter.bi_size != bio_iovec(bio).bv_len)
-
#define bvec_iter_sectors(iter) ((iter).bi_size >> 9)
#define bvec_iter_end_sector(iter) ((iter).bi_sector + bvec_iter_sectors((iter)))
@@ -271,7 +268,7 @@ static inline void bio_clear_flag(struct bio *bio, unsigned int bit)
static inline void bio_get_first_bvec(struct bio *bio, struct bio_vec *bv)
{
- *bv = bio_iovec(bio);
+ *bv = mp_bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
}
static inline void bio_get_last_bvec(struct bio *bio, struct bio_vec *bv)
@@ -279,10 +276,9 @@ static inline void bio_get_last_bvec(struct bio *bio, struct bio_vec *bv)
struct bvec_iter iter = bio->bi_iter;
int idx;
- if (unlikely(!bio_multiple_segments(bio))) {
- *bv = bio_iovec(bio);
- return;
- }
+ bio_get_first_bvec(bio, bv);
+ if (bv->bv_len == bio->bi_iter.bi_size)
+ return; /* this bio only has a single bvec */
bio_advance_iter(bio, &iter, iter.bi_size);
--
2.17.1
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From afd09b617db3786b6ef3dc43e28fe728cfea84df Mon Sep 17 00:00:00 2001
From: Alexey Makhalov <amakhalov(a)vmware.com>
Date: Fri, 21 May 2021 07:55:33 +0000
Subject: [PATCH] ext4: fix memory leak in ext4_fill_super
Buffer head references must be released before calling kill_bdev();
otherwise the buffer head (and the page referenced by its b_data) will not
be freed by kill_bdev(), and that bh is subsequently leaked.
If the blocksizes differ, sb_set_blocksize() kills the current buffers and
page cache via kill_bdev(), and the superblock is then reread with the
correct blocksize. Because the superblock bh was still held at that point,
its page and buffer head were busy, so they were never freed and leaked
instead.
This can easily be reproduced by running an infinite loop of:
systemctl start <ext4_on_lvm>.mount, and
systemctl stop <ext4_on_lvm>.mount
... since systemd creates a cgroup for each slice it mounts, the bh leak
gets amplified by a dying memory cgroup that also never gets freed, and the
memory consumption is much more easily noticed.
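In outline, the fix enforces one ordering rule around anything that ends up
in kill_bdev(). A condensed sketch of the two call sites follows; the names
track the diff below, while the surrounding error handling, block-number
recomputation, and labels are omitted, so this is a reading aid rather than
the exact ext4_fill_super() code:

	/* 1) re-reading the superblock with the real blocksize */
	if (sb->s_blocksize != blocksize) {
		brelse(bh);			/* release before kill_bdev() */
		if (!sb_set_blocksize(sb, blocksize)) {	/* may call kill_bdev() */
			bh = NULL;
			goto failed_mount;
		}
		bh = ext4_sb_bread_unmovable(sb, logical_sb_block);
	}

	/* 2) the failure path obeys the same rule */
	brelse(bh);				/* release before kill_bdev() */
	ext4_blkdev_remove(sbi);		/* calls kill_bdev() */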
Fixes: ce40733ce93d ("ext4: Check for return value from sb_set_blocksize")
Fixes: ac27a0ec112a ("ext4: initial copy of files from ext3")
Link: https://lore.kernel.org/r/20210521075533.95732-1-amakhalov@vmware.com
Signed-off-by: Alexey Makhalov <amakhalov(a)vmware.com>
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Cc: stable(a)kernel.org
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 886e0d518668..f66c7301b53a 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4462,14 +4462,20 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
}
if (sb->s_blocksize != blocksize) {
+ /*
+ * bh must be released before kill_bdev(), otherwise
+ * it won't be freed and its page also. kill_bdev()
+ * is called by sb_set_blocksize().
+ */
+ brelse(bh);
/* Validate the filesystem blocksize */
if (!sb_set_blocksize(sb, blocksize)) {
ext4_msg(sb, KERN_ERR, "bad block size %d",
blocksize);
+ bh = NULL;
goto failed_mount;
}
- brelse(bh);
logical_sb_block = sb_block * EXT4_MIN_BLOCK_SIZE;
offset = do_div(logical_sb_block, blocksize);
bh = ext4_sb_bread_unmovable(sb, logical_sb_block);
@@ -5202,8 +5208,9 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
kfree(get_qf_name(sb, sbi, i));
#endif
fscrypt_free_dummy_policy(&sbi->s_dummy_enc_policy);
- ext4_blkdev_remove(sbi);
+ /* ext4_blkdev_remove() calls kill_bdev(), release bh before it. */
brelse(bh);
+ ext4_blkdev_remove(sbi);
out_fail:
sb->s_fs_info = NULL;
kfree(sbi->s_blockgroup_lock);