April 2024 - Linux-stable-mirror

+ userfaultfd-change-src_folio-after-ensuring-its-unpinned-in-uffdio_move.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled
     Subject: userfaultfd: change src_folio after ensuring it's unpinned in UFFDIO_MOVE
has been added to the -mm mm-hotfixes-unstable branch.  Its filename is
     userfaultfd-change-src_folio-after-ensuring-its-unpinned-in-uffdio_move.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…

This patch will later appear in the mm-hotfixes-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Lokesh Gidra <lokeshgidra(a)google.com>
Subject: userfaultfd: change src_folio after ensuring it's unpinned in UFFDIO_MOVE
Date: Thu, 4 Apr 2024 10:17:26 -0700

Commit d7a08838ab74 ("mm: userfaultfd: fix unexpected change to src_folio
when UFFDIO_MOVE fails") moved the change of src_folio->{mapping, index}
to after clearing the page-table and ensuring that the folio is not
pinned.  This avoids failure of swapout+migration and possibly memory
corruption.

However, the commit missed fixing it in the huge-page case.

Link: https://lkml.kernel.org/r/20240404171726.2302435-1-lokeshgidra@google.com
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Lokesh Gidra <lokeshgidra(a)google.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Kalesh Singh <kaleshsingh(a)google.com>
Cc: Lokesh Gidra <lokeshgidra(a)google.com>
Cc: Nicolas Geoffray <ngeoffray(a)google.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 mm/huge_memory.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/mm/huge_memory.c~userfaultfd-change-src_folio-after-ensuring-its-unpinned-in-uffdio_move
+++ a/mm/huge_memory.c
@@ -2259,9 +2259,6 @@ int move_pages_huge_pmd(struct mm_struct
 		goto unlock_ptls;
 	}
 
-	folio_move_anon_rmap(src_folio, dst_vma);
-	WRITE_ONCE(src_folio->index, linear_page_index(dst_vma, dst_addr));
-
 	src_pmdval = pmdp_huge_clear_flush(src_vma, src_addr, src_pmd);
 	/* Folio got pinned from under us. Put it back and fail the move. */
 	if (folio_maybe_dma_pinned(src_folio)) {
@@ -2270,6 +2267,9 @@ int move_pages_huge_pmd(struct mm_struct
 		goto unlock_ptls;
 	}
 
+	folio_move_anon_rmap(src_folio, dst_vma);
+	WRITE_ONCE(src_folio->index, linear_page_index(dst_vma, dst_addr));
+
 	_dst_pmd = mk_huge_pmd(&src_folio->page, dst_vma->vm_page_prot);
 	/* Follow mremap() behavior and treat the entry dirty after the move */
 	_dst_pmd = pmd_mkwrite(pmd_mkdirty(_dst_pmd), dst_vma);
_

Patches currently in -mm which might be from lokeshgidra(a)google.com are

userfaultfd-change-src_folio-after-ensuring-its-unpinned-in-uffdio_move.patch
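
The ordering the patch restores (rewrite the folio's metadata only after every failure check has passed) can be pictured with a small userspace model. This is a hypothetical sketch, not kernel code: `struct sim_folio`, `move_folio()`, `MOVE_FAILED` and the `pinned_after_clear` flag are invented stand-ins for the folio, `move_pages_huge_pmd()`, its error return and the `folio_maybe_dma_pinned()` check. It shows that when the move fails, the folio is left exactly as it was.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the folio fields the patch touches. */
struct sim_folio {
	const void *anon_vma;   /* models the rmap target ("mapping") */
	unsigned long index;    /* models src_folio->index */
	bool mapped;            /* models the source PMD being present */
};

enum { MOVE_OK = 0, MOVE_FAILED = -1 };  /* hypothetical return codes */

/*
 * Models the fixed ordering: clear the source mapping, check for a
 * pin, and only then retarget the folio. On failure the mapping is
 * put back and the metadata was never modified.
 */
static int move_folio(struct sim_folio *f, const void *dst_anon_vma,
		      unsigned long dst_index, bool pinned_after_clear)
{
	f->mapped = false;              /* like pmdp_huge_clear_flush() */
	if (pinned_after_clear) {       /* like folio_maybe_dma_pinned() */
		f->mapped = true;       /* put it back and fail the move */
		return MOVE_FAILED;
	}
	/* No failure path remains below this point. */
	f->anon_vma = dst_anon_vma;     /* like folio_move_anon_rmap() */
	f->index = dst_index;           /* like WRITE_ONCE(src_folio->index, ...) */
	f->mapped = true;               /* now mapped at the destination */
	return MOVE_OK;
}
```

With the pre-fix ordering (the two metadata updates placed before the pin check), a failed move would return with `anon_vma` and `index` already pointing at the destination, which is the inconsistency the commit message describes.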

Linux 6.6.25

by Greg Kroah-Hartman

I'm announcing the release of the 6.6.25 kernel.

All users of the 6.6 kernel series must upgrade.

The updated 6.6.y git tree can be found at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.6.y
and can be browsed at the normal kernel.org git web browser:
	https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary

thanks,

greg k-h

------------

 Makefile                  |    2 
 include/linux/workqueue.h |   35 --
 kernel/workqueue.c        |  757 +++++++---------------------------------------
 3 files changed, 131 insertions(+), 663 deletions(-)

Greg Kroah-Hartman (12):
      Revert "workqueue: Shorten events_freezable_power_efficient name"
      Revert "workqueue: Don't call cpumask_test_cpu() with -1 CPU in wq_update_node_max_active()"
      Revert "workqueue: Implement system-wide nr_active enforcement for unbound workqueues"
      Revert "workqueue: Introduce struct wq_node_nr_active"
      Revert "workqueue: RCU protect wq->dfl_pwq and implement accessors for it"
      Revert "workqueue: Make wq_adjust_max_active() round-robin pwqs while activating"
      Revert "workqueue: Move nr_active handling into helpers"
      Revert "workqueue: Replace pwq_activate_inactive_work() with [__]pwq_activate_work()"
      Revert "workqueue: Factor out pwq_is_empty()"
      Revert "workqueue: Move pwq->max_active to wq->max_active"
      Revert "workqueue.c: Increase workqueue name length"
      Linux 6.6.25

[PATCH v6] media: uvcvideo: Add quirk for Logitech Rally Bar

by Ricardo Ribalda

Logitech Rally Bar devices, despite behaving as UVC cameras, have a
different power management system than the other cameras from Logitech.

USB_QUIRK_RESET_RESUME is applied to all the UVC cameras from Logitech
at the usb core. Unfortunately, USB_QUIRK_RESET_RESUME causes undesired
USB disconnects in the Rally Bar that make them completely unusable.

There is an open discussion about whether we should fix this in the core
or add a quirk in the UVC driver. In order to enable this hardware,
let's land this patch first, and we can revert it later if there is a
different conclusion.

Fixes: e387ef5c47dd ("usb: Add USB_QUIRK_RESET_RESUME for all Logitech UVC webcams")
Cc: <stable(a)vger.kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Alan Stern <stern(a)rowland.harvard.edu>
Cc: Oliver Neukum <oneukum(a)suse.com>
Acked-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Reviewed-by: Devinder Khroad <dkhroad(a)logitech.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
---
Tested with a Rallybar Mini with an Acer Chromebook Spin 513
---
Changes in v6: Thanks Laurent
- Fix subject line.
- Move quirk before device init message.
- Link to v5: https://lore.kernel.org/r/20240402-rallybar-v5-1-7bdd0fbc51f7@chromium.org

Changes in v5:
- Update commit message to describe that this is a temp solution.
- Link to v4: https://lore.kernel.org/r/20240108-rallybar-v4-1-a7450641e41b@chromium.org

Changes in v4:
- Include Logi Rally Bar Huddle (Thanks Kyle!)
- Link to v3: https://lore.kernel.org/r/20240102-rallybar-v3-1-0ab197ce4aa2@chromium.org

Changes in v3:
- Move quirk to uvc driver
- Link to v2: https://lore.kernel.org/r/20231222-rallybar-v2-1-5849d62a9514@chromium.org

Changes in v2:
- Add Fixes tag
- Add UVC maintainer as Cc
- Link to v1: https://lore.kernel.org/r/20231222-rallybar-v1-1-82b2a4d3106f@chromium.org
---
 drivers/media/usb/uvc/uvc_driver.c | 31 +++++++++++++++++++++++++++++++
 drivers/media/usb/uvc/uvcvideo.h   |  1 +
 2 files changed, 32 insertions(+)

diff --git a/drivers/media/usb/uvc/uvc_driver.c b/drivers/media/usb/uvc/uvc_driver.c
index 08fcd2ffa727b..1b4fb9f46bc83 100644
--- a/drivers/media/usb/uvc/uvc_driver.c
+++ b/drivers/media/usb/uvc/uvc_driver.c
@@ -14,6 +14,7 @@
 #include <linux/module.h>
 #include <linux/slab.h>
 #include <linux/usb.h>
+#include <linux/usb/quirks.h>
 #include <linux/usb/uvc.h>
 #include <linux/videodev2.h>
 #include <linux/vmalloc.h>
@@ -2232,6 +2233,9 @@ static int uvc_probe(struct usb_interface *intf,
 		goto error;
 	}
 
+	if (dev->quirks & UVC_QUIRK_FORCE_RESUME)
+		udev->quirks &= ~USB_QUIRK_RESET_RESUME;
+
 	uvc_dbg(dev, PROBE, "UVC device initialized\n");
 	usb_enable_autosuspend(udev);
 	return 0;
@@ -2574,6 +2578,33 @@ static const struct usb_device_id uvc_ids[] = {
 	  .bInterfaceSubClass	= 1,
 	  .bInterfaceProtocol	= 0,
 	  .driver_info		= UVC_INFO_QUIRK(UVC_QUIRK_RESTORE_CTRLS_ON_INIT) },
+	/* Logitech Rally Bar Huddle */
+	{ .match_flags		= USB_DEVICE_ID_MATCH_DEVICE
+				| USB_DEVICE_ID_MATCH_INT_INFO,
+	  .idVendor		= 0x046d,
+	  .idProduct		= 0x087c,
+	  .bInterfaceClass	= USB_CLASS_VIDEO,
+	  .bInterfaceSubClass	= 1,
+	  .bInterfaceProtocol	= 0,
+	  .driver_info		= UVC_INFO_QUIRK(UVC_QUIRK_FORCE_RESUME) },
+	/* Logitech Rally Bar */
+	{ .match_flags		= USB_DEVICE_ID_MATCH_DEVICE
+				| USB_DEVICE_ID_MATCH_INT_INFO,
+	  .idVendor		= 0x046d,
+	  .idProduct		= 0x089b,
+	  .bInterfaceClass	= USB_CLASS_VIDEO,
+	  .bInterfaceSubClass	= 1,
+	  .bInterfaceProtocol	= 0,
+	  .driver_info		= UVC_INFO_QUIRK(UVC_QUIRK_FORCE_RESUME) },
+	/* Logitech Rally Bar Mini */
+	{ .match_flags		= USB_DEVICE_ID_MATCH_DEVICE
+				| USB_DEVICE_ID_MATCH_INT_INFO,
+	  .idVendor		= 0x046d,
+	  .idProduct		= 0x08d3,
+	  .bInterfaceClass	= USB_CLASS_VIDEO,
+	  .bInterfaceSubClass	= 1,
+	  .bInterfaceProtocol	= 0,
+	  .driver_info		= UVC_INFO_QUIRK(UVC_QUIRK_FORCE_RESUME) },
 	/* Chicony CNF7129 (Asus EEE 100HE) */
 	{ .match_flags		= USB_DEVICE_ID_MATCH_DEVICE
 				| USB_DEVICE_ID_MATCH_INT_INFO,
diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
index 6fb0a78b1b009..fa59a21d2a289 100644
--- a/drivers/media/usb/uvc/uvcvideo.h
+++ b/drivers/media/usb/uvc/uvcvideo.h
@@ -73,6 +73,7 @@
 #define UVC_QUIRK_FORCE_Y8		0x00000800
 #define UVC_QUIRK_FORCE_BPP		0x00001000
 #define UVC_QUIRK_WAKE_AUTOSUSPEND	0x00002000
+#define UVC_QUIRK_FORCE_RESUME		0x00004000
 
 /* Format flags */
 #define UVC_FMT_FLAG_COMPRESSED		0x00000001

---
base-commit: c0f65a7c112b3cfa691cead54bcf24d6cc2182b5
change-id: 20231222-rallybar-19ce0c64d5e6

Best regards,
-- 
Ricardo Ribalda <ribalda(a)chromium.org>
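
The per-device override boils down to a table lookup followed by a bit clear on the core's quirk word. The userspace sketch below models that under stated assumptions: the table layout, `lookup_uvc_quirks()` and `apply_force_resume()` are hypothetical simplifications of `uvc_ids[]` and the `uvc_probe()` hunk, and `SIM_USB_QUIRK_RESET_RESUME` is a local stand-in rather than the value from `<linux/usb/quirks.h>`. The vendor/product IDs and the `UVC_QUIRK_FORCE_RESUME` value are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UVC_QUIRK_FORCE_RESUME     0x00004000u  /* value from the patch */
#define SIM_USB_QUIRK_RESET_RESUME 0x00000002u  /* local stand-in */

struct sim_id {
	uint16_t vid, pid;
	uint32_t uvc_quirks;
};

/* Rally Bar Huddle, Rally Bar and Rally Bar Mini, as listed in uvc_ids[]. */
static const struct sim_id sim_ids[] = {
	{ 0x046d, 0x087c, UVC_QUIRK_FORCE_RESUME },
	{ 0x046d, 0x089b, UVC_QUIRK_FORCE_RESUME },
	{ 0x046d, 0x08d3, UVC_QUIRK_FORCE_RESUME },
};

/* Returns the UVC-level quirks for a device, or 0 if it is not listed. */
static uint32_t lookup_uvc_quirks(uint16_t vid, uint16_t pid)
{
	for (size_t i = 0; i < sizeof(sim_ids) / sizeof(sim_ids[0]); i++)
		if (sim_ids[i].vid == vid && sim_ids[i].pid == pid)
			return sim_ids[i].uvc_quirks;
	return 0;
}

/*
 * Models: if (dev->quirks & UVC_QUIRK_FORCE_RESUME)
 *                 udev->quirks &= ~USB_QUIRK_RESET_RESUME;
 * Other bits in the core's quirk word are left untouched.
 */
static uint32_t apply_force_resume(uint32_t usb_quirks, uint32_t uvc_quirks)
{
	if (uvc_quirks & UVC_QUIRK_FORCE_RESUME)
		usb_quirks &= ~SIM_USB_QUIRK_RESET_RESUME;
	return usb_quirks;
}
```

Doing the clear in the UVC driver, rather than changing the core's vendor-wide rule, leaves the reset-resume behaviour in place for every other Logitech camera, which matches the "land this first, revert later if needed" approach described in the commit message.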

Re: Kernel Trace in recent 6.1.8n kernels

by Randy Dunlap

[+ stable & scsi]

On 4/3/24 3:49 PM, Tim Tassonis wrote:
> Hi all
>
> Maybe this is the wrong list, as it probably only affects the 6.1.8n LTS kernel releases.
>
>
> I noticed that since 6.1.80 or so, all my boxes print a trace when rebooting or halting, right at the end. It starts with drivers/scsi/scsi_lib.c
>
> As everything seems already done by then, there is no "real" problem occurring, but maybe someone knows why this suddenly started to happen.
>
>
> With qemu and the serial options, I managed to get the actual trace in text:
>
> Unmounting all other currently mounted file systems...[ 58.632670] EXT4-fs (sda1): re-mounted. Quota mode: none.
> * [ OK ]
> [ 58.684029] EXT4-fs (sda1): re-mounted. Quota mode: none.
> * Bringing down the loopback interface... [ OK ]
> [ 58.809326] ------------[ cut here ]------------
> [ 58.813524] WARNING: CPU: 0 PID: 2755 at drivers/scsi/scsi_lib.c:214 scsi_execute_cmd+0x3b/0x2b0
> [ 58.828052] Modules linked in: cfg80211 8021q garp mrp stp ipv6 crc_ccitt joydev hid_generic usbhid snd_seq_midi snd_seq_midi_event psmouse ppdev serio_raw atkbd libps2 vivaldi_fmap uhci_hcd ehci_pci ehci_hcd snd_ens1370 bochs drm_vram_helper snd_rawmidi usbcore drm_ttm_helper sr_mod usb_common snd_pcm cdrom e1000 i2c_piix4 ttm pcspkr gameport pata_acpi parport_pc parport i8042 qemu_fw_cfg serio rtc_cmos floppy snd_seq snd_seq_device snd_timer snd soundcore fuse
> [ 58.873677] CPU: 0 PID: 2755 Comm: halt Not tainted 6.1.84 #1
> [ 58.876424] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> [ 58.884106] RIP: 0010:scsi_execute_cmd+0x3b/0x2b0
> [ 58.885558] Code: 89 cc 55 44 89 c5 53 48 83 ec 10 4c 8b 74 24 50 48 89 0c 24 4d 85 f6 0f 84 44 02 00 00 49 83 3e 00 74 21 41 83 7e 08 60 74 1a <0f> 0b b8 ea ff ff ff 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
> [ 58.891998] RSP: 0018:ffffc90000153d98 EFLAGS: 00010287
> [ 58.893500] RAX: ffffc90000153df8 RBX: ffff888003d22000 RCX: 0000000000000000
> [ 58.895480] RDX: 0000000000000022 RSI: 0000000000000022 RDI: ffff888003d22000
> [ 58.897583] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000002710
> [ 58.900276] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000002710
> [ 58.902209] R13: ffff888003d22000 R14: ffffc90000153df8 R15: ffffc90000153e28
> [ 58.904084] FS: 00007f097a95b680(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
> [ 58.906285] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 58.907451] CR2: 00007f097a8f5431 CR3: 000000000406e000 CR4: 00000000000006f0
> [ 58.908928] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 58.910326] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 58.911770] Call Trace:
> [ 58.912438] <TASK>
> [ 58.912977] ? __warn+0x78/0xd0
> [ 58.913751] ? scsi_execute_cmd+0x3b/0x2b0
> [ 58.914757] ? report_bug+0xe6/0x170
> [ 58.916267] ? handle_bug+0x3c/0x70
> [ 58.917020] ? exc_invalid_op+0x13/0x60
> [ 58.917807] ? asm_exc_invalid_op+0x16/0x20
> [ 58.918675] ? scsi_execute_cmd+0x3b/0x2b0
> [ 58.919524] ata_cmd_ioctl+0x112/0x2b0
> [ 58.920435] blkdev_ioctl+0x12e/0x260
> [ 58.921322] __x64_sys_ioctl+0x8b/0xc0
> [ 58.922115] do_syscall_64+0x42/0x90
> [ 58.922953] entry_SYSCALL_64_after_hwframe+0x64/0xce
> [ 58.924002] RIP: 0033:0x7f097a87616b
> [ 58.924748] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
> [ 58.928687] RSP: 002b:00007fff1e70b5b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 58.930465] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f097a87616b
> [ 58.932545] RDX: 00007fff1e70b614 RSI: 000000000000031f RDI: 0000000000000004
> [ 58.933964] RBP: 0000000000000000 R08: 0000000000000073 R09: 0000558c7857a343
> [ 58.935349] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000020
> [ 58.936727] R13: 0000558c77faf088 R14: 0000000000000001 R15: 0000000000000000
> [ 58.938165] </TASK>
> [ 58.938617] ---[ end trace 0000000000000000 ]---
> [ 58.940450] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> [ 58.941662] sd 0:0:0:0: [sda] Stopping disk
> [ 58.971754] ACPI: PM: Preparing to enter system sleep state S5
> [ 58.973005] reboot: Power down
>
>
> Bye
> Tim

-- 
#Randy

[PATCH v2 0/2] gpio: cdev: label sanitization fixes

by Bartosz Golaszewski

From: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>

This series fixes a couple of bugs in the sanitization of labels being
passed to irq. Patch 1 fixes the case where userspace provides empty
labels. Patch 2 fixes a missed path in the sanitization changes that can
result in memory corruption.

v1 -> v2:
- switched the order of the patches in order to avoid introducing buggy
  code in one just to fix it in the second

Bartosz Golaszewski (1):
  gpio: cdev: check for NULL labels when sanitizing them for irqs

Kent Gibson (1):
  gpio: cdev: fix missed label sanitizing in debounce_setup()

 drivers/gpio/gpiolib-cdev.c | 46 +++++++++++++++++++++++++------------
 1 file changed, 31 insertions(+), 15 deletions(-)

-- 
2.40.1
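
The NULL check in patch 1 can be pictured with a small userspace sketch. The cover letter does not say what the sanitization replaces; this sketch assumes, for illustration only, that each '/' is swapped for ':' (a character that is awkward in IRQ names exposed under /proc). `sanitize_irq_label()` is hypothetical, and treating an empty userspace label as arriving here as NULL is inferred from the title of patch 1. The point is that a NULL label is passed through as NULL instead of being dereferenced.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sanitizer: returns a newly allocated copy of orig with
 * every '/' replaced by ':' (assumed rule), or NULL when orig is NULL.
 * This sketch also returns NULL on allocation failure; a real driver
 * would report an error instead. The caller frees the result.
 */
static char *sanitize_irq_label(const char *orig)
{
	if (!orig)              /* the case patch 1 guards against */
		return NULL;

	char *label = malloc(strlen(orig) + 1);
	if (!label)
		return NULL;
	strcpy(label, orig);

	/* Replace each '/' in place, scanning past the one just fixed. */
	for (char *p = label; (p = strchr(p, '/')) != NULL; p++)
		*p = ':';
	return label;
}
```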

[PATCH net v3] net: usb: ax88179_178a: avoid the interface always configured as random address

by Jose Ignacio Tornos Martinez

After commit d2689b6a86b9 ("net: usb: ax88179_178a: avoid two consecutive
device resets"), reset is not executed from the bind operation, so the MAC
address is not read from the device registers or the devicetree at that
moment. Since the check that decides whether the interface's assigned MAC
address is random happens after the bind operation in usbnet_probe, the
interface stays flagged as having a random address, even though the
address is correctly read and set during the open operation (now the
only reset).

In order to keep only one reset for the device and to avoid the
interface always being flagged as having a random address, set the
corresponding field in the driver after reset if the MAC address is read
successfully from the device registers or the devicetree.

cc: stable(a)vger.kernel.org # 6.6+
Fixes: d2689b6a86b9 ("net: usb: ax88179_178a: avoid two consecutive device resets")
Reported-by: Dave Stevenson <dave.stevenson(a)raspberrypi.com>
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm(a)redhat.com>
---
v3:
  - Send the patch separately to net.
v2:
  - Split the fix and the improvement in two patches as Simon Horman suggests.
v1: https://lore.kernel.org/netdev/20240325173155.671807-1-jtornosm@redhat.com/

 drivers/net/usb/ax88179_178a.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/usb/ax88179_178a.c b/drivers/net/usb/ax88179_178a.c
index 88e084534853..8ca8ace93d9c 100644
--- a/drivers/net/usb/ax88179_178a.c
+++ b/drivers/net/usb/ax88179_178a.c
@@ -1273,6 +1273,7 @@ static void ax88179_get_mac_addr(struct usbnet *dev)
 
 	if (is_valid_ether_addr(mac)) {
 		eth_hw_addr_set(dev->net, mac);
+		dev->net->addr_assign_type = NET_ADDR_PERM;
 	} else {
 		netdev_info(dev->net, "invalid MAC address, using random\n");
 		eth_hw_addr_random(dev->net);
-- 
2.44.0
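
The one-line fix ties the address-assignment type to the outcome of the MAC validity check. A userspace model of that decision follows; `sim_is_valid_ether_addr()` re-implements the usual rule (not all-zero, multicast bit clear) and `enum sim_addr_assign` holds local stand-ins named after `NET_ADDR_PERM` and `NET_ADDR_RANDOM`. Neither is the kernel's code; the sketch only illustrates which branch ends up flagged as permanent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins named after NET_ADDR_PERM / NET_ADDR_RANDOM. */
enum sim_addr_assign { SIM_NET_ADDR_PERM, SIM_NET_ADDR_RANDOM };

/* Valid unicast MAC: not all zeros, multicast bit (LSB of byte 0) clear. */
static bool sim_is_valid_ether_addr(const uint8_t mac[6])
{
	uint8_t any = 0;

	for (int i = 0; i < 6; i++)
		any |= mac[i];
	return any != 0 && !(mac[0] & 0x01);
}

/*
 * Models ax88179_get_mac_addr() after the patch: a valid address read
 * from the device is recorded as permanent; otherwise a random address
 * is generated and stays flagged as random.
 */
static enum sim_addr_assign classify_mac(const uint8_t mac[6])
{
	return sim_is_valid_ether_addr(mac) ? SIM_NET_ADDR_PERM
					    : SIM_NET_ADDR_RANDOM;
}
```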

[PATCH v2] selftests/ftrace: Limit length in subsystem-enable tests

by Yuanhe Shu

While sched* events are being traced and sched* events continuously
happen, "[xx] event tracing - enable/disable with subsystem level files"
would not stop, as on some slower systems it seems to take forever.
Selecting the first 100 lines of output is enough to judge whether there
are at least 3 types of sched events.

Fixes: 815b18ea66d6 ("ftracetest: Add basic event tracing test cases")
Cc: stable(a)vger.kernel.org
Signed-off-by: Yuanhe Shu <xiangzao(a)linux.alibaba.com>
---
 .../selftests/ftrace/test.d/event/subsystem-enable.tc       | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
index b1ede6249866..b7c8f29c09a9 100644
--- a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
+++ b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
@@ -18,7 +18,7 @@ echo 'sched:*' > set_event
 
 yield
 
-count=`cat trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
+count=`head -n 100 trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
 if [ $count -lt 3 ]; then
     fail "at least fork, exec and exit events should be recorded"
 fi
@@ -29,7 +29,7 @@ echo 1 > events/sched/enable
 
 yield
 
-count=`cat trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
+count=`head -n 100 trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
 if [ $count -lt 3 ]; then
     fail "at least fork, exec and exit events should be recorded"
 fi
@@ -40,7 +40,7 @@ echo 0 > events/sched/enable
 
 yield
 
-count=`cat trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
+count=`head -n 100 trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
 if [ $count -ne 0 ]; then
     fail "any of scheduler events should not be recorded"
 fi
-- 
2.39.3
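
For readers less used to the shell pipeline, here is a C rendering of `head -n 100 trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`. It is an illustrative sketch: `count_event_types()`, the `SIM_*` limits and the 1024-byte line buffer are hypothetical choices, not part of the selftest. Note that `head` caps the lines read before `grep` filters them, so comment lines still count toward the limit, and that awk prints an empty value for lines with fewer than five fields, which `sort -u | wc -l` then counts once.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SIM_MAX_TYPES 64   /* distinct values remembered (sketch limit) */
#define SIM_FIELD_LEN 64   /* longer fields are truncated (sketch limit) */

/*
 * Counts distinct 5th whitespace-separated fields among the first
 * max_lines lines of fp, ignoring lines that start with '#'. Lines are
 * read in chunks of up to 1023 bytes (sketch limit).
 */
static int count_event_types(FILE *fp, int max_lines)
{
	char seen[SIM_MAX_TYPES][SIM_FIELD_LEN];
	int nseen = 0;
	char line[1024];

	/* head -n max_lines: comment lines still count toward the limit */
	for (int n = 0; n < max_lines && fgets(line, sizeof(line), fp); n++) {
		char field[SIM_FIELD_LEN] = "";
		char *tok;

		if (line[0] == '#')                     /* grep -v ^# */
			continue;

		/* awk '{ print $5 }': empty if there are fewer fields */
		tok = strtok(line, " \t\n");
		for (int i = 1; tok && i < 5; i++)
			tok = strtok(NULL, " \t\n");
		if (tok)
			snprintf(field, sizeof(field), "%s", tok);

		/* sort -u | wc -l: remember each distinct value once */
		int dup = 0;
		for (int i = 0; i < nseen && !dup; i++)
			dup = strcmp(seen[i], field) == 0;
		if (!dup && nseen < SIM_MAX_TYPES)
			memcpy(seen[nseen++], field, sizeof(field));
	}
	return nseen;
}
```

Bounding the input this way makes the cost of the check independent of how fast new sched events keep arriving, which is the motivation given in the patch.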

FAILED: patch "[PATCH] mm, vmscan: prevent infinite loop for costly GFP_NOIO |" failed to apply to 5.4-stable tree

by gregkh(a)linuxfoundation.org

The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.

To reproduce the conflict and resubmit, you may use the following commands:

git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 803de9000f334b771afacb6ff3e78622916668b0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024032730-triceps-mustang-3ced@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..

Possible dependencies:

803de9000f33 ("mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations")
f98a497e1f16 ("mm: compaction: remove unnecessary is_via_compact_memory() checks")
e8606320e9af ("mm: compaction: refactor __compaction_suitable()")
fe573327ffb1 ("tracing: incorrect gfp_t conversion")
cff387d6a294 ("mm: compaction: make compaction_zonelist_suitable return false when COMPACT_SUCCESS")
9353ffa6e9e9 ("kasan, page_alloc: allow skipping memory init for HW_TAGS")
53ae233c30a6 ("kasan, page_alloc: allow skipping unpoisoning for HW_TAGS")
f49d9c5bb15c ("kasan, mm: only define ___GFP_SKIP_KASAN_POISON with HW_TAGS")
e9d0ca922816 ("kasan, page_alloc: rework kasan_unpoison_pages call site")
7e3cbba65de2 ("kasan, page_alloc: move kernel_init_free_pages in post_alloc_hook")
89b271163328 ("kasan, page_alloc: move SetPageSkipKASanPoison in post_alloc_hook")
9294b1281d0a ("kasan, page_alloc: combine tag_clear_highpage calls in post_alloc_hook")
b42090ae6f3a ("kasan, page_alloc: merge kasan_alloc_pages into post_alloc_hook")
b8491b9052fe ("kasan, page_alloc: refactor init checks in post_alloc_hook")
1c0e5b24f117 ("kasan: only apply __GFP_ZEROTAGS when memory is zeroed")
c82ce3195fd1 ("mm: clarify __GFP_ZEROTAGS comment")
7c13c163e036 ("kasan, page_alloc: merge kasan_free_pages into free_pages_prepare")
5b2c07138cbd ("kasan, page_alloc: move tag_clear_highpage out of kernel_init_free_pages")
94ae8b83fefc ("kasan, page_alloc: deduplicate should_skip_kasan_poison")
3bf03b9a0839 ("Merge branch 'akpm' (patches from Andrew)")

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

From 803de9000f334b771afacb6ff3e78622916668b0 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka(a)suse.cz>
Date: Wed, 21 Feb 2024 12:43:58 +0100
Subject: [PATCH] mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations

Sven reports an infinite loop in __alloc_pages_slowpath() for costly order
__GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO.  Such combination
can happen in a suspend/resume context where a GFP_KERNEL allocation can
have __GFP_IO masked out via gfp_allowed_mask.

Quoting Sven:

1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER)
   with __GFP_RETRY_MAYFAIL set.

2. page alloc's __alloc_pages_slowpath tries to get a page from the
   freelist. This fails because there is nothing free of that costly
   order.

3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim,
   which bails out because a zone is ready to be compacted; it pretends
   to have made a single page of progress.

4. page alloc tries to compact, but this always bails out early because
   __GFP_IO is not set (it's not passed by the snd allocator, and even
   if it were, we are suspending so the __GFP_IO flag would be cleared
   anyway).

5. page alloc believes reclaim progress was made (because of the
   pretense in item 3) and so it checks whether it should retry
   compaction. The compaction retry logic thinks it should try again,
   because:
    a) reclaim is needed because of the early bail-out in item 4
    b) a zonelist is suitable for compaction

6. goto 2. indefinite stall.

(end quote)

The immediate root cause is confusing the COMPACT_SKIPPED returned from
__alloc_pages_direct_compact() (step 4) due to lack of __GFP_IO to be
indicating a lack of order-0 pages, and in step 5 evaluating that in
should_compact_retry() as a reason to retry, before incrementing and
limiting the number of retries.  There are however other places that
wrongly assume that compaction can happen while we lack __GFP_IO.

To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO
evaluation and switch the open-coded test in try_to_compact_pages() to use
it.

Also use the new helper in:
- compaction_ready(), which will make reclaim not bail out in step 3, so
  there's at least one attempt to actually reclaim, even if chances are
  small for a costly order
- in_reclaim_compaction() which will make should_continue_reclaim()
  return false and we don't over-reclaim unnecessarily
- in __alloc_pages_slowpath() to set a local variable can_compact,
  which is then used to avoid retrying reclaim/compaction for costly
  allocations (step 5) if we can't compact and also to skip the early
  compaction attempt that we do in some cases

Link: https://lkml.kernel.org/r/20240221114357.13655-2-vbabka@suse.cz
Fixes: 3250845d0526 ("Revert "mm, oom: prevent premature OOM killer invocation for high order request"")
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Reported-by: Sven van Ashbrook <svenva(a)chromium.org>
Closes: https://lore.kernel.org/all/CAG-rBihs_xMKb3wrMO1%2B-%2Bp4fowP9oy1pa_OTkfxBz…
Tested-by: Karthikeyan Ramasubramanian <kramasub(a)chromium.org>
Cc: Brian Geffon <bgeffon(a)google.com>
Cc: Curtis Malainey <cujomalainey(a)chromium.org>
Cc: Jaroslav Kysela <perex(a)perex.cz>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Takashi Iwai <tiwai(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index de292a007138..e2a916cf29c4 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -353,6 +353,15 @@ static inline bool gfp_has_io_fs(gfp_t gfp)
 	return (gfp & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS);
 }
 
+/*
+ * Check if the gfp flags allow compaction - GFP_NOIO is a really
+ * tricky context because the migration might require IO.
+ */
+static inline bool gfp_compaction_allowed(gfp_t gfp_mask)
+{
+	return IS_ENABLED(CONFIG_COMPACTION) && (gfp_mask & __GFP_IO);
+}
+
 extern gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma);
 
 #ifdef CONFIG_CONTIG_ALLOC
diff --git a/mm/compaction.c b/mm/compaction.c
index 4add68d40e8d..b961db601df4 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2723,16 +2723,11 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 		unsigned int alloc_flags, const struct alloc_context *ac,
 		enum compact_priority prio, struct page **capture)
 {
-	int may_perform_io = (__force int)(gfp_mask & __GFP_IO);
 	struct zoneref *z;
 	struct zone *zone;
 	enum compact_result rc = COMPACT_SKIPPED;
 
-	/*
-	 * Check if the GFP flags allow compaction - GFP_NOIO is really
-	 * tricky context because the migration might require IO
-	 */
-	if (!may_perform_io)
+	if (!gfp_compaction_allowed(gfp_mask))
 		return COMPACT_SKIPPED;
 
 	trace_mm_compaction_try_to_compact_pages(order, gfp_mask, prio);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 150d4f23b010..a663202045dc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4041,6 +4041,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 						struct alloc_context *ac)
 {
 	bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
+	bool can_compact = gfp_compaction_allowed(gfp_mask);
 	const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;
 	struct page *page = NULL;
 	unsigned int alloc_flags;
@@ -4111,7 +4112,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * Don't try this for allocations that are allowed to ignore
 	 * watermarks, as the ALLOC_NO_WATERMARKS attempt didn't yet happen.
 	 */
-	if (can_direct_reclaim &&
+	if (can_direct_reclaim && can_compact &&
 			(costly_order ||
 			   (order > 0 && ac->migratetype != MIGRATE_MOVABLE))
 			&& !gfp_pfmemalloc_allowed(gfp_mask)) {
@@ -4209,9 +4210,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 
 	/*
 	 * Do not retry costly high order allocations unless they are
-	 * __GFP_RETRY_MAYFAIL
+	 * __GFP_RETRY_MAYFAIL and we can compact
 	 */
-	if (costly_order && !(gfp_mask & __GFP_RETRY_MAYFAIL))
+	if (costly_order && (!can_compact ||
+			     !(gfp_mask & __GFP_RETRY_MAYFAIL)))
 		goto nopage;
 
 	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
@@ -4224,7 +4226,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * implementation of the compaction depends on the sufficient amount
 	 * of free memory (see __compaction_suitable)
 	 */
-	if (did_some_progress > 0 &&
+	if (did_some_progress > 0 && can_compact &&
 			should_compact_retry(ac, order, alloc_flags,
 				compact_result, &compact_priority,
 				&compaction_retries))
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4f9c854ce6cc..4255619a1a31 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5753,7 +5753,7 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 /* Use reclaim/compaction for costly allocs or under memory pressure */
 static bool in_reclaim_compaction(struct scan_control *sc)
 {
-	if (IS_ENABLED(CONFIG_COMPACTION) && sc->order &&
+	if (gfp_compaction_allowed(sc->gfp_mask) && sc->order &&
 			(sc->order > PAGE_ALLOC_COSTLY_ORDER ||
 			 sc->priority < DEF_PRIORITY - 2))
 		return true;
@@ -5998,6 +5998,9 @@ static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
 {
 	unsigned long watermark;
 
+	if (!gfp_compaction_allowed(sc->gfp_mask))
+		return false;
+
 	/* Allocation can already succeed, nothing to do */
 	if (zone_watermark_ok(zone, sc->order, min_wmark_pages(zone),
 			      sc->reclaim_idx, 0))
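
The stall in Sven's steps 1-6, and how the new `can_compact` checks end it, can be modelled as a bounded loop. This is a deliberately simplified, hypothetical sketch: `sim_slowpath()`, the `SIM_GFP_*` bits and `SIM_MAX_COMPACT_RETRIES` are stand-ins, not the real `__alloc_pages_slowpath()` or the kernel's flag values. It captures only three things from the commit message: reclaim reports one page of "progress", compaction is skipped without __GFP_IO, and a skipped compaction asks for a retry without the retry counter being bumped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the kernel's gfp bit values. */
#define SIM_GFP_IO              0x1u
#define SIM_GFP_RETRY_MAYFAIL   0x2u
#define SIM_MAX_COMPACT_RETRIES 16   /* stand-in for the retry limit */

/* Models gfp_compaction_allowed() with CONFIG_COMPACTION=y. */
static bool sim_gfp_compaction_allowed(unsigned int gfp)
{
	return (gfp & SIM_GFP_IO) != 0;
}

/*
 * Returns the iteration on which the allocation gives up, or -1 if
 * 'cap' iterations pass without giving up (the indefinite stall).
 * 'with_fix' selects whether the retry decisions consult can_compact.
 */
static int sim_slowpath(unsigned int gfp, bool costly, bool with_fix, int cap)
{
	bool can_compact = sim_gfp_compaction_allowed(gfp);
	bool retry_mayfail = (gfp & SIM_GFP_RETRY_MAYFAIL) != 0;
	int compaction_retries = 0;

	for (int iter = 1; iter <= cap; iter++) {
		/* Steps 2-4: freelist empty, reclaim reports one page of
		 * "progress", compaction is skipped without __GFP_IO. */
		const int did_some_progress = 1;
		bool compact_skipped = !can_compact;

		/* Costly orders only retry with __GFP_RETRY_MAYFAIL and,
		 * after the fix, only if compaction is allowed. */
		if (costly && (!retry_mayfail || (with_fix && !can_compact)))
			return iter;                    /* goto nopage */

		/* Step 5: a skipped compaction asks for a retry without
		 * being counted; real attempts are bounded. */
		if (did_some_progress > 0 && (!with_fix || can_compact) &&
		    (compact_skipped ||
		     ++compaction_retries <= SIM_MAX_COMPACT_RETRIES))
			continue;                       /* step 6: goto 2 */

		return iter;                            /* give up */
	}
	return -1;
}
```

In this model a costly `GFP_NOIO | __GFP_RETRY_MAYFAIL` request never terminates without the fix and gives up on the first pass with it, while requests that do carry __GFP_IO behave the same either way and stop after the bounded number of compaction retries.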

FAILED: patch "[PATCH] mm, vmscan: prevent infinite loop for costly GFP_NOIO |" failed to apply to 5.10-stable tree

by gregkh(a)linuxfoundation.org

The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.

To reproduce the conflict and resubmit, you may use the following commands:

git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 803de9000f334b771afacb6ff3e78622916668b0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024032727-pastel-sincerity-a986@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..

Possible dependencies:

803de9000f33 ("mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations")
f98a497e1f16 ("mm: compaction: remove unnecessary is_via_compact_memory() checks")
e8606320e9af ("mm: compaction: refactor __compaction_suitable()")
fe573327ffb1 ("tracing: incorrect gfp_t conversion")
cff387d6a294 ("mm: compaction: make compaction_zonelist_suitable return false when COMPACT_SUCCESS")
9353ffa6e9e9 ("kasan, page_alloc: allow skipping memory init for HW_TAGS")
53ae233c30a6 ("kasan, page_alloc: allow skipping unpoisoning for HW_TAGS")
f49d9c5bb15c ("kasan, mm: only define ___GFP_SKIP_KASAN_POISON with HW_TAGS")
e9d0ca922816 ("kasan, page_alloc: rework kasan_unpoison_pages call site")
7e3cbba65de2 ("kasan, page_alloc: move kernel_init_free_pages in post_alloc_hook")
89b271163328 ("kasan, page_alloc: move SetPageSkipKASanPoison in post_alloc_hook")
9294b1281d0a ("kasan, page_alloc: combine tag_clear_highpage calls in post_alloc_hook")
b42090ae6f3a ("kasan, page_alloc: merge kasan_alloc_pages into post_alloc_hook")
b8491b9052fe ("kasan, page_alloc: refactor init checks in post_alloc_hook")
1c0e5b24f117 ("kasan: only apply __GFP_ZEROTAGS when memory is zeroed")
c82ce3195fd1 ("mm: clarify __GFP_ZEROTAGS comment")
7c13c163e036 ("kasan, page_alloc: merge kasan_free_pages into free_pages_prepare")
5b2c07138cbd ("kasan, page_alloc: move tag_clear_highpage out of kernel_init_free_pages")
94ae8b83fefc ("kasan, page_alloc: deduplicate should_skip_kasan_poison")
3bf03b9a0839 ("Merge branch 'akpm' (patches from Andrew)")

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

From 803de9000f334b771afacb6ff3e78622916668b0 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka(a)suse.cz>
Date: Wed, 21 Feb 2024 12:43:58 +0100
Subject: [PATCH] mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations

Sven reports an infinite loop in __alloc_pages_slowpath() for costly order
__GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO.  Such combination
can happen in a suspend/resume context where a GFP_KERNEL allocation can
have __GFP_IO masked out via gfp_allowed_mask.

Quoting Sven:

1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER)
   with __GFP_RETRY_MAYFAIL set.

2. page alloc's __alloc_pages_slowpath tries to get a page from the
   freelist. This fails because there is nothing free of that costly
   order.

3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim,
   which bails out because a zone is ready to be compacted; it pretends
   to have made a single page of progress.

4. page alloc tries to compact, but this always bails out early because
   __GFP_IO is not set (it's not passed by the snd allocator, and even
   if it were, we are suspending so the __GFP_IO flag would be cleared
   anyway).

5. page alloc believes reclaim progress was made (because of the
   pretense in item 3) and so it checks whether it should retry
   compaction. The compaction retry logic thinks it should try again,
   because:
    a) reclaim is needed because of the early bail-out in item 4
    b) a zonelist is suitable for compaction

6. goto 2. indefinite stall.

(end quote)

The immediate root cause is confusing the COMPACT_SKIPPED returned from
__alloc_pages_direct_compact() (step 4) due to lack of __GFP_IO to be
indicating a lack of order-0 pages, and in step 5 evaluating that in
should_compact_retry() as a reason to retry, before incrementing and
limiting the number of retries.  There are however other places that
wrongly assume that compaction can happen while we lack __GFP_IO.

To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO
evaluation and switch the open-coded test in try_to_compact_pages() to use
it.

Also use the new helper in:
- compaction_ready(), which will make reclaim not bail out in step 3, so
  there's at least one attempt to actually reclaim, even if chances are
  small for a costly order
- in_reclaim_compaction() which will make should_continue_reclaim()
  return false and we don't over-reclaim unnecessarily
- in __alloc_pages_slowpath() to set a local variable can_compact,
  which is then used to avoid retrying reclaim/compaction for costly
  allocations (step 5) if we can't compact and also to skip the early
  compaction attempt that we do in some cases

Link: https://lkml.kernel.org/r/20240221114357.13655-2-vbabka@suse.cz
Fixes: 3250845d0526 ("Revert "mm, oom: prevent premature OOM killer invocation for high order request"")
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Reported-by: Sven van Ashbrook <svenva(a)chromium.org>
Closes: https://lore.kernel.org/all/CAG-rBihs_xMKb3wrMO1%2B-%2Bp4fowP9oy1pa_OTkfxBz…
Tested-by: Karthikeyan Ramasubramanian <kramasub(a)chromium.org>
Cc: Brian Geffon <bgeffon(a)google.com>
Cc: Curtis Malainey <cujomalainey(a)chromium.org>
Cc: Jaroslav Kysela <perex(a)perex.cz>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Takashi Iwai <tiwai(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index de292a007138..e2a916cf29c4
100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -353,6 +353,15 @@ static inline bool gfp_has_io_fs(gfp_t gfp) return (gfp & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS); } +/* + * Check if the gfp flags allow compaction - GFP_NOIO is a really + * tricky context because the migration might require IO. + */ +static inline bool gfp_compaction_allowed(gfp_t gfp_mask) +{ + return IS_ENABLED(CONFIG_COMPACTION) && (gfp_mask & __GFP_IO); +} + extern gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma); #ifdef CONFIG_CONTIG_ALLOC diff --git a/mm/compaction.c b/mm/compaction.c index 4add68d40e8d..b961db601df4 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -2723,16 +2723,11 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order, unsigned int alloc_flags, const struct alloc_context *ac, enum compact_priority prio, struct page **capture) { - int may_perform_io = (__force int)(gfp_mask & __GFP_IO); struct zoneref *z; struct zone *zone; enum compact_result rc = COMPACT_SKIPPED; - /* - * Check if the GFP flags allow compaction - GFP_NOIO is really - * tricky context because the migration might require IO - */ - if (!may_perform_io) + if (!gfp_compaction_allowed(gfp_mask)) return COMPACT_SKIPPED; trace_mm_compaction_try_to_compact_pages(order, gfp_mask, prio); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 150d4f23b010..a663202045dc 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4041,6 +4041,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, struct alloc_context *ac) { bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM; + bool can_compact = gfp_compaction_allowed(gfp_mask); const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER; struct page *page = NULL; unsigned int alloc_flags; @@ -4111,7 +4112,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, * Don't try this for allocations that are allowed to ignore * watermarks, as the ALLOC_NO_WATERMARKS attempt didn't yet happen. 
*/ - if (can_direct_reclaim && + if (can_direct_reclaim && can_compact && (costly_order || (order > 0 && ac->migratetype != MIGRATE_MOVABLE)) && !gfp_pfmemalloc_allowed(gfp_mask)) { @@ -4209,9 +4210,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, /* * Do not retry costly high order allocations unless they are - * __GFP_RETRY_MAYFAIL + * __GFP_RETRY_MAYFAIL and we can compact */ - if (costly_order && !(gfp_mask & __GFP_RETRY_MAYFAIL)) + if (costly_order && (!can_compact || + !(gfp_mask & __GFP_RETRY_MAYFAIL))) goto nopage; if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags, @@ -4224,7 +4226,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, * implementation of the compaction depends on the sufficient amount * of free memory (see __compaction_suitable) */ - if (did_some_progress > 0 && + if (did_some_progress > 0 && can_compact && should_compact_retry(ac, order, alloc_flags, compact_result, &compact_priority, &compaction_retries)) diff --git a/mm/vmscan.c b/mm/vmscan.c index 4f9c854ce6cc..4255619a1a31 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -5753,7 +5753,7 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc) /* Use reclaim/compaction for costly allocs or under memory pressure */ static bool in_reclaim_compaction(struct scan_control *sc) { - if (IS_ENABLED(CONFIG_COMPACTION) && sc->order && + if (gfp_compaction_allowed(sc->gfp_mask) && sc->order && (sc->order > PAGE_ALLOC_COSTLY_ORDER || sc->priority < DEF_PRIORITY - 2)) return true; @@ -5998,6 +5998,9 @@ static inline bool compaction_ready(struct zone *zone, struct scan_control *sc) { unsigned long watermark; + if (!gfp_compaction_allowed(sc->gfp_mask)) + return false; + /* Allocation can already succeed, nothing to do */ if (zone_watermark_ok(zone, sc->order, min_wmark_pages(zone), sc->reclaim_idx, 0))

FAILED: patch "[PATCH] mm, vmscan: prevent infinite loop for costly GFP_NOIO |" failed to apply to 5.15-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>.

To reproduce the conflict and resubmit, you may use the following commands:

git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 803de9000f334b771afacb6ff3e78622916668b0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024032725-amigo-dental-d3bd@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..

Possible dependencies:

803de9000f33 ("mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations")
f98a497e1f16 ("mm: compaction: remove unnecessary is_via_compact_memory() checks")
e8606320e9af ("mm: compaction: refactor __compaction_suitable()")
fe573327ffb1 ("tracing: incorrect gfp_t conversion")
cff387d6a294 ("mm: compaction: make compaction_zonelist_suitable return false when COMPACT_SUCCESS")
9353ffa6e9e9 ("kasan, page_alloc: allow skipping memory init for HW_TAGS")
53ae233c30a6 ("kasan, page_alloc: allow skipping unpoisoning for HW_TAGS")
f49d9c5bb15c ("kasan, mm: only define ___GFP_SKIP_KASAN_POISON with HW_TAGS")
e9d0ca922816 ("kasan, page_alloc: rework kasan_unpoison_pages call site")
7e3cbba65de2 ("kasan, page_alloc: move kernel_init_free_pages in post_alloc_hook")
89b271163328 ("kasan, page_alloc: move SetPageSkipKASanPoison in post_alloc_hook")
9294b1281d0a ("kasan, page_alloc: combine tag_clear_highpage calls in post_alloc_hook")
b42090ae6f3a ("kasan, page_alloc: merge kasan_alloc_pages into post_alloc_hook")
b8491b9052fe ("kasan, page_alloc: refactor init checks in post_alloc_hook")
1c0e5b24f117 ("kasan: only apply __GFP_ZEROTAGS when memory is zeroed")
c82ce3195fd1 ("mm: clarify __GFP_ZEROTAGS comment")
7c13c163e036 ("kasan, page_alloc: merge kasan_free_pages into free_pages_prepare")
5b2c07138cbd ("kasan, page_alloc: move tag_clear_highpage out of kernel_init_free_pages")
94ae8b83fefc ("kasan, page_alloc: deduplicate should_skip_kasan_poison")
3bf03b9a0839 ("Merge branch 'akpm' (patches from Andrew)")

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

From 803de9000f334b771afacb6ff3e78622916668b0 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka(a)suse.cz>
Date: Wed, 21 Feb 2024 12:43:58 +0100
Subject: [PATCH] mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations

Sven reports an infinite loop in __alloc_pages_slowpath() for costly order __GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO. Such combination can happen in a suspend/resume context where a GFP_KERNEL allocation can have __GFP_IO masked out via gfp_allowed_mask.

Quoting Sven:

1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER) with __GFP_RETRY_MAYFAIL set.

2. page alloc's __alloc_pages_slowpath tries to get a page from the freelist. This fails because there is nothing free of that costly order.

3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim, which bails out because a zone is ready to be compacted; it pretends to have made a single page of progress.

4. page alloc tries to compact, but this always bails out early because __GFP_IO is not set (it's not passed by the snd allocator, and even if it were, we are suspending so the __GFP_IO flag would be cleared anyway).

5. page alloc believes reclaim progress was made (because of the pretense in item 3) and so it checks whether it should retry compaction. The compaction retry logic thinks it should try again, because:
   a) reclaim is needed because of the early bail-out in item 4
   b) a zonelist is suitable for compaction

6. goto 2. indefinite stall.

(end quote)

The immediate root cause is confusing the COMPACT_SKIPPED returned from __alloc_pages_direct_compact() (step 4) due to lack of __GFP_IO to be indicating a lack of order-0 pages, and in step 5 evaluating that in should_compact_retry() as a reason to retry, before incrementing and limiting the number of retries. There are however other places that wrongly assume that compaction can happen while we lack __GFP_IO.

To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO evaluation and switch the open-coded test in try_to_compact_pages() to use it.

Also use the new helper in:
- compaction_ready(), which will make reclaim not bail out in step 3, so there's at least one attempt to actually reclaim, even if chances are small for a costly order
- in_reclaim_compaction() which will make should_continue_reclaim() return false and we don't over-reclaim unnecessarily
- in __alloc_pages_slowpath() to set a local variable can_compact, which is then used to avoid retrying reclaim/compaction for costly allocations (step 5) if we can't compact and also to skip the early compaction attempt that we do in some cases

Link: https://lkml.kernel.org/r/20240221114357.13655-2-vbabka@suse.cz
Fixes: 3250845d0526 ("Revert "mm, oom: prevent premature OOM killer invocation for high order request"")
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Reported-by: Sven van Ashbrook <svenva(a)chromium.org>
Closes: https://lore.kernel.org/all/CAG-rBihs_xMKb3wrMO1%2B-%2Bp4fowP9oy1pa_OTkfxBz…
Tested-by: Karthikeyan Ramasubramanian <kramasub(a)chromium.org>
Cc: Brian Geffon <bgeffon(a)google.com>
Cc: Curtis Malainey <cujomalainey(a)chromium.org>
Cc: Jaroslav Kysela <perex(a)perex.cz>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Takashi Iwai <tiwai(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index de292a007138..e2a916cf29c4 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -353,6 +353,15 @@ static inline bool gfp_has_io_fs(gfp_t gfp)
 	return (gfp & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS);
 }
 
+/*
+ * Check if the gfp flags allow compaction - GFP_NOIO is a really
+ * tricky context because the migration might require IO.
+ */
+static inline bool gfp_compaction_allowed(gfp_t gfp_mask)
+{
+	return IS_ENABLED(CONFIG_COMPACTION) && (gfp_mask & __GFP_IO);
+}
+
 extern gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma);
 
 #ifdef CONFIG_CONTIG_ALLOC
diff --git a/mm/compaction.c b/mm/compaction.c
index 4add68d40e8d..b961db601df4 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2723,16 +2723,11 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 		unsigned int alloc_flags, const struct alloc_context *ac,
 		enum compact_priority prio, struct page **capture)
 {
-	int may_perform_io = (__force int)(gfp_mask & __GFP_IO);
 	struct zoneref *z;
 	struct zone *zone;
 	enum compact_result rc = COMPACT_SKIPPED;
 
-	/*
-	 * Check if the GFP flags allow compaction - GFP_NOIO is really
-	 * tricky context because the migration might require IO
-	 */
-	if (!may_perform_io)
+	if (!gfp_compaction_allowed(gfp_mask))
 		return COMPACT_SKIPPED;
 
 	trace_mm_compaction_try_to_compact_pages(order, gfp_mask, prio);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 150d4f23b010..a663202045dc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4041,6 +4041,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 						struct alloc_context *ac)
 {
 	bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
+	bool can_compact = gfp_compaction_allowed(gfp_mask);
 	const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;
 	struct page *page = NULL;
 	unsigned int alloc_flags;
@@ -4111,7 +4112,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * Don't try this for allocations that are allowed to ignore
 	 * watermarks, as the ALLOC_NO_WATERMARKS attempt didn't yet happen.
 	 */
-	if (can_direct_reclaim &&
+	if (can_direct_reclaim && can_compact &&
 			(costly_order ||
 			   (order > 0 && ac->migratetype != MIGRATE_MOVABLE))
 			&& !gfp_pfmemalloc_allowed(gfp_mask)) {
@@ -4209,9 +4210,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 
 	/*
 	 * Do not retry costly high order allocations unless they are
-	 * __GFP_RETRY_MAYFAIL
+	 * __GFP_RETRY_MAYFAIL and we can compact
 	 */
-	if (costly_order && !(gfp_mask & __GFP_RETRY_MAYFAIL))
+	if (costly_order && (!can_compact ||
+			     !(gfp_mask & __GFP_RETRY_MAYFAIL)))
 		goto nopage;
 
 	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
@@ -4224,7 +4226,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * implementation of the compaction depends on the sufficient amount
 	 * of free memory (see __compaction_suitable)
 	 */
-	if (did_some_progress > 0 &&
+	if (did_some_progress > 0 && can_compact &&
 			should_compact_retry(ac, order, alloc_flags,
 				compact_result, &compact_priority,
 				&compaction_retries))
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4f9c854ce6cc..4255619a1a31 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5753,7 +5753,7 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 /* Use reclaim/compaction for costly allocs or under memory pressure */
 static bool in_reclaim_compaction(struct scan_control *sc)
 {
-	if (IS_ENABLED(CONFIG_COMPACTION) && sc->order &&
+	if (gfp_compaction_allowed(sc->gfp_mask) && sc->order &&
 			(sc->order > PAGE_ALLOC_COSTLY_ORDER ||
 			 sc->priority < DEF_PRIORITY - 2))
 		return true;
@@ -5998,6 +5998,9 @@ static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
 {
 	unsigned long watermark;
 
+	if (!gfp_compaction_allowed(sc->gfp_mask))
+		return false;
+
 	/* Allocation can already succeed, nothing to do */
 	if (zone_watermark_ok(zone, sc->order, min_wmark_pages(zone),
 			      sc->reclaim_idx, 0))
