Changes since v3 [1] (note v4 was a partial re-roll, but more feedback
came in requiring a v5):
- Fix a kbuild robot report for a missing header include of cc_platform.h
- Switch to selecting STRICT_DEVMEM and IOSTRICT_DEVMEM rather than
"depends on". (Naveen)
- Clarify the "SEPT violation" vs "crash" and other changelog fixups for
devmem maintainers and other arch maintainers. (Dave)
- Drop patch numbering since patch2 is a fix and has no dependencies on
patch1
[1]: http://lore.kernel.org/174491711228.1395340.3647010925173796093.stgit@dwill…
---
The original response to Nikolay's report of a "crash" (unhandled SEPT
violation) triggered by /dev/mem access to private memory was "let's
just turn off /dev/mem".
After some machinations of x86_platform_ops to block a subset of
problematic access, spelunking the history of devmem_is_allowed()
returning "2" to enable some compatibility benefits while blocking
access, and discovering that userspace depends buggy kernel behavior for
mmap(2) of the first 1MB of memory on x86, the proposal has circled back
to "disable /dev/mem".
Require both STRICT_DEVMEM and IO_STRICT_DEVMEM for x86 confidential
guests to close /dev/mem hole while still allowing for userspace
mapping of PCI MMIO as long as the kernel and userspace are not mapping
the range at the same time.
The range_is_allowed() cleanup is not strictly necessary, but might as
well close a 17 year-old "TODO".
Dan Williams (2):
x86/devmem: Remove duplicate range_is_allowed() definition
x86/devmem: Drop /dev/mem access for confidential guests
arch/x86/Kconfig | 4 ++++
arch/x86/mm/pat/memtype.c | 31 ++++---------------------------
drivers/char/mem.c | 28 ++++++++++------------------
include/linux/io.h | 21 +++++++++++++++++++++
4 files changed, 39 insertions(+), 45 deletions(-)
base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
--
2.49.0
Hello again,
This is the 6.1.y backports set from 6.10 which corresponds to the 6.6.y
backports set from here:
https://lore.kernel.org/all/20241002174108.64615-1-catherine.hoang@oracle.c…
The following patches were included:
f43bd357fde0 xfs: fix error returns from xfs_bmapi_write
4bcef72d96b5 xfs: fix xfs_bmap_add_extent_delay_real for partial conversions
c299188b443a xfs: remove a racy if_bytes check in xfs_reflink_end_cow_extent
c13c21f77824 xfs: require XFS_SB_FEAT_INCOMPAT_LOG_XATTRS for attr log intent item recovey
0934046e3392 xfs: check opcode and iovec count match in xlog_recover_attri_commit_pass2
9716cdcc2f9e xfs: validate recovered name buffers when recovering xattr items
20adb1e2f069 xfs: revert commit 44af6c7e59b12
f24ba2183148 xfs: match lock mode in xfs_buffered_write_iomap_begin()
0f726c17dfd8 xfs: make the seq argument to xfs_bmapi_convert_delalloc() optional
36081fd0ee37 xfs: make xfs_bmapi_convert_delalloc() to allocate the target offset
4c99f3026cf2 xfs: convert delayed extents to unwritten when zeroing post eof blocks
0aca73915dc1 xfs: allow symlinks with short remote targets
0e52b98bf041 xfs: make sure sb_fdblocks is non-negative
2bc2d49c36c2 xfs: fix freeing speculative preallocations for preallocated files
3eeac3311683 xfs: allow unlinked symlinks and dirs with zero size
9a0ab4fc28ed xfs: restrict when we try to align cow fork delalloc to cowextsz hints
The following patches were omitted:
cad051826d83 xfs: fix missing check for invalid attr flags
scrub
db460c26f0b0 xfs: check shortform attr entry flags specifically
scrub
5689d2345a01 xfs: enforce one namespace per attribute
doesn't cherry pick cleanly, it is some refactoring and isn't a
dependency for other patches, lets skip it for now
7c03b124353a xfs: use dontcache for grabbing inodes during scrub
scrub
740a427e8f45 xfs: fix unlink vs cluster buffer instantiation race
already in 6.1.y
No fixes were found on mainline for any of the patches being ported.
The auto group was run 1x on each of these configs:
xfs/4k
xfs/1k
xfs/logdev
xfs/realtime
xfs/quota
xfs/v4
xfs/dax
xfs/adv
xfs/dirblock_8k
and no regressions were seen. This set has already been ack'd on the
xfs-stable list.
Let me know if you see any issues. Thanks,
Leah
Christoph Hellwig (4):
xfs: fix error returns from xfs_bmapi_write
xfs: fix xfs_bmap_add_extent_delay_real for partial conversions
xfs: remove a racy if_bytes check in xfs_reflink_end_cow_extent
xfs: fix freeing speculative preallocations for preallocated files
Darrick J. Wong (7):
xfs: require XFS_SB_FEAT_INCOMPAT_LOG_XATTRS for attr log intent item
recovery
xfs: check opcode and iovec count match in
xlog_recover_attri_commit_pass2
xfs: validate recovered name buffers when recovering xattr items
xfs: revert commit 44af6c7e59b12
xfs: allow symlinks with short remote targets
xfs: allow unlinked symlinks and dirs with zero size
xfs: restrict when we try to align cow fork delalloc to cowextsz hints
Wengang Wang (1):
xfs: make sure sb_fdblocks is non-negative
Zhang Yi (4):
xfs: match lock mode in xfs_buffered_write_iomap_begin()
xfs: make the seq argument to xfs_bmapi_convert_delalloc() optional
xfs: make xfs_bmapi_convert_delalloc() to allocate the target offset
xfs: convert delayed extents to unwritten when zeroing post eof blocks
fs/xfs/libxfs/xfs_attr_remote.c | 1 -
fs/xfs/libxfs/xfs_bmap.c | 130 ++++++++++++++++++++++++++------
fs/xfs/libxfs/xfs_da_btree.c | 20 ++---
fs/xfs/libxfs/xfs_inode_buf.c | 47 ++++++++++--
fs/xfs/libxfs/xfs_sb.c | 7 +-
fs/xfs/scrub/attr.c | 5 ++
fs/xfs/xfs_aops.c | 54 ++++---------
fs/xfs/xfs_attr_item.c | 88 ++++++++++++++++++---
fs/xfs/xfs_bmap_util.c | 61 +++++++++------
fs/xfs/xfs_bmap_util.h | 2 +-
fs/xfs/xfs_dquot.c | 1 -
fs/xfs/xfs_icache.c | 2 +-
fs/xfs/xfs_inode.c | 14 +---
fs/xfs/xfs_iomap.c | 81 +++++++++++---------
fs/xfs/xfs_reflink.c | 20 -----
fs/xfs/xfs_rtalloc.c | 2 -
16 files changed, 342 insertions(+), 193 deletions(-)
--
2.49.0.906.g1f30a19c02-goog
Fix the bypassing of invalid packet pointer check in 6.6 by backporting
the entire "bpf: track changes_pkt_data property for global functions"
series[1], along with the follow up, "bpf: fix NPE when computing
changes_pkt_data of program w/o subprograms" series[2]; both from
Eduard.
I had ran the BPF selftests after backporting, confirmed that newly
added BPF selftests passes, and that no new failure observed in BPF
selftests (dummy_st_ops, ns_current_pid_tgid, test_bpf_ma, and
test_bpffs are failing even without this patchset applied). See [3] for
the full log.
#50/1 changes_pkt_data_freplace/changes_pkt_data_with_changes_pkt_data:OK
#50/2 changes_pkt_data_freplace/changes_pkt_data_with_does_not_change_pkt_data:OK
#50/3 changes_pkt_data_freplace/does_not_change_pkt_data_with_changes_pkt_data:OK
#50/4 changes_pkt_data_freplace/does_not_change_pkt_data_with_does_not_change_pkt_data:OK
#50/5 changes_pkt_data_freplace/main_changes_with_changes_pkt_data:OK
#50/6 changes_pkt_data_freplace/main_changes_with_does_not_change_pkt_data:OK
#50/7 changes_pkt_data_freplace/main_does_not_change_with_changes_pkt_data:OK
#50/8 changes_pkt_data_freplace/main_does_not_change_with_does_not_change_pkt_data:OK
#395/57 verifier_sock/invalidate_pkt_pointers_from_global_func:OK
#395/58 verifier_sock/invalidate_pkt_pointers_by_tail_call:OK
1: https://lore.kernel.org/all/20241210041100.1898468-1-eddyz87@gmail.com/
2: https://lore.kernel.org/all/20241212070711.427443-1-eddyz87@gmail.com/
3: https://github.com/shunghsiyu/libbpf/actions/runs/14747347842/job/413971041…
Eduard Zingerman (10):
bpf: add find_containing_subprog() utility function
bpf: refactor bpf_helper_changes_pkt_data to use helper number
bpf: track changes_pkt_data property for global functions
selftests/bpf: test for changing packet data from global functions
bpf: check changes_pkt_data property for extension programs
selftests/bpf: freplace tests for tracking of changes_packet_data
bpf: consider that tail calls invalidate packet pointers
selftests/bpf: validate that tail call invalidates packet pointers
bpf: fix null dereference when computing changes_pkt_data of prog w/o
subprogs
selftests/bpf: extend changes_pkt_data with cases w/o subprograms
include/linux/bpf.h | 1 +
include/linux/bpf_verifier.h | 1 +
include/linux/filter.h | 2 +-
kernel/bpf/core.c | 2 +-
kernel/bpf/verifier.c | 81 +++++++++++--
net/core/filter.c | 63 +++++------
.../bpf/prog_tests/changes_pkt_data.c | 107 ++++++++++++++++++
.../selftests/bpf/progs/changes_pkt_data.c | 39 +++++++
.../bpf/progs/changes_pkt_data_freplace.c | 18 +++
.../selftests/bpf/progs/verifier_sock.c | 56 +++++++++
10 files changed, 324 insertions(+), 46 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/changes_pkt_data.c
create mode 100644 tools/testing/selftests/bpf/progs/changes_pkt_data.c
create mode 100644 tools/testing/selftests/bpf/progs/changes_pkt_data_freplace.c
--
2.49.0
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 44d9b3f584c59a606b521e7274e658d5b866c699
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025042808-vastly-armrest-7811@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 44d9b3f584c59a606b521e7274e658d5b866c699 Mon Sep 17 00:00:00 2001
From: Ian Abbott <abbotti(a)mev.co.uk>
Date: Tue, 15 Apr 2025 13:39:01 +0100
Subject: [PATCH] comedi: jr3_pci: Fix synchronous deletion of timer
When `jr3_pci_detach()` is called during device removal, it calls
`timer_delete_sync()` to stop the timer, but the timer expiry function
always reschedules the timer, so the synchronization is ineffective.
Call `timer_shutdown_sync()` instead. It does not matter that the timer
expiry function pointer is cleared, because the device is being removed.
Fixes: 07b509e6584a5 ("Staging: comedi: add jr3_pci driver")
Cc: stable <stable(a)kernel.org>
Signed-off-by: Ian Abbott <abbotti(a)mev.co.uk>
Link: https://lore.kernel.org/r/20250415123901.13483-1-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/comedi/drivers/jr3_pci.c b/drivers/comedi/drivers/jr3_pci.c
index cdc842b32bab..75dce1ff2419 100644
--- a/drivers/comedi/drivers/jr3_pci.c
+++ b/drivers/comedi/drivers/jr3_pci.c
@@ -758,7 +758,7 @@ static void jr3_pci_detach(struct comedi_device *dev)
struct jr3_pci_dev_private *devpriv = dev->private;
if (devpriv)
- timer_delete_sync(&devpriv->timer);
+ timer_shutdown_sync(&devpriv->timer);
comedi_pci_detach(dev);
}
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 170d1a3738908eef6a0dbf378ea77fb4ae8e294d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025042803-deflected-overdue-01a5@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 170d1a3738908eef6a0dbf378ea77fb4ae8e294d Mon Sep 17 00:00:00 2001
From: Carlos Llamas <cmllamas(a)google.com>
Date: Tue, 25 Mar 2025 18:49:00 +0000
Subject: [PATCH] binder: fix offset calculation in debug log
The vma start address should be substracted from the buffer's user data
address and not the other way around.
Cc: Tiffany Y. Yang <ynaffit(a)google.com>
Cc: stable <stable(a)kernel.org>
Fixes: 162c79731448 ("binder: avoid user addresses in debug logs")
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
Reviewed-by: Tiffany Y. Yang <ynaffit(a)google.com>
Link: https://lore.kernel.org/r/20250325184902.587138-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 76052006bd87..5fc2c8ee61b1 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -6373,7 +6373,7 @@ static void print_binder_transaction_ilocked(struct seq_file *m,
seq_printf(m, " node %d", buffer->target_node->debug_id);
seq_printf(m, " size %zd:%zd offset %lx\n",
buffer->data_size, buffer->offsets_size,
- proc->alloc.vm_start - buffer->user_data);
+ buffer->user_data - proc->alloc.vm_start);
}
static void print_binder_work_ilocked(struct seq_file *m,
Unlike patch_text(), patch_text_nosync() takes the length in bytes, not
number of instructions. It is therefore wrong for arch_prepare_ss_slot() to
pass length=1 while patching one instruction.
This bug was introduced by commit b1756750a397 ("riscv: kprobes: Use
patch_text_nosync() for insn slots"). It has been fixed upstream by commit
51781ce8f448 ("riscv: Pass patch_text() the length in bytes"). However,
beside fixing this bug, this commit does many other things, making it
unsuitable for backporting.
Fix it by properly passing the lengths in bytes.
Fixes: b1756750a397 ("riscv: kprobes: Use patch_text_nosync() for insn slots")
Signed-off-by: Nam Cao <namcao(a)linutronix.de>
---
arch/riscv/kernel/probes/kprobes.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/kernel/probes/kprobes.c b/arch/riscv/kernel/probes/kprobes.c
index 4fbc70e823f0..dc431b965bc3 100644
--- a/arch/riscv/kernel/probes/kprobes.c
+++ b/arch/riscv/kernel/probes/kprobes.c
@@ -28,8 +28,8 @@ static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
p->ainsn.api.restore = (unsigned long)p->addr + offset;
- patch_text_nosync(p->ainsn.api.insn, &p->opcode, 1);
- patch_text_nosync((void *)p->ainsn.api.insn + offset, &insn, 1);
+ patch_text_nosync(p->ainsn.api.insn, &p->opcode, offset);
+ patch_text_nosync((void *)p->ainsn.api.insn + offset, &insn, sizeof(insn));
}
static void __kprobes arch_prepare_simulate(struct kprobe *p)
--
2.39.5
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 44d9b3f584c59a606b521e7274e658d5b866c699
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025042807-vanquish-unbeaten-57ab@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 44d9b3f584c59a606b521e7274e658d5b866c699 Mon Sep 17 00:00:00 2001
From: Ian Abbott <abbotti(a)mev.co.uk>
Date: Tue, 15 Apr 2025 13:39:01 +0100
Subject: [PATCH] comedi: jr3_pci: Fix synchronous deletion of timer
When `jr3_pci_detach()` is called during device removal, it calls
`timer_delete_sync()` to stop the timer, but the timer expiry function
always reschedules the timer, so the synchronization is ineffective.
Call `timer_shutdown_sync()` instead. It does not matter that the timer
expiry function pointer is cleared, because the device is being removed.
Fixes: 07b509e6584a5 ("Staging: comedi: add jr3_pci driver")
Cc: stable <stable(a)kernel.org>
Signed-off-by: Ian Abbott <abbotti(a)mev.co.uk>
Link: https://lore.kernel.org/r/20250415123901.13483-1-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/comedi/drivers/jr3_pci.c b/drivers/comedi/drivers/jr3_pci.c
index cdc842b32bab..75dce1ff2419 100644
--- a/drivers/comedi/drivers/jr3_pci.c
+++ b/drivers/comedi/drivers/jr3_pci.c
@@ -758,7 +758,7 @@ static void jr3_pci_detach(struct comedi_device *dev)
struct jr3_pci_dev_private *devpriv = dev->private;
if (devpriv)
- timer_delete_sync(&devpriv->timer);
+ timer_shutdown_sync(&devpriv->timer);
comedi_pci_detach(dev);
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 44d9b3f584c59a606b521e7274e658d5b866c699
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025042808-difficult-germicide-075a@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 44d9b3f584c59a606b521e7274e658d5b866c699 Mon Sep 17 00:00:00 2001
From: Ian Abbott <abbotti(a)mev.co.uk>
Date: Tue, 15 Apr 2025 13:39:01 +0100
Subject: [PATCH] comedi: jr3_pci: Fix synchronous deletion of timer
When `jr3_pci_detach()` is called during device removal, it calls
`timer_delete_sync()` to stop the timer, but the timer expiry function
always reschedules the timer, so the synchronization is ineffective.
Call `timer_shutdown_sync()` instead. It does not matter that the timer
expiry function pointer is cleared, because the device is being removed.
Fixes: 07b509e6584a5 ("Staging: comedi: add jr3_pci driver")
Cc: stable <stable(a)kernel.org>
Signed-off-by: Ian Abbott <abbotti(a)mev.co.uk>
Link: https://lore.kernel.org/r/20250415123901.13483-1-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/comedi/drivers/jr3_pci.c b/drivers/comedi/drivers/jr3_pci.c
index cdc842b32bab..75dce1ff2419 100644
--- a/drivers/comedi/drivers/jr3_pci.c
+++ b/drivers/comedi/drivers/jr3_pci.c
@@ -758,7 +758,7 @@ static void jr3_pci_detach(struct comedi_device *dev)
struct jr3_pci_dev_private *devpriv = dev->private;
if (devpriv)
- timer_delete_sync(&devpriv->timer);
+ timer_shutdown_sync(&devpriv->timer);
comedi_pci_detach(dev);
}
The quilt patch titled
Subject: mm/userfaultfd: prevent busy looping for tasks with signals pending
has been removed from the -mm tree. Its filename was
mm-userfaultfd-prevent-busy-looping-for-tasks-with-signals-pending.patch
This patch was dropped because an updated version will be issued
------------------------------------------------------
From: Jens Axboe <axboe(a)kernel.dk>
Subject: mm/userfaultfd: prevent busy looping for tasks with signals pending
Date: Wed, 23 Apr 2025 17:37:06 -0600
userfaultfd may use interruptible sleeps to wait on userspace filling a
page fault, which works fine if the task can be reliably put to sleeping
waiting for that. However, if the task has a normal (ie non-fatal) signal
pending, then TASK_INTERRUPTIBLE sleep will simply cause schedule() to be
a no-op.
For a task that registers a page with userfaultfd and then proceeds to do
a write from it, if that task also has a signal pending then it'll
essentially busy loop from do_page_fault() -> handle_userfault() until
that fault has been filled. Normally it'd be expected that the task would
sleep until that happens. Here's a trace from an application doing just
that:
handle_userfault+0x4b8/0xa00 (P)
hugetlb_fault+0xe24/0x1060
handle_mm_fault+0x2bc/0x318
do_page_fault+0x1e8/0x6f0
do_translation_fault+0x9c/0xd0
do_mem_abort+0x44/0xa0
el1_abort+0x3c/0x68
el1h_64_sync_handler+0xd4/0x100
el1h_64_sync+0x6c/0x70
fault_in_readable+0x74/0x108 (P)
iomap_file_buffered_write+0x14c/0x438
blkdev_write_iter+0x1a8/0x340
vfs_write+0x20c/0x348
ksys_write+0x64/0x108
__arm64_sys_write+0x1c/0x38
where the task is looping with 100% CPU time in the above mentioned fault
path.
Since it's impossible to handle signals, or other conditions like
TIF_NOTIFY_SIGNAL that also prevents interruptible sleeping, from the
fault path, use TASK_UNINTERRUPTIBLE with a short timeout even for vmf
modes that would normally ask for INTERRUPTIBLE or KILLABLE sleep. Fatal
signals will still be handled by the caller, and the timeout is short
enough to hopefully not cause any issues. If this is the first invocation
of this fault, eg FAULT_FLAG_TRIED isn't set, then the normal sleep mode
is used.
Link: https://lkml.kernel.org/r/27c3a7f5-aad8-4f2a-a66e-ff5ae98f31eb@kernel.dk
Fixes: 4064b9827063 ("mm: allow VM_FAULT_RETRY for multiple times")
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Reported-by: Zhiwei Jiang <qq282012236(a)gmail.com>
Closes: https://lore.kernel.org/io-uring/20250422162913.1242057-1-qq282012236@gmail…
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/userfaultfd.c | 34 ++++++++++++++++++++++++++--------
1 file changed, 26 insertions(+), 8 deletions(-)
--- a/fs/userfaultfd.c~mm-userfaultfd-prevent-busy-looping-for-tasks-with-signals-pending
+++ a/fs/userfaultfd.c
@@ -334,15 +334,29 @@ out:
return ret;
}
-static inline unsigned int userfaultfd_get_blocking_state(unsigned int flags)
+struct userfault_wait {
+ unsigned int task_state;
+ bool timeout;
+};
+
+static struct userfault_wait userfaultfd_get_blocking_state(unsigned int flags)
{
+ /*
+ * If the fault has already been tried AND there's a signal pending
+ * for this task, use TASK_UNINTERRUPTIBLE with a small timeout.
+ * This prevents busy looping where schedule() otherwise does nothing
+ * for TASK_INTERRUPTIBLE when the task has a signal pending.
+ */
+ if ((flags & FAULT_FLAG_TRIED) && signal_pending(current))
+ return (struct userfault_wait) { TASK_UNINTERRUPTIBLE, true };
+
if (flags & FAULT_FLAG_INTERRUPTIBLE)
- return TASK_INTERRUPTIBLE;
+ return (struct userfault_wait) { TASK_INTERRUPTIBLE, false };
if (flags & FAULT_FLAG_KILLABLE)
- return TASK_KILLABLE;
+ return (struct userfault_wait) { TASK_KILLABLE, false };
- return TASK_UNINTERRUPTIBLE;
+ return (struct userfault_wait) { TASK_UNINTERRUPTIBLE, false };
}
/*
@@ -368,7 +382,7 @@ vm_fault_t handle_userfault(struct vm_fa
struct userfaultfd_wait_queue uwq;
vm_fault_t ret = VM_FAULT_SIGBUS;
bool must_wait;
- unsigned int blocking_state;
+ struct userfault_wait wait_mode;
/*
* We don't do userfault handling for the final child pid update
@@ -466,7 +480,7 @@ vm_fault_t handle_userfault(struct vm_fa
uwq.ctx = ctx;
uwq.waken = false;
- blocking_state = userfaultfd_get_blocking_state(vmf->flags);
+ wait_mode = userfaultfd_get_blocking_state(vmf->flags);
/*
* Take the vma lock now, in order to safely call
@@ -488,7 +502,7 @@ vm_fault_t handle_userfault(struct vm_fa
* following the spin_unlock to happen before the list_add in
* __add_wait_queue.
*/
- set_current_state(blocking_state);
+ set_current_state(wait_mode.task_state);
spin_unlock_irq(&ctx->fault_pending_wqh.lock);
if (!is_vm_hugetlb_page(vma))
@@ -501,7 +515,11 @@ vm_fault_t handle_userfault(struct vm_fa
if (likely(must_wait && !READ_ONCE(ctx->released))) {
wake_up_poll(&ctx->fd_wqh, EPOLLIN);
- schedule();
+ /* See comment in userfaultfd_get_blocking_state() */
+ if (!wait_mode.timeout)
+ schedule();
+ else
+ schedule_timeout(HZ / 10);
}
__set_current_state(TASK_RUNNING);
_
Patches currently in -mm which might be from axboe(a)kernel.dk are