In the event that we add to ovflist, before 339ddb53d373 we would be
woken up by ep_scan_ready_list, and did no wakeup in ep_poll_callback.
With that wakeup removed, if we add to ovflist here, we may never wake
up. Rather than adding back the ep_scan_ready_list wakeup - which was
resulting in unnecessary wakeups, trigger a wake-up in ep_poll_callback.
We noticed that one of our workloads was missing wakeups starting with
339ddb53d373 and upon manual inspection, this wakeup seemed missing to
me. With this patch added, we no longer see missing wakeups. I haven't
yet tried to make a small reproducer, but the existing kselftests in
filesystem/epoll passed for me with this patch.
Fixes: 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll")
Signed-off-by: Khazhismel Kumykov <khazhy(a)google.com>
Reviewed-by: Roman Penyaev <rpenyaev(a)suse.de>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Heiher <r(a)hev.cc>
Cc: Jason Baron <jbaron(a)akamai.com>
Cc: <stable(a)vger.kernel.org>
---
v2: use if/elif instead of goto + cleanup suggested by Roman
fs/eventpoll.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 8c596641a72b..d6ba0e52439b 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1171,6 +1171,10 @@ static inline bool chain_epi_lockless(struct epitem *epi)
{
struct eventpoll *ep = epi->ep;
+ /* Fast preliminary check */
+ if (epi->next != EP_UNACTIVE_PTR)
+ return false;
+
/* Check that the same epi has not been just chained from another CPU */
if (cmpxchg(&epi->next, EP_UNACTIVE_PTR, NULL) != EP_UNACTIVE_PTR)
return false;
@@ -1237,16 +1241,12 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
* chained in ep->ovflist and requeued later on.
*/
if (READ_ONCE(ep->ovflist) != EP_UNACTIVE_PTR) {
- if (epi->next == EP_UNACTIVE_PTR &&
- chain_epi_lockless(epi))
+ if (chain_epi_lockless(epi))
+ ep_pm_stay_awake_rcu(epi);
+ } else if (!ep_is_linked(epi)) {
+ /* In the usual case, add event to ready list. */
+ if (list_add_tail_lockless(&epi->rdllink, &ep->rdllist))
ep_pm_stay_awake_rcu(epi);
- goto out_unlock;
- }
-
- /* If this file is already in the ready list we exit soon */
- if (!ep_is_linked(epi) &&
- list_add_tail_lockless(&epi->rdllink, &ep->rdllist)) {
- ep_pm_stay_awake_rcu(epi);
}
/*
--
2.26.2.303.gf8c07b1a785-goog
Greetings,
My name is Felix,I am contacting you in respect of an urgent
matter (Deal) regarding funds in excess of Nine Million US
Dollars which resulted from a liquidated BTC account belonging to
a deceased account holder. I will let you in on my plan and why I
chose to contact you in the first place after I have received
your reply and gaining your trust.
Many thanks and looking forward to your reply.
Felix.
Commit 1ca3dec2b2df ("powerpc/xive: Prevent page fault issues in the
machine crash handler") fixed an issue in the FW assisted dump of
machines using hash MMU and the XIVE interrupt mode under the POWER
hypervisor. It forced the mapping of the ESB page of interrupts being
mapped in the Linux IRQ number space to make sure the 'crash kexec'
sequence worked during such an event. But it didn't handle the
un-mapping.
This mapping is now blocking the removal of a passthrough IO adapter
under the POWER hypervisor because it expects the guest OS to have
cleared all page table entries related to the adapter. If some are
still present, the RTAS call which isolates the PCI slot returns error
9001 "valid outstanding translations".
Remove these mapping in the IRQ data cleanup routine.
Under KVM, this cleanup is not required because the ESB pages for the
adapter interrupts are un-mapped from the guest by the hypervisor in
the KVM XIVE native device. This is now redundant but it's harmless.
Fixes: 1ca3dec2b2df ("powerpc/xive: Prevent page fault issues in the machine crash handler")
Cc: stable(a)vger.kernel.org # v5.5+
Signed-off-by: Cédric Le Goater <clg(a)kaod.org>
---
arch/powerpc/sysdev/xive/common.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 9603b2830d03..3dbc94cb4380 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -19,6 +19,7 @@
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/msi.h>
+#include <linux/vmalloc.h>
#include <asm/debugfs.h>
#include <asm/prom.h>
@@ -1020,12 +1021,16 @@ EXPORT_SYMBOL_GPL(is_xive_irq);
void xive_cleanup_irq_data(struct xive_irq_data *xd)
{
if (xd->eoi_mmio) {
+ unmap_kernel_range((unsigned long)xd->eoi_mmio,
+ 1u << xd->esb_shift);
iounmap(xd->eoi_mmio);
if (xd->eoi_mmio == xd->trig_mmio)
xd->trig_mmio = NULL;
xd->eoi_mmio = NULL;
}
if (xd->trig_mmio) {
+ unmap_kernel_range((unsigned long)xd->trig_mmio,
+ 1u << xd->esb_shift);
iounmap(xd->trig_mmio);
xd->trig_mmio = NULL;
}
--
2.25.4
Greetings,
My name is Felix,I am contacting you in respect of an urgent
matter (Deal) regarding funds in excess of Nine Million US
Dollars which resulted from a liquidated BTC account belonging to
a deceased account holder. I will let you in on my plan and why I
chose to contact you in the first place after I have received
your reply and gaining your trust.
Many thanks and looking forward to your reply.
Felix.