On 01/10/2018 10:57, Jan Beulich wrote:
>>>> On 01.10.18 at 09:16, <jgross(a)suse.com> wrote:
>> xen_qlock_wait() isn't safe for nested calls due to interrupts. A call
>> of xen_qlock_kick() might be ignored in case a deeper nesting level
>> was active right before the call of xen_poll_irq():
>>
>> CPU 1:                       CPU 2:
>>  spin_lock(lock1)
>>                              spin_lock(lock1)
>>                              -> xen_qlock_wait()
>>                                 -> xen_clear_irq_pending()
>>                                 Interrupt happens
>>  spin_unlock(lock1)
>>  -> xen_qlock_kick(CPU 2)
>>  spin_lock_irqsave(lock2)
>>                              spin_lock_irqsave(lock2)
>>                              -> xen_qlock_wait()
>>                                 -> xen_clear_irq_pending()
>>                                    clears kick for lock1
>>                                 -> xen_poll_irq()
>>  spin_unlock_irqrestore(lock2)
>>  -> xen_qlock_kick(CPU 2)
>>                                 wakes up
>>                              spin_unlock_irqrestore(lock2)
>>                              IRET
>>                              resumes in xen_qlock_wait()
>>                              -> xen_poll_irq()
>>                              never wakes up
>>
>> The solution is to disable interrupts in xen_qlock_wait() and not to
>> poll for the irq in case xen_qlock_wait() is called in NMI context.
>
> Are precautions against NMI really worthwhile? Locks acquired both
> in NMI context as well as outside of it are liable to deadlock anyway,
> aren't they?
The locks don't need to be the same. It is enough for an NMI-only lock
to be acquired via xen_qlock_wait() while the NMI has interrupted
xen_qlock_wait() for another lock.
So yes, I believe the test for NMI is good to have.
Juergen
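A minimal sketch of the approach described above, with interrupts
masked around the wait and NMI context falling back to spinning. This
is illustrative only: it folds in the pending-test change from the
companion patch below, and the exact upstream code may differ.

static void xen_qlock_wait(u8 *byte, u8 val)
{
	unsigned long flags;
	int irq = __this_cpu_read(lock_kicker_irq);

	/* If kicker interrupt not initialized yet, or in NMI, just spin. */
	if (irq == -1 || in_nmi())
		return;

	/*
	 * Mask interrupts so a nested spin_lock_irqsave() slow path
	 * cannot clear the kick meant for this invocation.
	 */
	local_irq_save(flags);

	/* If the kick already arrived, consume it and return. */
	if (xen_test_irq_pending(irq)) {
		xen_clear_irq_pending(irq);
	} else if (READ_ONCE(*byte) == val) {
		/* Block until xen_qlock_kick() makes the irq pending. */
		xen_poll_irq(irq);
	}

	local_irq_restore(flags);
}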
On 01/10/2018 10:54, Jan Beulich wrote:
>>>> On 01.10.18 at 09:16, <jgross(a)suse.com> wrote:
>> In the following situation a vcpu waiting for a lock might not be
>> woken up from xen_poll_irq():
>>
>> CPU 1:             CPU 2:                CPU 3:
>>  takes a spinlock
>>                     tries to get lock
>>                     -> xen_qlock_wait()
>>                        -> xen_clear_irq_pending()
>
> Doesn't the last line above ...
>
>> frees the lock
>> -> xen_qlock_kick(cpu2)
>
> ... need to be below here?
You are right, of course!
Thanks for noticing.
Juergen
In the following situation a vcpu waiting for a lock might not be
woken up from xen_poll_irq():
CPU 1:                 CPU 2:                  CPU 3:
takes a spinlock
                       tries to get lock
                       -> xen_qlock_wait()
frees the lock
-> xen_qlock_kick(cpu2)
                          -> xen_clear_irq_pending()
takes lock again
                                               tries to get lock
                                               -> *lock = _Q_SLOW_VAL
                       -> *lock == _Q_SLOW_VAL ?
                       -> xen_poll_irq()
frees the lock
-> xen_qlock_kick(cpu3)
And cpu 2 will sleep forever.
This can be avoided easily by modifying xen_qlock_wait() to call
xen_poll_irq() only if the related irq was not pending and to call
xen_clear_irq_pending() only if it was pending.
Cc: stable(a)vger.kernel.org
Cc: Waiman.Long(a)hp.com
Cc: peterz(a)infradead.org
Signed-off-by: Juergen Gross <jgross(a)suse.com>
---
arch/x86/xen/spinlock.c | 15 +++++----------
1 file changed, 5 insertions(+), 10 deletions(-)
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 973f10e05211..cd210a4ba7b1 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -45,17 +45,12 @@ static void xen_qlock_wait(u8 *byte, u8 val)
if (irq == -1)
return;
- /* clear pending */
- xen_clear_irq_pending(irq);
- barrier();
+ /* If irq pending already clear it and return. */
+ if (xen_test_irq_pending(irq)) {
+ xen_clear_irq_pending(irq);
+ return;
+ }
- /*
- * We check the byte value after clearing pending IRQ to make sure
- * that we won't miss a wakeup event because of the clearing.
- *
- * The sync_clear_bit() call in xen_clear_irq_pending() is atomic.
- * So it is effectively a memory barrier for x86.
- */
if (READ_ONCE(*byte) != val)
return;
--
2.16.4
From: Zachary Zhang <zhangzg(a)marvell.com>
commit 91a2968e245d6ba616db37001fa1a043078b1a65 upstream.
The PCIe I/O and MEM resource allocation mechanism is that the root
bus goes through the following steps:
1. Check the PCI bridges' ranges and compute the I/O and MEM
   base/limits.
2. Sort all subordinate devices' I/O and MEM resource requirements,
   allocate the resources, and write/update the subordinate devices'
   requirements to the PCI bridges' I/O and MEM base/limit registers.
Currently, the PCI Aardvark driver only handles the second step and
lacks the first, so I/O and MEM resource allocation fails when a PCI
switch is used. This commit fixes that by sizing bridges before doing
the resource allocation.
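For illustration, a sketch of the probe-time flow the fix establishes,
using the generic PCI core helpers. This is a simplified excerpt under
that assumption, not the literal driver code; error handling and the
surrounding setup are elided.

	bus = bridge->bus;

	/* Step 1: size bridge windows (compute I/O and MEM base/limits). */
	pci_bus_size_bridges(bus);

	/* Step 2: allocate and assign the computed resources. */
	pci_bus_assign_resources(bus);

	list_for_each_entry(child, &bus->children, node)
		pcie_bus_configure_settings(child);

	pci_bus_add_devices(bus);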
Fixes: 8c39d710363c1 ("PCI: aardvark: Add Aardvark PCI host controller driver")
Signed-off-by: Zachary Zhang <zhangzg(a)marvell.com>
[Thomas: edit commit log.]
Signed-off-by: Thomas Petazzoni <thomas.petazzoni(a)bootlin.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Cc: <stable(a)vger.kernel.org>
---
drivers/pci/host/pci-aardvark.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/pci/host/pci-aardvark.c b/drivers/pci/host/pci-aardvark.c
index d0867a311f42..806fa836b2d6 100644
--- a/drivers/pci/host/pci-aardvark.c
+++ b/drivers/pci/host/pci-aardvark.c
@@ -951,6 +951,7 @@ static int advk_pcie_probe(struct platform_device *pdev)
bus = bridge->bus;
+ pci_bus_size_bridges(bus);
pci_bus_assign_resources(bus);
list_for_each_entry(child, &bus->children, node)
--
2.14.4
From: Martin Willi <martin(a)strongswan.org>
[ Upstream commit c1dc2912059901f97345d9e10c96b841215fdc0f ]
The cluster match requires conntrack for matching packets. If the
netns does not have conntrack hooks registered, the match does not
work at all.
Implicitly load the conntrack hook for the family, exactly as many
other extensions do. This ensures that the match works even if the
hooks have not been registered by other means.
Signed-off-by: Martin Willi <martin(a)strongswan.org>
Acked-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
---
net/netfilter/xt_cluster.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/net/netfilter/xt_cluster.c b/net/netfilter/xt_cluster.c
index 57ef175dfbfa..504d5f730f4e 100644
--- a/net/netfilter/xt_cluster.c
+++ b/net/netfilter/xt_cluster.c
@@ -133,6 +133,7 @@ xt_cluster_mt(const struct sk_buff *skb, struct xt_action_param *par)
static int xt_cluster_mt_checkentry(const struct xt_mtchk_param *par)
{
struct xt_cluster_match_info *info = par->matchinfo;
+ int ret;
if (info->total_nodes > XT_CLUSTER_NODES_MAX) {
pr_info("you have exceeded the maximum "
@@ -145,7 +146,17 @@ static int xt_cluster_mt_checkentry(const struct xt_mtchk_param *par)
"higher than the total number of nodes\n");
return -EDOM;
}
- return 0;
+
+ ret = nf_ct_netns_get(par->net, par->family);
+ if (ret < 0)
+ pr_info_ratelimited("cannot load conntrack support for proto=%u\n",
+ par->family);
+ return ret;
+}
+
+static void xt_cluster_mt_destroy(const struct xt_mtdtor_param *par)
+{
+ nf_ct_netns_put(par->net, par->family);
}
static struct xt_match xt_cluster_match __read_mostly = {
@@ -154,6 +165,7 @@ static struct xt_match xt_cluster_match __read_mostly = {
.match = xt_cluster_mt,
.checkentry = xt_cluster_mt_checkentry,
.matchsize = sizeof(struct xt_cluster_match_info),
+ .destroy = xt_cluster_mt_destroy,
.me = THIS_MODULE,
};
--
2.17.1
On Sun, Sep 30, 2018 at 05:45:31AM -0700, gregkh(a)linuxfoundation.org wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> slub: make ->cpu_partial unsigned int
> From e5d9998f3e09359b372a037a6ac55ba235d95d57 Mon Sep 17 00:00:00 2001
> From: Alexey Dobriyan <adobriyan(a)gmail.com>
> Date: Thu, 5 Apr 2018 16:21:10 -0700
> Subject: slub: make ->cpu_partial unsigned int
This doesn't fix any bug that I know of; it should not be in -stable.