A few patches were recently marked for stable@, but the commits are not
backportable as-is and require a few tweaks. Here is the 5.1 stable backport.
[PATCH 2 of the series applies as-is; I have it here for completeness]
Jan Kiszka (1):
KVM: nVMX: Clear pending KVM_REQ_GET_VMCS12_PAGES when leaving nested
Paolo Bonzini (2):
KVM: nVMX: do not use dangling shadow VMCS after guest reset
Revert "kvm: x86: Use task structs fpu field for user"
arch/x86/include/asm/kvm_host.h | 7 ++++---
arch/x86/kvm/vmx/nested.c | 10 +++++++++-
arch/x86/kvm/x86.c | 4 ++--
3 files changed, 15 insertions(+), 6 deletions(-)
--
2.20.1
A few patches were recently marked for stable@, but the commits are not
backportable as-is and require a few tweaks. Here is the 5.2 stable backport.
[Patches 2 and 3 of the series apply as-is; I have them here for completeness]
Jan Kiszka (1):
KVM: nVMX: Clear pending KVM_REQ_GET_VMCS12_PAGES when leaving nested
Paolo Bonzini (2):
KVM: nVMX: do not use dangling shadow VMCS after guest reset
Revert "kvm: x86: Use task structs fpu field for user"
arch/x86/include/asm/kvm_host.h | 7 ++++---
arch/x86/kvm/vmx/nested.c | 10 +++++++++-
arch/x86/kvm/x86.c | 4 ++--
3 files changed, 15 insertions(+), 6 deletions(-)
--
2.20.1
From: Russell King <rmk+kernel(a)armlinux.org.uk>
[ Upstream commit ffd9a1ba9fdb7f2bd1d1ad9b9243d34e96756ba2 ]
DMA got broken a while back in two different ways:
1) a change in the behaviour of disable_irq() to wait for the interrupt
to finish executing causes us to deadlock at the end of DMA.
2) a change to avoid modifying the scatterlist left the first transfer
uninitialised.
DMA is only used with expansion cards, so has gone unnoticed.
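For reference, (1) bites because disable_irq() is called from the DMA
IRQ handler itself; a condensed sketch of the handler tail from the
patch below:
	static irqreturn_t iomd_dma_handle(int irq, void *dev_id)
	{
		...
		/*
		 * disable_irq(irq) here waits for the running handler of
		 * 'irq' -- this very function -- to return, so it can never
		 * complete.  disable_irq_nosync(irq) only masks the IRQ and
		 * returns immediately.
		 */
		disable_irq_nosync(irq);
		return IRQ_HANDLED;
	}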
Fixes: fa4e99899932 ("[ARM] dma: RiscPC: don't modify DMA SG entries")
Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/arm/mach-rpc/dma.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/arm/mach-rpc/dma.c b/arch/arm/mach-rpc/dma.c
index 488d5c3b37f4..799e0b016b62 100644
--- a/arch/arm/mach-rpc/dma.c
+++ b/arch/arm/mach-rpc/dma.c
@@ -128,7 +128,7 @@ static irqreturn_t iomd_dma_handle(int irq, void *dev_id)
} while (1);
idma->state = ~DMA_ST_AB;
- disable_irq(irq);
+ disable_irq_nosync(irq);
return IRQ_HANDLED;
}
@@ -177,6 +177,9 @@ static void iomd_enable_dma(unsigned int chan, dma_t *dma)
DMA_FROM_DEVICE : DMA_TO_DEVICE);
}
+ idma->dma_addr = idma->dma.sg->dma_address;
+ idma->dma_len = idma->dma.sg->length;
+
iomd_writeb(DMA_CR_C, dma_base + CR);
idma->state = DMA_ST_AB;
}
--
2.20.1
From: Russell King <rmk+kernel(a)armlinux.org.uk>
[ Upstream commit ffd9a1ba9fdb7f2bd1d1ad9b9243d34e96756ba2 ]
DMA got broken a while back in two different ways:
1) a change in the behaviour of disable_irq() to wait for the interrupt
to finish executing causes us to deadlock at the end of DMA.
2) a change to avoid modifying the scatterlist left the first transfer
uninitialised.
DMA is only used with expansion cards, so has gone unnoticed.
Fixes: fa4e99899932 ("[ARM] dma: RiscPC: don't modify DMA SG entries")
Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/arm/mach-rpc/dma.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/arm/mach-rpc/dma.c b/arch/arm/mach-rpc/dma.c
index 6d3517dc4772..82aac38fa2cf 100644
--- a/arch/arm/mach-rpc/dma.c
+++ b/arch/arm/mach-rpc/dma.c
@@ -131,7 +131,7 @@ static irqreturn_t iomd_dma_handle(int irq, void *dev_id)
} while (1);
idma->state = ~DMA_ST_AB;
- disable_irq(irq);
+ disable_irq_nosync(irq);
return IRQ_HANDLED;
}
@@ -174,6 +174,9 @@ static void iomd_enable_dma(unsigned int chan, dma_t *dma)
DMA_FROM_DEVICE : DMA_TO_DEVICE);
}
+ idma->dma_addr = idma->dma.sg->dma_address;
+ idma->dma_len = idma->dma.sg->length;
+
iomd_writeb(DMA_CR_C, dma_base + CR);
idma->state = DMA_ST_AB;
}
--
2.20.1
From: Russell King <rmk+kernel(a)armlinux.org.uk>
[ Upstream commit ffd9a1ba9fdb7f2bd1d1ad9b9243d34e96756ba2 ]
DMA got broken a while back in two different ways:
1) a change in the behaviour of disable_irq() to wait for the interrupt
to finish executing causes us to deadlock at the end of DMA.
2) a change to avoid modifying the scatterlist left the first transfer
uninitialised.
DMA is only used with expansion cards, so has gone unnoticed.
Fixes: fa4e99899932 ("[ARM] dma: RiscPC: don't modify DMA SG entries")
Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/arm/mach-rpc/dma.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/arm/mach-rpc/dma.c b/arch/arm/mach-rpc/dma.c
index fb48f3141fb4..c4c96661eb89 100644
--- a/arch/arm/mach-rpc/dma.c
+++ b/arch/arm/mach-rpc/dma.c
@@ -131,7 +131,7 @@ static irqreturn_t iomd_dma_handle(int irq, void *dev_id)
} while (1);
idma->state = ~DMA_ST_AB;
- disable_irq(irq);
+ disable_irq_nosync(irq);
return IRQ_HANDLED;
}
@@ -174,6 +174,9 @@ static void iomd_enable_dma(unsigned int chan, dma_t *dma)
DMA_FROM_DEVICE : DMA_TO_DEVICE);
}
+ idma->dma_addr = idma->dma.sg->dma_address;
+ idma->dma_len = idma->dma.sg->length;
+
iomd_writeb(DMA_CR_C, dma_base + CR);
idma->state = DMA_ST_AB;
}
--
2.20.1
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4993e5b37e8bcb55ac90f76eb6d2432647273747 Mon Sep 17 00:00:00 2001
From: Jose Abreu <Jose.Abreu(a)synopsys.com>
Date: Mon, 8 Jul 2019 14:26:28 +0200
Subject: [PATCH] net: stmmac: Re-work the queue selection for TSO packets
Ben Hutchings says:
"This is the wrong place to change the queue mapping.
stmmac_xmit() is called with a specific TX queue locked,
and accessing a different TX queue results in a data race
for all of that queue's state.
I think this commit should be reverted upstream and in all
stable branches. Instead, the driver should implement the
ndo_select_queue operation and override the queue mapping there."
Fixes: c5acdbee22a1 ("net: stmmac: Send TSO packets always from Queue 0")
Suggested-by: Ben Hutchings <ben(a)decadent.org.uk>
Signed-off-by: Jose Abreu <joabreu(a)synopsys.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 06358fe5b245..11b6feb33b54 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3045,17 +3045,8 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
/* Manage oversized TCP frames for GMAC4 device */
if (skb_is_gso(skb) && priv->tso) {
- if (skb_shinfo(skb)->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6)) {
- /*
- * There is no way to determine the number of TSO
- * capable Queues. Let's use always the Queue 0
- * because if TSO is supported then at least this
- * one will be capable.
- */
- skb_set_queue_mapping(skb, 0);
-
+ if (skb_shinfo(skb)->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))
return stmmac_tso_xmit(skb, dev);
- }
}
if (unlikely(stmmac_tx_avail(priv, queue) < nfrags + 1)) {
@@ -3872,6 +3863,22 @@ static int stmmac_setup_tc(struct net_device *ndev, enum tc_setup_type type,
}
}
+static u16 stmmac_select_queue(struct net_device *dev, struct sk_buff *skb,
+ struct net_device *sb_dev)
+{
+ if (skb_shinfo(skb)->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6)) {
+ /*
+ * There is no way to determine the number of TSO
+ * capable Queues. Let's use always the Queue 0
+ * because if TSO is supported then at least this
+ * one will be capable.
+ */
+ return 0;
+ }
+
+ return netdev_pick_tx(dev, skb, NULL) % dev->real_num_tx_queues;
+}
+
static int stmmac_set_mac_address(struct net_device *ndev, void *addr)
{
struct stmmac_priv *priv = netdev_priv(ndev);
@@ -4088,6 +4095,7 @@ static const struct net_device_ops stmmac_netdev_ops = {
.ndo_tx_timeout = stmmac_tx_timeout,
.ndo_do_ioctl = stmmac_ioctl,
.ndo_setup_tc = stmmac_setup_tc,
+ .ndo_select_queue = stmmac_select_queue,
#ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller = stmmac_poll_controller,
#endif
When fall-through warnings were enabled by default by commit
d93512ef0f0e ("Makefile: Globally enable fall-through warning"), the
following warnings started to show up:
../arch/arm64/kernel/hw_breakpoint.c: In function ‘hw_breakpoint_arch_parse’:
../arch/arm64/kernel/hw_breakpoint.c:540:7: warning: this statement may fall
through [-Wimplicit-fallthrough=]
if (hw->ctrl.len == ARM_BREAKPOINT_LEN_1)
^
../arch/arm64/kernel/hw_breakpoint.c:542:3: note: here
case 2:
^~~~
../arch/arm64/kernel/hw_breakpoint.c:544:7: warning: this statement may fall
through [-Wimplicit-fallthrough=]
if (hw->ctrl.len == ARM_BREAKPOINT_LEN_2)
^
../arch/arm64/kernel/hw_breakpoint.c:546:3: note: here
default:
^~~~~~~
Rework the code so that the compiler doesn't warn about fall-through,
and so that it looks like the arm code, since the comment in the
function indicates that this is supposed to behave the same way as
arm32 because 32-bit tasks are handled here as well.
Cc: stable(a)vger.kernel.org # v3.16+
Fixes: 6ee33c2712fc ("ARM: hw_breakpoint: correct and simplify alignment fixup code")
Signed-off-by: Anders Roxell <anders.roxell(a)linaro.org>
---
arch/arm64/kernel/hw_breakpoint.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
index dceb84520948..ea616adf1cf1 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -535,14 +535,17 @@ int hw_breakpoint_arch_parse(struct perf_event *bp,
case 0:
/* Aligned */
break;
- case 1:
- /* Allow single byte watchpoint. */
- if (hw->ctrl.len == ARM_BREAKPOINT_LEN_1)
- break;
case 2:
/* Allow halfword watchpoints and breakpoints. */
if (hw->ctrl.len == ARM_BREAKPOINT_LEN_2)
break;
+ /* Fall through */
+ case 1:
+ case 3:
+ /* Allow single byte watchpoint. */
+ if (hw->ctrl.len == ARM_BREAKPOINT_LEN_1)
+ break;
+ /* Fall through */
default:
return -EINVAL;
}
--
2.20.1
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2e53840362771c73eb0a5ff71611507e64e8eecd Mon Sep 17 00:00:00 2001
From: "Darrick J. Wong" <darrick.wong(a)oracle.com>
Date: Sun, 9 Jun 2019 21:41:41 -0400
Subject: [PATCH] ext4: don't allow any modifications to an immutable file
Don't allow any modifications to a file that's marked immutable, which
means that we have to flush all the writable pages to make them
read-only and we have to check the setattr/setflags parameters more
closely.
Signed-off-by: Darrick J. Wong <darrick.wong(a)oracle.com>
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Cc: stable(a)kernel.org
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index e486e49b31ed..7af835ac8d23 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -269,6 +269,29 @@ static int uuid_is_zero(__u8 u[16])
}
#endif
+/*
+ * If immutable is set and we are not clearing it, we're not allowed to change
+ * anything else in the inode. Don't error out if we're only trying to set
+ * immutable on an immutable file.
+ */
+static int ext4_ioctl_check_immutable(struct inode *inode, __u32 new_projid,
+ unsigned int flags)
+{
+ struct ext4_inode_info *ei = EXT4_I(inode);
+ unsigned int oldflags = ei->i_flags;
+
+ if (!(oldflags & EXT4_IMMUTABLE_FL) || !(flags & EXT4_IMMUTABLE_FL))
+ return 0;
+
+ if ((oldflags & ~EXT4_IMMUTABLE_FL) != (flags & ~EXT4_IMMUTABLE_FL))
+ return -EPERM;
+ if (ext4_has_feature_project(inode->i_sb) &&
+ __kprojid_val(ei->i_projid) != new_projid)
+ return -EPERM;
+
+ return 0;
+}
+
static int ext4_ioctl_setflags(struct inode *inode,
unsigned int flags)
{
@@ -340,6 +363,20 @@ static int ext4_ioctl_setflags(struct inode *inode,
}
}
+ /*
+ * Wait for all pending directio and then flush all the dirty pages
+ * for this file. The flush marks all the pages readonly, so any
+ * subsequent attempt to write to the file (particularly mmap pages)
+ * will come through the filesystem and fail.
+ */
+ if (S_ISREG(inode->i_mode) && !IS_IMMUTABLE(inode) &&
+ (flags & EXT4_IMMUTABLE_FL)) {
+ inode_dio_wait(inode);
+ err = filemap_write_and_wait(inode->i_mapping);
+ if (err)
+ goto flags_out;
+ }
+
handle = ext4_journal_start(inode, EXT4_HT_INODE, 1);
if (IS_ERR(handle)) {
err = PTR_ERR(handle);
@@ -769,7 +806,11 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return err;
inode_lock(inode);
- err = ext4_ioctl_setflags(inode, flags);
+ err = ext4_ioctl_check_immutable(inode,
+ from_kprojid(&init_user_ns, ei->i_projid),
+ flags);
+ if (!err)
+ err = ext4_ioctl_setflags(inode, flags);
inode_unlock(inode);
mnt_drop_write_file(filp);
return err;
@@ -1139,6 +1180,9 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
goto out;
flags = (ei->i_flags & ~EXT4_FL_XFLAG_VISIBLE) |
(flags & EXT4_FL_XFLAG_VISIBLE);
+ err = ext4_ioctl_check_immutable(inode, fa.fsx_projid, flags);
+ if (err)
+ goto out;
err = ext4_ioctl_setflags(inode, flags);
if (err)
goto out;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8a58ddae23796c733c5dfbd717538d89d036c5bd Mon Sep 17 00:00:00 2001
From: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Date: Mon, 1 Jul 2019 14:07:55 +0300
Subject: [PATCH] perf/core: Fix exclusive events' grouping
So far, we tried to disallow grouping exclusive events for fear of the
complications they would cause with moving between contexts. Specifically,
moving a software group to a hardware context would violate the exclusivity
rules if both groups contain matching exclusive events.
This attempt was, however, unsuccessful: the check that we have in the
perf_event_open() syscall is both wrong (looks at wrong PMU) and
insufficient (group leader may still be exclusive), as can be illustrated
by running:
$ perf record -e '{intel_pt//,cycles}' uname
$ perf record -e '{cycles,intel_pt//}' uname
ultimately successfully.
Furthermore, we are completely free to trigger the exclusivity violation
by:
perf -e '{cycles,intel_pt//}' -e '{intel_pt//,instructions}'
even though the helpful perf record will not allow that, the ABI will.
The warning later in the perf_event_open() path will also not trigger, because
it's also wrong.
Fix all this by validating the original group before moving, getting rid
of broken safeguards and placing a useful one to perf_install_in_context().
Signed-off-by: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Cc: Arnaldo Carvalho de Melo <acme(a)redhat.com>
Cc: Jiri Olsa <jolsa(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Stephane Eranian <eranian(a)google.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Vince Weaver <vincent.weaver(a)maine.edu>
Cc: mathieu.poirier(a)linaro.org
Cc: will.deacon(a)arm.com
Fixes: bed5b25ad9c8a ("perf: Add a pmu capability for "exclusive" events")
Link: https://lkml.kernel.org/r/20190701110755.24646-1-alexander.shishkin@linux.i…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 16e38c286d46..e8ad3c590a23 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1055,6 +1055,11 @@ static inline int in_software_context(struct perf_event *event)
return event->ctx->pmu->task_ctx_nr == perf_sw_context;
}
+static inline int is_exclusive_pmu(struct pmu *pmu)
+{
+ return pmu->capabilities & PERF_PMU_CAP_EXCLUSIVE;
+}
+
extern struct static_key perf_swevent_enabled[PERF_COUNT_SW_MAX];
extern void ___perf_sw_event(u32, u64, struct pt_regs *, u64);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5dd19bedbf64..eea9d52b010c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2553,6 +2553,9 @@ static int __perf_install_in_context(void *info)
return ret;
}
+static bool exclusive_event_installable(struct perf_event *event,
+ struct perf_event_context *ctx);
+
/*
* Attach a performance event to a context.
*
@@ -2567,6 +2570,8 @@ perf_install_in_context(struct perf_event_context *ctx,
lockdep_assert_held(&ctx->mutex);
+ WARN_ON_ONCE(!exclusive_event_installable(event, ctx));
+
if (event->cpu != -1)
event->cpu = cpu;
@@ -4360,7 +4365,7 @@ static int exclusive_event_init(struct perf_event *event)
{
struct pmu *pmu = event->pmu;
- if (!(pmu->capabilities & PERF_PMU_CAP_EXCLUSIVE))
+ if (!is_exclusive_pmu(pmu))
return 0;
/*
@@ -4391,7 +4396,7 @@ static void exclusive_event_destroy(struct perf_event *event)
{
struct pmu *pmu = event->pmu;
- if (!(pmu->capabilities & PERF_PMU_CAP_EXCLUSIVE))
+ if (!is_exclusive_pmu(pmu))
return;
/* see comment in exclusive_event_init() */
@@ -4411,14 +4416,15 @@ static bool exclusive_event_match(struct perf_event *e1, struct perf_event *e2)
return false;
}
-/* Called under the same ctx::mutex as perf_install_in_context() */
static bool exclusive_event_installable(struct perf_event *event,
struct perf_event_context *ctx)
{
struct perf_event *iter_event;
struct pmu *pmu = event->pmu;
- if (!(pmu->capabilities & PERF_PMU_CAP_EXCLUSIVE))
+ lockdep_assert_held(&ctx->mutex);
+
+ if (!is_exclusive_pmu(pmu))
return true;
list_for_each_entry(iter_event, &ctx->event_list, event_entry) {
@@ -10947,11 +10953,6 @@ SYSCALL_DEFINE5(perf_event_open,
goto err_alloc;
}
- if ((pmu->capabilities & PERF_PMU_CAP_EXCLUSIVE) && group_leader) {
- err = -EBUSY;
- goto err_context;
- }
-
/*
* Look up the group leader (we will attach this event to it):
*/
@@ -11039,6 +11040,18 @@ SYSCALL_DEFINE5(perf_event_open,
move_group = 0;
}
}
+
+ /*
+ * Failure to create exclusive events returns -EBUSY.
+ */
+ err = -EBUSY;
+ if (!exclusive_event_installable(group_leader, ctx))
+ goto err_locked;
+
+ for_each_sibling_event(sibling, group_leader) {
+ if (!exclusive_event_installable(sibling, ctx))
+ goto err_locked;
+ }
} else {
mutex_lock(&ctx->mutex);
}
@@ -11075,9 +11088,6 @@ SYSCALL_DEFINE5(perf_event_open,
* because we need to serialize with concurrent event creation.
*/
if (!exclusive_event_installable(event, ctx)) {
- /* exclusive and group stuff are assumed mutually exclusive */
- WARN_ON_ONCE(move_group);
-
err = -EBUSY;
goto err_locked;
}
When fall-through warnings were enabled by default by commit
d93512ef0f0e ("Makefile: Globally enable fall-through warning"), the
following warnings started to show up. However, the issue was originally
introduced in commit 6ee33c2712fc ("ARM: hw_breakpoint: correct and
simplify alignment fixup code"). Commit d968d2b801d8 ("ARM: 7497/1:
hw_breakpoint: allow single-byte watchpoints on all addresses") was
written with the intent to allow single-byte watchpoints on all
addresses but forgot to move 'case 1:' down below 'case 2:'.
../arch/arm/kernel/hw_breakpoint.c: In function ‘hw_breakpoint_arch_parse’:
../arch/arm/kernel/hw_breakpoint.c:609:7: warning: this statement may fall
through [-Wimplicit-fallthrough=]
if (hw->ctrl.len == ARM_BREAKPOINT_LEN_2)
^
../arch/arm/kernel/hw_breakpoint.c:611:3: note: here
case 3:
^~~~
../arch/arm/kernel/hw_breakpoint.c:613:7: warning: this statement may fall
through [-Wimplicit-fallthrough=]
if (hw->ctrl.len == ARM_BREAKPOINT_LEN_1)
^
../arch/arm/kernel/hw_breakpoint.c:615:3: note: here
default:
^~~~~~~
Rework so that 'case 1:' is next to 'case 3:' and also add '/* Fall
through */' comments so that the compiler doesn't warn about
fall-through.
Cc: stable(a)vger.kernel.org # v3.16
Fixes: 6ee33c2712fc ("ARM: hw_breakpoint: correct and simplify alignment fixup code")
Signed-off-by: Anders Roxell <anders.roxell(a)linaro.org>
---
arch/arm/kernel/hw_breakpoint.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/arm/kernel/hw_breakpoint.c b/arch/arm/kernel/hw_breakpoint.c
index af8b8e15f589..c14d506969ba 100644
--- a/arch/arm/kernel/hw_breakpoint.c
+++ b/arch/arm/kernel/hw_breakpoint.c
@@ -603,15 +603,17 @@ int hw_breakpoint_arch_parse(struct perf_event *bp,
case 0:
/* Aligned */
break;
- case 1:
case 2:
/* Allow halfword watchpoints and breakpoints. */
if (hw->ctrl.len == ARM_BREAKPOINT_LEN_2)
break;
+ /* Fall through */
+ case 1:
case 3:
/* Allow single byte watchpoint. */
if (hw->ctrl.len == ARM_BREAKPOINT_LEN_1)
break;
+ /* Fall through */
default:
ret = -EINVAL;
goto out;
--
2.20.1
The patch below was submitted to be applied to the 5.2-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 19ec11a2233d24a7811836fa735203aaccf95a23 Mon Sep 17 00:00:00 2001
From: Bartosz Golaszewski <bgolaszewski(a)baylibre.com>
Date: Thu, 11 Jul 2019 10:29:35 +0200
Subject: [PATCH] gpio: em: remove the gpiochip before removing the irq domain
In commit 8764c4ca5049 ("gpio: em: use the managed version of
gpiochip_add_data()") we implicitly altered the ordering of resource
freeing: since gpiochip_remove() calls gpiochip_irqchip_remove()
internally, we now can potentially use the irq_domain after it was
destroyed in the remove() callback (as devm resources are freed after
remove() has returned).
Use devm_add_action_or_reset() to keep the ordering right and entirely
kill the remove() callback in the driver.
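For reference, a condensed sketch of the resulting ordering, taken from
the probe() calls in the diff below (devm resources are released in
reverse order of registration):
	ret = devm_add_action_or_reset(&pdev->dev, em_gio_irq_domain_remove,
				       p->irq_domain);	/* released last */
	ret = devm_request_irq(&pdev->dev, ...);	/* released next */
	ret = devm_gpiochip_add_data(&pdev->dev, gpio_chip, p); /* released first */
	/*
	 * So gpiochip_remove() (including gpiochip_irqchip_remove()) runs
	 * while the irq_domain is still alive, and irq_domain_remove() runs
	 * last.
	 */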
Reported-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
Fixes: 8764c4ca5049 ("gpio: em: use the managed version of gpiochip_add_data()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bartosz Golaszewski <bgolaszewski(a)baylibre.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas(a)glider.be>
diff --git a/drivers/gpio/gpio-em.c b/drivers/gpio/gpio-em.c
index b6af705a4e5f..a87951293aaa 100644
--- a/drivers/gpio/gpio-em.c
+++ b/drivers/gpio/gpio-em.c
@@ -259,6 +259,13 @@ static const struct irq_domain_ops em_gio_irq_domain_ops = {
.xlate = irq_domain_xlate_twocell,
};
+static void em_gio_irq_domain_remove(void *data)
+{
+ struct irq_domain *domain = data;
+
+ irq_domain_remove(domain);
+}
+
static int em_gio_probe(struct platform_device *pdev)
{
struct em_gio_priv *p;
@@ -333,39 +340,30 @@ static int em_gio_probe(struct platform_device *pdev)
return -ENXIO;
}
+ ret = devm_add_action_or_reset(&pdev->dev, em_gio_irq_domain_remove,
+ p->irq_domain);
+ if (ret)
+ return ret;
+
if (devm_request_irq(&pdev->dev, irq[0]->start,
em_gio_irq_handler, 0, name, p)) {
dev_err(&pdev->dev, "failed to request low IRQ\n");
- ret = -ENOENT;
- goto err1;
+ return -ENOENT;
}
if (devm_request_irq(&pdev->dev, irq[1]->start,
em_gio_irq_handler, 0, name, p)) {
dev_err(&pdev->dev, "failed to request high IRQ\n");
- ret = -ENOENT;
- goto err1;
+ return -ENOENT;
}
ret = devm_gpiochip_add_data(&pdev->dev, gpio_chip, p);
if (ret) {
dev_err(&pdev->dev, "failed to add GPIO controller\n");
- goto err1;
+ return ret;
}
return 0;
-
-err1:
- irq_domain_remove(p->irq_domain);
- return ret;
-}
-
-static int em_gio_remove(struct platform_device *pdev)
-{
- struct em_gio_priv *p = platform_get_drvdata(pdev);
-
- irq_domain_remove(p->irq_domain);
- return 0;
}
static const struct of_device_id em_gio_dt_ids[] = {
@@ -376,7 +374,6 @@ MODULE_DEVICE_TABLE(of, em_gio_dt_ids);
static struct platform_driver em_gio_device_driver = {
.probe = em_gio_probe,
- .remove = em_gio_remove,
.driver = {
.name = "em_gio",
.of_match_table = em_gio_dt_ids,
When fall-through warnings were enabled by default by commit
d93512ef0f0e ("Makefile: Globally enable fall-through warning"), the
following warning started to show up:
../drivers/iommu/arm-smmu-v3.c: In function ‘arm_smmu_write_strtab_ent’:
../drivers/iommu/arm-smmu-v3.c:1189:7: warning: this statement may fall
through [-Wimplicit-fallthrough=]
if (disable_bypass)
^
../drivers/iommu/arm-smmu-v3.c:1191:3: note: here
default:
^~~~~~~
Rework so that the compiler doesn't warn about fall-through. Make it
clearer by calling 'BUG()' when disable_bypass is not set, and always
'break;'.
Cc: stable(a)vger.kernel.org # v4.2+
Fixes: 5bc0a11664e1 ("iommu/arm-smmu: Don't BUG() if we find aborting STEs with disable_bypass")
Signed-off-by: Anders Roxell <anders.roxell(a)linaro.org>
---
drivers/iommu/arm-smmu-v3.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index a9a9fabd3968..8e5f0565996d 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1186,8 +1186,9 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
ste_live = true;
break;
case STRTAB_STE_0_CFG_ABORT:
- if (disable_bypass)
- break;
+ if (!disable_bypass)
+ BUG();
+ break;
default:
BUG(); /* STE corruption */
}
--
2.20.1
While I had thought I had fixed this issue in:
commit 342406e4fbba ("drm/nouveau/i2c: Disable i2c bus access after
->fini()")
It turns out that while I did fix the error messages I was seeing on my
P50 when trying to access i2c busses with the GPU in runtime suspend, I
had accidentally missed one important detail that was mentioned in the
bug report this commit was supposed to fix: that the CPU would only lock
up when trying to access i2c busses _on connected devices_ _while the
GPU is not in runtime suspend_. Whoops. That definitely explains why I
was not able to get my machine to hang with i2c bus interactions until
now, as plugging my P50 into its dock with an HDMI monitor connected
allowed me to finally reproduce this locally.
Now that I have managed to reproduce this issue properly, it looks like
the problem is much simpler than it seemed. It turns out that some
connected devices, such as MST laptop docks, will actually ACK i2c reads
even if no data was actually read:
[ 275.063043] nouveau 0000:01:00.0: i2c: aux 000a: 1: 0000004c 1
[ 275.063447] nouveau 0000:01:00.0: i2c: aux 000a: 00 01101000 10040000
[ 275.063759] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000001
[ 275.064024] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000
[ 275.064285] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000
[ 275.064594] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000
Because we don't handle the situation of i2c ack without any data, we
end up entering an infinite loop in nvkm_i2c_aux_i2c_xfer() since the
value of cnt always remains at 0. This finally properly explains how
this could result in a CPU hang like the ones observed in the
aforementioned commit.
So, fix this by retrying transactions if no data is written or received,
and give up and fail the transaction if we continue to not write or
receive any data after 32 retries.
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Cc: stable(a)vger.kernel.org
---
drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c | 24 +++++++++++++------
1 file changed, 17 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c b/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c
index b4e7404fe660..a11637b0f6cc 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c
@@ -40,8 +40,7 @@ nvkm_i2c_aux_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs, int num)
u8 *ptr = msg->buf;
while (remaining) {
- u8 cnt = (remaining > 16) ? 16 : remaining;
- u8 cmd;
+ u8 cnt, retries, cmd;
if (msg->flags & I2C_M_RD)
cmd = 1;
@@ -51,10 +50,19 @@ nvkm_i2c_aux_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs, int num)
if (mcnt || remaining > 16)
cmd |= 4; /* MOT */
- ret = aux->func->xfer(aux, true, cmd, msg->addr, ptr, &cnt);
- if (ret < 0) {
- nvkm_i2c_aux_release(aux);
- return ret;
+ for (retries = 0, cnt = 0;
+ retries < 32 && !cnt;
+ retries++) {
+ cnt = min_t(u8, remaining, 16);
+ ret = aux->func->xfer(aux, true, cmd,
+ msg->addr, ptr, &cnt);
+ if (ret < 0)
+ goto out;
+ }
+ if (!cnt) {
+ AUX_TRACE(aux, "no data after 32 retries");
+ ret = -EIO;
+ goto out;
}
ptr += cnt;
@@ -64,8 +72,10 @@ nvkm_i2c_aux_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs, int num)
msg++;
}
+ ret = num;
+out:
nvkm_i2c_aux_release(aux);
- return num;
+ return ret;
}
static u32
--
2.21.0
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a3bf9fbdad600b1e4335dd90979f8d6072e4f602 Mon Sep 17 00:00:00 2001
From: Greg Kurz <groug(a)kaod.org>
Date: Wed, 15 May 2019 12:05:01 +0200
Subject: [PATCH] powerpc/pseries: Fix xive=off command line
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
On POWER9, if the hypervisor supports XIVE exploitation mode, the
guest OS will unconditionally request the XIVE interrupt mode even if
XIVE was deactivated with the kernel command line option xive=off.
Later on, when the spapr XIVE init code handles xive=off, it disables
XIVE and tries to fall back on the legacy mode XICS.
This discrepancy causes a kernel panic because the hypervisor is
configured to provide the XIVE interrupt mode to the guest:
kernel BUG at arch/powerpc/sysdev/xics/xics-common.c:135!
...
NIP xics_smp_probe+0x38/0x98
LR xics_smp_probe+0x2c/0x98
Call Trace:
xics_smp_probe+0x2c/0x98 (unreliable)
pSeries_smp_probe+0x40/0xa0
smp_prepare_cpus+0x62c/0x6ec
kernel_init_freeable+0x148/0x448
kernel_init+0x2c/0x148
ret_from_kernel_thread+0x5c/0x68
Look for xive=off during prom_init and don't ask for XIVE in this
case. One exception though: if the host only supports XIVE, we still
want to boot so we ignore xive=off.
Similarly, have the spapr XIVE init code look at the interrupt mode
negotiated during CAS, and ignore xive=off if the hypervisor only
supports XIVE.
Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
Cc: stable(a)vger.kernel.org # v4.20
Reported-by: Pavithra R. Prakash <pavrampu(a)in.ibm.com>
Signed-off-by: Greg Kurz <groug(a)kaod.org>
Reviewed-by: Cédric Le Goater <clg(a)kaod.org>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index bab79c51ba4f..17f1ae7fae2c 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -172,6 +172,7 @@ static unsigned long __prombss prom_tce_alloc_end;
#ifdef CONFIG_PPC_PSERIES
static bool __prombss prom_radix_disable;
+static bool __prombss prom_xive_disable;
#endif
struct platform_support {
@@ -808,6 +809,12 @@ static void __init early_cmdline_parse(void)
}
if (prom_radix_disable)
prom_debug("Radix disabled from cmdline\n");
+
+ opt = prom_strstr(prom_cmd_line, "xive=off");
+ if (opt) {
+ prom_xive_disable = true;
+ prom_debug("XIVE disabled from cmdline\n");
+ }
#endif /* CONFIG_PPC_PSERIES */
}
@@ -1216,10 +1223,17 @@ static void __init prom_parse_xive_model(u8 val,
switch (val) {
case OV5_FEAT(OV5_XIVE_EITHER): /* Either Available */
prom_debug("XIVE - either mode supported\n");
- support->xive = true;
+ support->xive = !prom_xive_disable;
break;
case OV5_FEAT(OV5_XIVE_EXPLOIT): /* Only Exploitation mode */
prom_debug("XIVE - exploitation mode supported\n");
+ if (prom_xive_disable) {
+ /*
+ * If we __have__ to do XIVE, we're better off ignoring
+ * the command line rather than not booting.
+ */
+ prom_printf("WARNING: Ignoring cmdline option xive=off\n");
+ }
support->xive = true;
break;
case OV5_FEAT(OV5_XIVE_LEGACY): /* Only Legacy mode */
diff --git a/arch/powerpc/sysdev/xive/spapr.c b/arch/powerpc/sysdev/xive/spapr.c
index 575db3b06a6b..2e2d1b8f810f 100644
--- a/arch/powerpc/sysdev/xive/spapr.c
+++ b/arch/powerpc/sysdev/xive/spapr.c
@@ -20,6 +20,7 @@
#include <linux/cpumask.h>
#include <linux/mm.h>
#include <linux/delay.h>
+#include <linux/libfdt.h>
#include <asm/prom.h>
#include <asm/io.h>
@@ -663,6 +664,55 @@ static bool xive_get_max_prio(u8 *max_prio)
return true;
}
+static const u8 *get_vec5_feature(unsigned int index)
+{
+ unsigned long root, chosen;
+ int size;
+ const u8 *vec5;
+
+ root = of_get_flat_dt_root();
+ chosen = of_get_flat_dt_subnode_by_name(root, "chosen");
+ if (chosen == -FDT_ERR_NOTFOUND)
+ return NULL;
+
+ vec5 = of_get_flat_dt_prop(chosen, "ibm,architecture-vec-5", &size);
+ if (!vec5)
+ return NULL;
+
+ if (size <= index)
+ return NULL;
+
+ return vec5 + index;
+}
+
+static bool xive_spapr_disabled(void)
+{
+ const u8 *vec5_xive;
+
+ vec5_xive = get_vec5_feature(OV5_INDX(OV5_XIVE_SUPPORT));
+ if (vec5_xive) {
+ u8 val;
+
+ val = *vec5_xive & OV5_FEAT(OV5_XIVE_SUPPORT);
+ switch (val) {
+ case OV5_FEAT(OV5_XIVE_EITHER):
+ case OV5_FEAT(OV5_XIVE_LEGACY):
+ break;
+ case OV5_FEAT(OV5_XIVE_EXPLOIT):
+ /* Hypervisor only supports XIVE */
+ if (xive_cmdline_disabled)
+ pr_warn("WARNING: Ignoring cmdline option xive=off\n");
+ return false;
+ default:
+ pr_warn("%s: Unknown xive support option: 0x%x\n",
+ __func__, val);
+ break;
+ }
+ }
+
+ return xive_cmdline_disabled;
+}
+
bool __init xive_spapr_init(void)
{
struct device_node *np;
@@ -675,7 +725,7 @@ bool __init xive_spapr_init(void)
const __be32 *reg;
int i;
- if (xive_cmdline_disabled)
+ if (xive_spapr_disabled())
return false;
pr_devel("%s()\n", __func__);
This is an automatically generated email to let you know that the following patch was queued:
Subject: media: hantro: Set DMA max segment size
Author: Francois Buergisser <fbuergisser(a)chromium.org>
Date: Thu Jul 25 10:17:50 2019 -0400
The Hantro codec is typically used in platforms with an IOMMU,
so we need to set a proper DMA segment size. Devices without an
IOMMU will still fall back to the default 64 KiB segments.
Cc: stable(a)vger.kernel.org
Fixes: 775fec69008d3 ("media: add Rockchip VPU JPEG encoder driver")
Signed-off-by: Francois Buergisser <fbuergisser(a)chromium.org>
Signed-off-by: Ezequiel Garcia <ezequiel(a)collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung(a)kernel.org>
drivers/staging/media/hantro/hantro_drv.c | 1 +
1 file changed, 1 insertion(+)
---
diff --git a/drivers/staging/media/hantro/hantro_drv.c b/drivers/staging/media/hantro/hantro_drv.c
index b71a06e9159e..4eae1dbb1ac8 100644
--- a/drivers/staging/media/hantro/hantro_drv.c
+++ b/drivers/staging/media/hantro/hantro_drv.c
@@ -731,6 +731,7 @@ static int hantro_probe(struct platform_device *pdev)
dev_err(vpu->dev, "Could not set DMA coherent mask.\n");
return ret;
}
+ vb2_dma_contig_set_max_seg_size(&pdev->dev, DMA_BIT_MASK(32));
for (i = 0; i < vpu->variant->num_irqs; i++) {
const char *irq_name = vpu->variant->irqs[i].name;
From: Francois Buergisser <fbuergisser(a)chromium.org>
The Hantro codec is typically used in platforms with an IOMMU,
so we need to set a proper DMA segment size. Devices without an
IOMMU will still fall back to the default 64 KiB segments.
Cc: stable(a)vger.kernel.org
Fixes: 775fec69008d3 ("media: add Rockchip VPU JPEG encoder driver")
Signed-off-by: Francois Buergisser <fbuergisser(a)chromium.org>
Signed-off-by: Ezequiel Garcia <ezequiel(a)collabora.com>
---
drivers/staging/media/hantro/hantro_drv.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/staging/media/hantro/hantro_drv.c b/drivers/staging/media/hantro/hantro_drv.c
index b71a06e9159e..4eae1dbb1ac8 100644
--- a/drivers/staging/media/hantro/hantro_drv.c
+++ b/drivers/staging/media/hantro/hantro_drv.c
@@ -731,6 +731,7 @@ static int hantro_probe(struct platform_device *pdev)
dev_err(vpu->dev, "Could not set DMA coherent mask.\n");
return ret;
}
+ vb2_dma_contig_set_max_seg_size(&pdev->dev, DMA_BIT_MASK(32));
for (i = 0; i < vpu->variant->num_irqs; i++) {
const char *irq_name = vpu->variant->irqs[i].name;
--
2.22.0
> NOTE: The patch will not be queued to stable trees until it is upstream.
>
> How should we proceed with this patch?
I don't know.
The maintainer did not respond, neither to the original send nor to the resend.
This is a note to let you know that I've just added the patch titled
hpet: Fix division by zero in hpet_time_div()
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 0c7d37f4d9b8446956e97b7c5e61173cdb7c8522 Mon Sep 17 00:00:00 2001
From: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Date: Thu, 11 Jul 2019 21:27:57 +0800
Subject: hpet: Fix division by zero in hpet_time_div()
The base value in do_div() called by hpet_time_div() is truncated from
unsigned long to uint32_t, resulting in a divide-by-zero exception.
UBSAN: Undefined behaviour in ../drivers/char/hpet.c:572:2
division by zero
CPU: 1 PID: 23682 Comm: syz-executor.3 Not tainted 4.4.184.x86_64+ #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
0000000000000000 b573382df1853d00 ffff8800a3287b98 ffffffff81ad7561
ffff8800a3287c00 ffffffff838b35b0 ffffffff838b3860 ffff8800a3287c20
0000000000000000 ffff8800a3287bb0 ffffffff81b8f25e ffffffff838b35a0
Call Trace:
[<ffffffff81ad7561>] __dump_stack lib/dump_stack.c:15 [inline]
[<ffffffff81ad7561>] dump_stack+0xc1/0x120 lib/dump_stack.c:51
[<ffffffff81b8f25e>] ubsan_epilogue+0x12/0x8d lib/ubsan.c:166
[<ffffffff81b900cb>] __ubsan_handle_divrem_overflow+0x282/0x2c8 lib/ubsan.c:262
[<ffffffff823560dd>] hpet_time_div drivers/char/hpet.c:572 [inline]
[<ffffffff823560dd>] hpet_ioctl_common drivers/char/hpet.c:663 [inline]
[<ffffffff823560dd>] hpet_ioctl_common.cold+0xa8/0xad drivers/char/hpet.c:577
[<ffffffff81e63d56>] hpet_ioctl+0xc6/0x180 drivers/char/hpet.c:676
[<ffffffff81711590>] vfs_ioctl fs/ioctl.c:43 [inline]
[<ffffffff81711590>] file_ioctl fs/ioctl.c:470 [inline]
[<ffffffff81711590>] do_vfs_ioctl+0x6e0/0xf70 fs/ioctl.c:605
[<ffffffff81711eb4>] SYSC_ioctl fs/ioctl.c:622 [inline]
[<ffffffff81711eb4>] SyS_ioctl+0x94/0xc0 fs/ioctl.c:613
[<ffffffff82846003>] tracesys_phase2+0x90/0x95
The main C reproducer autogenerated by syzkaller:
syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
memcpy((void*)0x20000100, "/dev/hpet\000", 10);
syscall(__NR_openat, 0xffffffffffffff9c, 0x20000100, 0, 0);
syscall(__NR_ioctl, r[0], 0x40086806, 0x40000000000000);
Fix it by using div64_ul().
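For reference, a minimal sketch of the truncation, using the divisor
value from the reproducer above:
	unsigned long dis = 0x40000000000000UL;	/* ioctl argument above */
	uint32_t base = (uint32_t)dis;	/* do_div() takes a 32-bit divisor,
					 * so the upper bits are lost: 0 */
	/* m / base then traps; div64_ul(m, dis) keeps the full 64-bit divisor. */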
Signed-off-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Signed-off-by: Zhang HongJun <zhanghongjun2(a)huawei.com>
Cc: stable <stable(a)vger.kernel.org>
Reviewed-by: Arnd Bergmann <arnd(a)arndb.de>
Link: https://lore.kernel.org/r/20190711132757.130092-1-wangkefeng.wang@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/char/hpet.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c
index 5c39f20378b8..9ac6671bb514 100644
--- a/drivers/char/hpet.c
+++ b/drivers/char/hpet.c
@@ -567,8 +567,7 @@ static inline unsigned long hpet_time_div(struct hpets *hpets,
unsigned long long m;
m = hpets->hp_tick_freq + (dis >> 1);
- do_div(m, dis);
- return (unsigned long)m;
+ return div64_ul(m, dis);
}
static int
--
2.22.0
In the Resizable BAR control register, bits [8:12] represent the size of
the BAR. As per the PCIe specification, the encoded register values map
to actual BAR sizes as follows:
Bits	BAR size
 0	1 MB
 1	2 MB
 2	4 MB
 3	8 MB
--
For a 1 MB BAR, the size bits should be set to 0, but they are
incorrectly set to "1f".
The latest megaraid_sas and mpt3sas adapters, which support Resizable
BAR with a 1 MB BAR size, fail to initialize during system resume from
S3 sleep.
Fix: correctly set the BAR size bits to "0" for a 1 MB BAR size.
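For reference, a short sketch of the arithmetic for a 1 MB BAR, using
the expressions from the diff below:
	/* old code */
	size = order_base_2((resource_size(res) >> 20) | 1) - 1;
		/* = order_base_2(1) - 1 = -1, which lands in the 5-bit
		 * PCI_REBAR_CTRL_BAR_SIZE field as 0x1f */
	/* new code */
	order = order_base_2((resource_size(res) >> 20) | 1);	/* = 0 */
	size = order ? order - 1 : 0;				/* = 0 */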
CC: stable(a)vger.kernel.org # v4.16+
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203939
Fixes: d3252ace0bc652a1a244455556b6a549f969bf99 ("PCI: Restore resized BAR state on resume")
Signed-off-by: Sumit Saxena <sumit.saxena(a)broadcom.com>
---
drivers/pci/pci.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8abc843..b651f32 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1417,12 +1417,13 @@ static void pci_restore_rebar_state(struct pci_dev *pdev)
for (i = 0; i < nbars; i++, pos += 8) {
struct resource *res;
- int bar_idx, size;
+ int bar_idx, size, order;
pci_read_config_dword(pdev, pos + PCI_REBAR_CTRL, &ctrl);
bar_idx = ctrl & PCI_REBAR_CTRL_BAR_IDX;
res = pdev->resource + bar_idx;
- size = order_base_2((resource_size(res) >> 20) | 1) - 1;
+ order = order_base_2((resource_size(res) >> 20) | 1);
+ size = order ? order - 1 : 0;
ctrl &= ~PCI_REBAR_CTRL_BAR_SIZE;
ctrl |= size << PCI_REBAR_CTRL_BAR_SHIFT;
pci_write_config_dword(pdev, pos + PCI_REBAR_CTRL, ctrl);
--
1.8.3.1
This is a note to let you know that I've just added the patch titled
staging: android: ion: Bail out upon SIGKILL when allocating memory.
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 8f9e86ee795971eabbf372e6d804d6b8578287a7 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Date: Mon, 1 Jul 2019 19:55:19 +0900
Subject: staging: android: ion: Bail out upon SIGKILL when allocating memory.
syzbot found that a thread can stall for minutes inside
ion_system_heap_allocate() after that thread was killed by SIGKILL [1].
Let's check for SIGKILL before doing memory allocation.
[1] https://syzkaller.appspot.com/bug?id=a0e3436829698d5824231251fad9d8e998f94f…
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Cc: stable <stable(a)vger.kernel.org>
Reported-by: syzbot <syzbot+8ab2d0f39fb79fe6ca40(a)syzkaller.appspotmail.com>
Acked-by: Laura Abbott <labbott(a)redhat.com>
Acked-by: Sumit Semwal <sumit.semwal(a)linaro.org>
Link: https://lore.kernel.org/r/d088f188-5f32-d8fc-b9a0-0b404f7501cc@I-love.SAKUR…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/android/ion/ion_page_pool.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/staging/android/ion/ion_page_pool.c b/drivers/staging/android/ion/ion_page_pool.c
index fd4995fb676e..f85ec5b16b65 100644
--- a/drivers/staging/android/ion/ion_page_pool.c
+++ b/drivers/staging/android/ion/ion_page_pool.c
@@ -8,11 +8,14 @@
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/swap.h>
+#include <linux/sched/signal.h>
#include "ion.h"
static inline struct page *ion_page_pool_alloc_pages(struct ion_page_pool *pool)
{
+ if (fatal_signal_pending(current))
+ return NULL;
return alloc_pages(pool->gfp_mask, pool->order);
}
--
2.22.0
This is an automatically generated email to let you know that the following patch was queued:
Subject: media: vivid: fix device init when no_error_inj=1 and fb disabled
Author: Guillaume Tucker <guillaume.tucker(a)collabora.com>
Date: Wed Jul 24 11:19:22 2019 -0400
Add an extra condition to add the video output control class when the
device has some hdmi outputs defined. This is required to then always
be able to add the display present control, which is enabled when
there are some hdmi outputs.
This fixes the corner case where no_error_inj is enabled and the
device has no frame buffer but some hdmi outputs, as otherwise the
video output control class would be added anyway. Without this fix,
the sanity checks fail in v4l2_ctrl_new() as name is NULL.
Fixes: c533435ffb91 ("media: vivid: add display present control")
Cc: stable(a)vger.kernel.org # for 5.3
Signed-off-by: Guillaume Tucker <guillaume.tucker(a)collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung(a)kernel.org>
drivers/media/platform/vivid/vivid-ctrls.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
---
diff --git a/drivers/media/platform/vivid/vivid-ctrls.c b/drivers/media/platform/vivid/vivid-ctrls.c
index fb9220e4e640..cb19a9a73092 100644
--- a/drivers/media/platform/vivid/vivid-ctrls.c
+++ b/drivers/media/platform/vivid/vivid-ctrls.c
@@ -1473,7 +1473,7 @@ int vivid_create_controls(struct vivid_dev *dev, bool show_ccs_cap,
v4l2_ctrl_handler_init(hdl_vid_cap, 55);
v4l2_ctrl_new_custom(hdl_vid_cap, &vivid_ctrl_class, NULL);
v4l2_ctrl_handler_init(hdl_vid_out, 26);
- if (!no_error_inj || dev->has_fb)
+ if (!no_error_inj || dev->has_fb || dev->num_hdmi_outputs)
v4l2_ctrl_new_custom(hdl_vid_out, &vivid_ctrl_class, NULL);
v4l2_ctrl_handler_init(hdl_vbi_cap, 21);
v4l2_ctrl_new_custom(hdl_vbi_cap, &vivid_ctrl_class, NULL);
Hello,
First, thank you for maintaining the stable branches!
When testing our MPTCP out-of-tree kernel[1] with KASAN last weekend,
we saw some new warnings, always with the same trace, looking like this:
[ 16.464577] ==================================================================
[ 16.465448] BUG: KASAN: slab-out-of-bounds in strscpy+0x49d/0x590
[ 16.466171] Read of size 8 at addr ffff88803525f788 by task confd/330
[ 16.467114]
[ 16.467313] CPU: 0 PID: 330 Comm: confd Not tainted 4.14.133-mptcp+ #2
[ 16.468071] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
[ 16.469016] Call Trace:
[ 16.469318] dump_stack+0xa6/0x12e
[ 16.469721] ? _atomic_dec_and_lock+0x1b2/0x1b2
[ 16.470255] ? radix_tree_lookup+0x10/0x10
[ 16.470764] ? strscpy+0x49d/0x590
[ 16.471299] print_address_description+0xa1/0x330
[ 16.471918] ? strscpy+0x49d/0x590
[ 16.472321] kasan_report+0x23f/0x350
[ 16.472751] strscpy+0x49d/0x590
[ 16.473135] ? strncpy+0xd0/0xd0
[ 16.473518] p9dirent_read+0x26b/0x510
[ 16.473977] ? unwind_next_frame+0xc97/0x1eb0
[ 16.474481] ? p9stat_read+0x440/0x440
[ 16.474945] ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 16.475543] ? rcutorture_record_progress+0x10/0x10
[ 16.476123] ? kernel_text_address+0x111/0x120
[ 16.476656] ? __kernel_text_address+0xe/0x30
[ 16.477273] v9fs_dir_readdir_dotl+0x340/0x5b0
[ 16.477900] ? kasan_slab_free+0x12d/0x1a0
[ 16.478377] ? v9fs_dir_readdir+0x810/0x810
[ 16.478887] ? new_slab+0x29f/0x3b0
[ 16.479298] ? iterate_fd+0x300/0x300
[ 16.479728] ? do_filp_open+0x24a/0x3b0
[ 16.480177] ? SyS_getcwd+0x3b7/0x9f0
[ 16.480626] ? may_open_dev+0xc0/0xc0
[ 16.481056] ? get_unused_fd_flags+0x180/0x180
[ 16.481643] ? __up.isra.0+0x230/0x230
[ 16.482195] ? __fdget_pos+0x105/0x170
[ 16.482658] ? iterate_dir+0x171/0x5b0
[ 16.483097] iterate_dir+0x171/0x5b0
[ 16.483518] SyS_getdents+0x1dc/0x3a0
[ 16.483968] ? SyS_old_readdir+0x200/0x200
[ 16.484444] ? SyS_write+0x1c0/0x270
[ 16.484875] ? fillonedir+0x1a0/0x1a0
[ 16.485315] ? SyS_old_readdir+0x200/0x200
[ 16.485791] ? do_syscall_64+0x259/0xa90
[ 16.486258] do_syscall_64+0x259/0xa90
[ 16.486715] ? syscall_return_slowpath+0x340/0x340
[ 16.487320] ? do_page_fault+0x11f/0x400
[ 16.487849] ? __do_page_fault+0xe00/0xe00
[ 16.488305] ? __hrtick_start+0x2f0/0x2f0
[ 16.488752] ? __switch_to_asm+0x31/0x60
[ 16.489189] ? __switch_to_asm+0x31/0x60
[ 16.489626] ? __switch_to_asm+0x25/0x60
[ 16.490063] ? __switch_to_asm+0x31/0x60
[ 16.490500] ? __switch_to_asm+0x31/0x60
[ 16.490940] ? __switch_to_asm+0x31/0x60
[ 16.491377] ? __switch_to_asm+0x25/0x60
[ 16.491820] ? __switch_to_asm+0x31/0x60
[ 16.492305] ? __switch_to_asm+0x31/0x60
[ 16.492769] ? __switch_to_asm+0x31/0x60
[ 16.493306] ? __switch_to_asm+0x31/0x60
[ 16.493917] ? __switch_to_asm+0x31/0x60
[ 16.494402] ? __switch_to_asm+0x25/0x60
[ 16.494859] ? __switch_to_asm+0x31/0x60
[ 16.495344] ? __switch_to_asm+0x31/0x60
[ 16.495798] ? __switch_to_asm+0x31/0x60
[ 16.496283] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 16.496885] RIP: 0033:0x7f0bd5b26855
[ 16.497313] RSP: 002b:00007f0bd69d4d60 EFLAGS: 00000246 ORIG_RAX: 000000000000004e
[ 16.498387] RAX: ffffffffffffffda RBX: 00007f0bb800a910 RCX: 00007f0bd5b26855
[ 16.499221] RDX: 0000000000008000 RSI: 00007f0bb800a910 RDI: 000000000000002c
[ 16.500102] RBP: 00007f0bb800a910 R08: 00007f0bd69d4e10 R09: 0000000000008030
[ 16.500942] R10: 0000000000000076 R11: 0000000000000246 R12: ffffffffffffff70
[ 16.501780] R13: 0000000000000002 R14: 00007f0bb80008d0 R15: 000000000129cb44
[ 16.502641]
[ 16.502819] Allocated by task 330:
[ 16.503230] kasan_kmalloc+0xe4/0x170
[ 16.503799] __kmalloc+0xdd/0x1c0
[ 16.504259] p9pdu_readf+0xbb8/0x2940
[ 16.504707] p9dirent_read+0x174/0x510
[ 16.505154] v9fs_dir_readdir_dotl+0x340/0x5b0
[ 16.505694] iterate_dir+0x171/0x5b0
[ 16.506122] SyS_getdents+0x1dc/0x3a0
[ 16.506573] do_syscall_64+0x259/0xa90
[ 16.507031] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 16.507633]
[ 16.507804] Freed by task 322:
[ 16.508178] kasan_slab_free+0xac/0x1a0
[ 16.508741] kfree+0xcd/0x1e0
[ 16.509194] p9stat_free+0x32/0x200
[ 16.509633] v9fs_vfs_get_link+0x173/0x230
[ 16.510118] ovl_get_link+0x52/0x80
[ 16.510538] trailing_symlink+0x42c/0x5f0
[ 16.511034] path_lookupat+0x1b4/0xc30
[ 16.511481] filename_lookup+0x237/0x470
[ 16.511955] vfs_statx+0xb0/0x120
[ 16.512358] SyS_newstat+0x70/0xc0
[ 16.512759] do_syscall_64+0x259/0xa90
[ 16.513205] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 16.513807]
[ 16.514041] The buggy address belongs to the object at ffff88803525f780
[ 16.514041] which belongs to the cache kmalloc-16 of size 16
[ 16.515645] The buggy address is located 8 bytes inside of
[ 16.515645] 16-byte region [ffff88803525f780, ffff88803525f790)
[ 16.516997] The buggy address belongs to the page:
[ 16.517563] page:ffff88803f5407c0 count:1 mapcount:0 mapping: (null) index:0x0
[ 16.518504] flags: 0xc80000000100(slab)
[ 16.519103] raw: 0000c80000000100 0000000000000000 0000000000000000 0000000180800080
[ 16.520066] raw: ffff88803f4fa900 0000000800000008 ffff888035c01b40 0000000000000000
[ 16.520981] page dumped because: kasan: bad access detected
[ 16.521647]
[ 16.521818] Memory state around the buggy address:
[ 16.522413] ffff88803525f680: fb fb fc fc fb fb fc fc fb fb fc fc fb fb fc fc
[ 16.523258] ffff88803525f700: fb fb fc fc fb fb fc fc fb fb fc fc fb fb fc fc
[ 16.524118] >ffff88803525f780: 00 02 fc fc fb fb fc fc fb fb fc fc fb fb fc fc
[ 16.525093] ^
[ 16.525591] ffff88803525f800: fb fb fc fc fb fb fc fc fb fb fc fc fb fb fc fc
[ 16.526471] ffff88803525f880: fb fb fc fc fb fb fc fc fb fb fc fc fb fb fc fc
[ 16.527323] ==================================================================
We are running tests in different KVMs, using 9P for the read-only root
partition and other 9P mounts for a shared file system between the VMs
and the host:
mount -t 9p outshare "${MOUNT_DIR}/overlay/out" -o trans=virtio,version=9p2000.L,access=0,rw
Our out-of-tree kernel modifies neither the FS part nor net/9p.
With Dominique from the v9fs project, we analysed the issue[2]. In the
end, it was confirmed that this KASAN warning is not related to our
MPTCP modifications but is due to a recent change in the v4.14 stable
branch:
84693d060965 (9p: p9dirent_read: check network-provided name length).
In this change backported by Sasha, strcpy() has been replaced by
strscpy(). This is known to cause KASAN false-positives, see:
1a3241ff10d0 (lib/strscpy: Shut up KASAN false-positives in strscpy()).
Note that this commit depends on the two parent ones.
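For context, a rough sketch of the mechanism (simplified from
lib/string.c, as we understand it):
	/* strscpy() copies the source one word at a time: */
	c = *(unsigned long *)(src + res);	/* 8-byte read */
	/*
	 * For a short name at the end of a kmalloc-16 object, this 8-byte
	 * read runs past the object even though only in-bounds bytes are
	 * ever used, so KASAN reports slab-out-of-bounds.  1a3241ff10d0
	 * replaces the read with read_word_at_a_time(), which is annotated
	 * so that KASAN accepts the deliberate over-read.
	 */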
Could it then be possible to also backport these three commits please?
The three commits apply without any issues. I followed the
documentation to propose these three commits to stable, using Option 3.
Just so I know for next time: is it easier for you if I propose the
patches like I did, or if I only mention the SHAs from Linus's git tree?
- 1a3241ff10d0 (lib/strscpy: Shut up KASAN false-positives in strscpy())
- 7f1e541fc8d5 (compiler.h: Add read_word_at_a_time() function.)
- bdb5ac801af3 (compiler.h, kasan: Avoid duplicating __read_once_size_nocheck())
Cheers,
Matt
[1] https://github.com/multipath-tcp/mptcp/
[2] https://sourceforge.net/p/v9fs/mailman/message/36718122/
Andrey Ryabinin (3):
compiler.h, kasan: Avoid duplicating __read_once_size_nocheck()
compiler.h: Add read_word_at_a_time() function.
lib/strscpy: Shut up KASAN false-positives in strscpy()
include/linux/compiler.h | 22 ++++++++++++++--------
lib/string.c | 2 +-
2 files changed, 15 insertions(+), 9 deletions(-)
Cc: Dominique Martinet <asmadeus(a)codewreck.org>
Cc: Sasha Levin <sashal(a)kernel.org>
--
2.20.1
A second regression was found in the immediate data transfer (IDT)
support, which was added in the 5.2 kernel.
IDT is used to transfer small amounts of data (up to 8 bytes) in the
field normally used for data dma address, thus avoiding dma mapping.
If the data was not already dma mapped, then IDT support assumed the
data was in urb->transfer_buffer, and did not take into account that
even small amounts of data (8 bytes) can be in a scatterlist instead.
This caused a NULL pointer dereference when sg_dma_len() was used with
non-dma mapped data.
Solve this by not using IDT if a scatter-gather buffer list is used.
Fixes: 33e39350ebd2 ("usb: xhci: add Immediate Data Transfer support")
Cc: <stable(a)vger.kernel.org> # v5.2
Reported-by: Maik Stohn <maik.stohn(a)seal-one.com>
Tested-by: Maik Stohn <maik.stohn(a)seal-one.com>
CC: Nicolas Saenz Julienne <nsaenzjulienne(a)suse.de>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
---
drivers/usb/host/xhci.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 7a26496..f5c4144 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -2175,7 +2175,8 @@ static inline bool xhci_urb_suitable_for_idt(struct urb *urb)
if (!usb_endpoint_xfer_isoc(&urb->ep->desc) && usb_urb_dir_out(urb) &&
usb_endpoint_maxp(&urb->ep->desc) >= TRB_IDT_MAX_SIZE &&
urb->transfer_buffer_length <= TRB_IDT_MAX_SIZE &&
- !(urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP))
+ !(urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP) &&
+ !urb->num_sgs)
return true;
return false;
--
2.7.4
This is a note to let you know that I've just added the patch titled
usb: wusbcore: fix unbalanced get/put cluster_id
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From f90bf1ece48a736097ea224430578fe586a9544c Mon Sep 17 00:00:00 2001
From: Phong Tran <tranmanphong(a)gmail.com>
Date: Wed, 24 Jul 2019 09:06:01 +0700
Subject: usb: wusbcore: fix unbalanced get/put cluster_id
syzbot reported this issue at:
https://syzkaller.appspot.com/bug?extid=fd2bd7df88c606eea4ef
The cluster_id get/put calls are not balanced. When getting the id succeeds
but the subsequent setup fails, wusbhc->cluster_id has not been updated yet,
so it cannot be used for wusb_cluster_id_put().
Test report:
https://groups.google.com/d/msg/syzkaller-bugs/0znZopp3-9k/oxOrhLkLEgAJ
Reproducing under gdb gives the details:
139 addr = wusb_cluster_id_get();
(gdb) n
140 if (addr == 0)
(gdb) print addr
$1 = 254 '\376'
(gdb) n
142 result = __hwahc_set_cluster_id(hwahc, addr);
(gdb) print result
$2 = -71
(gdb) break wusb_cluster_id_put
Breakpoint 3 at 0xffffffff836e3f20: file drivers/usb/wusbcore/wusbhc.c, line 384.
(gdb) s
Thread 2 hit Breakpoint 3, wusb_cluster_id_put (id=0 '\000') at drivers/usb/wusbcore/wusbhc.c:384
384 id = 0xff - id;
(gdb) n
385 BUG_ON(id >= CLUSTER_IDS);
(gdb) print id
$3 = 255 '\377'
Reported-by: syzbot+fd2bd7df88c606eea4ef(a)syzkaller.appspotmail.com
Signed-off-by: Phong Tran <tranmanphong(a)gmail.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20190724020601.15257-1-tranmanphong@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/host/hwa-hc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/host/hwa-hc.c b/drivers/usb/host/hwa-hc.c
index 09a8ebd95588..6968b9f2b76b 100644
--- a/drivers/usb/host/hwa-hc.c
+++ b/drivers/usb/host/hwa-hc.c
@@ -159,7 +159,7 @@ static int hwahc_op_start(struct usb_hcd *usb_hcd)
return result;
error_set_cluster_id:
- wusb_cluster_id_put(wusbhc->cluster_id);
+ wusb_cluster_id_put(addr);
error_cluster_id_get:
goto out;
--
2.22.0
This is a note to let you know that I've just added the patch titled
usb-storage: Add a limitation for blk_queue_max_hw_sectors()
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From d74ffae8b8dd17eaa8b82fc163e6aa2076dc8fb1 Mon Sep 17 00:00:00 2001
From: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Date: Mon, 22 Jul 2019 19:58:25 +0900
Subject: usb-storage: Add a limitation for blk_queue_max_hw_sectors()
This patch fixes an issue where the following error happens in a swiotlb
environment:
xhci-hcd ee000000.usb: swiotlb buffer is full (sz: 524288 bytes), total 32768 (slots), used 1338 (slots)
On kernel v5.1, the block settings of a SuperSpeed usb-storage device
were the following, so the block layer allocated buffers of up to 64 KiB
and the issue didn't happen.
max_segment_size = 65536
max_hw_sectors_kb = 1024
After commit 09324d32d2a0 ("block: force an unlimited segment
size on queues with a virt boundary") was applied, the block settings
became the following, so the block layer can allocate buffers of up to
1024 KiB and the issue happens:
max_segment_size = 4294967295
max_hw_sectors_kb = 1024
To fix the issue, make the usb-storage driver check the maximum mapping
size for the device and adjust max_hw_sectors_kb if required. After this
patch is applied, the block settings will be the following, and the issue
no longer happens.
max_segment_size = 4294967295
max_hw_sectors_kb = 256
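As a rough cross-check of the value above (assuming the default swiotlb
slot layout of 128 slots of 2 KiB each): dma_max_mapping_size() then
returns 128 * 2048 = 262144 bytes, and 262144 >> SECTOR_SHIFT (9) gives
512 sectors, i.e. the max_hw_sectors_kb = 256 shown above.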
Fixes: 09324d32d2a0 ("block: force an unlimited segment size on queues with a virt boundary")
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Acked-by: Alan Stern <stern(a)rowland.harvard.edu>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Link: https://lore.kernel.org/r/1563793105-20597-1-git-send-email-yoshihiro.shimo…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/storage/scsiglue.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c
index 30790240aec6..05b80211290d 100644
--- a/drivers/usb/storage/scsiglue.c
+++ b/drivers/usb/storage/scsiglue.c
@@ -28,6 +28,8 @@
* status of a command.
*/
+#include <linux/blkdev.h>
+#include <linux/dma-mapping.h>
#include <linux/module.h>
#include <linux/mutex.h>
@@ -99,6 +101,7 @@ static int slave_alloc (struct scsi_device *sdev)
static int slave_configure(struct scsi_device *sdev)
{
struct us_data *us = host_to_us(sdev->host);
+ struct device *dev = us->pusb_dev->bus->sysdev;
/*
* Many devices have trouble transferring more than 32KB at a time,
@@ -128,6 +131,14 @@ static int slave_configure(struct scsi_device *sdev)
blk_queue_max_hw_sectors(sdev->request_queue, 2048);
}
+ /*
+ * The max_hw_sectors should be up to maximum size of a mapping for
+ * the device. Otherwise, a DMA API might fail on swiotlb environment.
+ */
+ blk_queue_max_hw_sectors(sdev->request_queue,
+ min_t(size_t, queue_max_hw_sectors(sdev->request_queue),
+ dma_max_mapping_size(dev) >> SECTOR_SHIFT));
+
/*
* Some USB host controllers can't do DMA; they have to use PIO.
* They indicate this by setting their dma_mask to NULL. For
--
2.22.0
This is a note to let you know that I've just added the patch titled
usb: pci-quirks: Correct AMD PLL quirk detection
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From f3dccdaade4118070a3a47bef6b18321431f9ac6 Mon Sep 17 00:00:00 2001
From: Ryan Kennedy <ryan5544(a)gmail.com>
Date: Thu, 4 Jul 2019 11:35:28 -0400
Subject: usb: pci-quirks: Correct AMD PLL quirk detection
The AMD PLL USB quirk is incorrectly enabled on newer Ryzen
chipsets. The logic in usb_amd_find_chipset_info currently checks
for unaffected chipsets rather than affected ones. This broke
once a new chipset was added in e788787ef. It makes more sense
to reverse the logic so it won't need to be updated as new
chipsets are added. Note that the core of the workaround in
usb_amd_quirk_pll does correctly check the chipset.
Signed-off-by: Ryan Kennedy <ryan5544(a)gmail.com>
Fixes: e788787ef4f9 ("usb:xhci:Add quirk for Certain failing HP keyboard on reset after resume")
Cc: stable <stable(a)vger.kernel.org>
Acked-by: Alan Stern <stern(a)rowland.harvard.edu>
Link: https://lore.kernel.org/r/20190704153529.9429-2-ryan5544@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/host/pci-quirks.c | 31 +++++++++++++++++++------------
1 file changed, 19 insertions(+), 12 deletions(-)
diff --git a/drivers/usb/host/pci-quirks.c b/drivers/usb/host/pci-quirks.c
index 3ce71cbfbb58..ad05c27b3a7b 100644
--- a/drivers/usb/host/pci-quirks.c
+++ b/drivers/usb/host/pci-quirks.c
@@ -205,7 +205,7 @@ int usb_amd_find_chipset_info(void)
{
unsigned long flags;
struct amd_chipset_info info;
- int ret;
+ int need_pll_quirk = 0;
spin_lock_irqsave(&amd_lock, flags);
@@ -219,21 +219,28 @@ int usb_amd_find_chipset_info(void)
spin_unlock_irqrestore(&amd_lock, flags);
if (!amd_chipset_sb_type_init(&info)) {
- ret = 0;
goto commit;
}
- /* Below chipset generations needn't enable AMD PLL quirk */
- if (info.sb_type.gen == AMD_CHIPSET_UNKNOWN ||
- info.sb_type.gen == AMD_CHIPSET_SB600 ||
- info.sb_type.gen == AMD_CHIPSET_YANGTZE ||
- (info.sb_type.gen == AMD_CHIPSET_SB700 &&
- info.sb_type.rev > 0x3b)) {
+ switch (info.sb_type.gen) {
+ case AMD_CHIPSET_SB700:
+ need_pll_quirk = info.sb_type.rev <= 0x3B;
+ break;
+ case AMD_CHIPSET_SB800:
+ case AMD_CHIPSET_HUDSON2:
+ case AMD_CHIPSET_BOLTON:
+ need_pll_quirk = 1;
+ break;
+ default:
+ need_pll_quirk = 0;
+ break;
+ }
+
+ if (!need_pll_quirk) {
if (info.smbus_dev) {
pci_dev_put(info.smbus_dev);
info.smbus_dev = NULL;
}
- ret = 0;
goto commit;
}
@@ -252,7 +259,7 @@ int usb_amd_find_chipset_info(void)
}
}
- ret = info.probe_result = 1;
+ need_pll_quirk = info.probe_result = 1;
printk(KERN_DEBUG "QUIRK: Enable AMD PLL fix\n");
commit:
@@ -263,7 +270,7 @@ int usb_amd_find_chipset_info(void)
/* Mark that we where here */
amd_chipset.probe_count++;
- ret = amd_chipset.probe_result;
+ need_pll_quirk = amd_chipset.probe_result;
spin_unlock_irqrestore(&amd_lock, flags);
@@ -277,7 +284,7 @@ int usb_amd_find_chipset_info(void)
spin_unlock_irqrestore(&amd_lock, flags);
}
- return ret;
+ return need_pll_quirk;
}
EXPORT_SYMBOL_GPL(usb_amd_find_chipset_info);
--
2.22.0
Add an extra condition so that the video output control class is also
added when the device has HDMI outputs defined. This is required to
always be able to add the display present control, which is enabled when
there are HDMI outputs.
This fixes the corner case where no_error_inj is enabled and the device
has no frame buffer but does have HDMI outputs; in all other
configurations the video output control class is added anyway. Without
this fix, the sanity checks in v4l2_ctrl_new() fail because name is NULL.
Fixes: c533435ffb91 ("media: vivid: add display present control")
Cc: stable(a)vger.kernel.org
Signed-off-by: Guillaume Tucker <guillaume.tucker(a)collabora.com>
---
drivers/media/platform/vivid/vivid-ctrls.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/media/platform/vivid/vivid-ctrls.c b/drivers/media/platform/vivid/vivid-ctrls.c
index 3e916c8befb7..7a52f585cab7 100644
--- a/drivers/media/platform/vivid/vivid-ctrls.c
+++ b/drivers/media/platform/vivid/vivid-ctrls.c
@@ -1473,7 +1473,7 @@ int vivid_create_controls(struct vivid_dev *dev, bool show_ccs_cap,
v4l2_ctrl_handler_init(hdl_vid_cap, 55);
v4l2_ctrl_new_custom(hdl_vid_cap, &vivid_ctrl_class, NULL);
v4l2_ctrl_handler_init(hdl_vid_out, 26);
- if (!no_error_inj || dev->has_fb)
+ if (!no_error_inj || dev->has_fb || dev->num_hdmi_outputs)
v4l2_ctrl_new_custom(hdl_vid_out, &vivid_ctrl_class, NULL);
v4l2_ctrl_handler_init(hdl_vbi_cap, 21);
v4l2_ctrl_new_custom(hdl_vbi_cap, &vivid_ctrl_class, NULL);
--
2.20.1
This is a note to let you know that I've just added the patch titled
staging: wilc1000: flush the workqueue before deinit the host
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From fb2b055b7e6e44efda737c7c92f46c0868bb04e5 Mon Sep 17 00:00:00 2001
From: Adham Abozaeid <adham.abozaeid(a)microchip.com>
Date: Mon, 22 Jul 2019 21:38:44 +0000
Subject: staging: wilc1000: flush the workqueue before deinit the host
Before deinitializing the host interface, the workqueue should be flushed
to handle any pending deferred work.
Signed-off-by: Adham Abozaeid <adham.abozaeid(a)microchip.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20190722213837.21952-1-adham.abozaeid@microchip.c…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/wilc1000/wilc_wfi_cfgoperations.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/staging/wilc1000/wilc_wfi_cfgoperations.c b/drivers/staging/wilc1000/wilc_wfi_cfgoperations.c
index d72fdd333050..736eedef23b6 100644
--- a/drivers/staging/wilc1000/wilc_wfi_cfgoperations.c
+++ b/drivers/staging/wilc1000/wilc_wfi_cfgoperations.c
@@ -1969,6 +1969,7 @@ void wilc_deinit_host_int(struct net_device *net)
priv->p2p_listen_state = false;
+ flush_workqueue(vif->wilc->hif_workqueue);
mutex_destroy(&priv->scan_req_lock);
ret = wilc_deinit(vif);
--
2.22.0
This is a note to let you know that I've just added the patch titled
staging: gasket: apex: fix copy-paste typo
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 66665bb9979246729562a09fcdbb101c83127989 Mon Sep 17 00:00:00 2001
From: Ivan Bornyakov <brnkv.i1(a)gmail.com>
Date: Wed, 10 Jul 2019 23:45:18 +0300
Subject: staging: gasket: apex: fix copy-paste typo
In sysfs_show(), the case branches ATTR_KERNEL_HIB_PAGE_TABLE_SIZE and
ATTR_KERNEL_HIB_SIMPLE_PAGE_TABLE_SIZE do the same thing. It looks like a
copy-paste mistake.
Signed-off-by: Ivan Bornyakov <brnkv.i1(a)gmail.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20190710204518.16814-1-brnkv.i1@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/gasket/apex_driver.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/staging/gasket/apex_driver.c b/drivers/staging/gasket/apex_driver.c
index 2be45ee9d061..464648ee2036 100644
--- a/drivers/staging/gasket/apex_driver.c
+++ b/drivers/staging/gasket/apex_driver.c
@@ -532,7 +532,7 @@ static ssize_t sysfs_show(struct device *device, struct device_attribute *attr,
break;
case ATTR_KERNEL_HIB_SIMPLE_PAGE_TABLE_SIZE:
ret = scnprintf(buf, PAGE_SIZE, "%u\n",
- gasket_page_table_num_entries(
+ gasket_page_table_num_simple_entries(
gasket_dev->page_table[0]));
break;
case ATTR_KERNEL_HIB_NUM_ACTIVE_PAGES:
--
2.22.0
This is a note to let you know that I've just added the patch titled
Staging: fbtft: Fix probing of gpio descriptor
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From dbc4f989c878fe101fb7920e9609e8ec44e097cd Mon Sep 17 00:00:00 2001
From: Phil Reid <preid(a)electromag.com.au>
Date: Tue, 16 Jul 2019 08:24:36 +0800
Subject: Staging: fbtft: Fix probing of gpio descriptor
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Conversion to gpio descriptors broke all GPIO lookups, as
devm_gpiod_get_index() was converted to use dev->driver->name for the
GPIO name lookup. Fix this by using the name parameter. In addition,
gpiod_get() appends "-gpios" to the name, so that suffix shouldn't be
included in the call. However, this then breaks the of_find_property()
call used to check whether the GPIO entry exists, as fbtft treats all
GPIOs as optional. So use devm_gpiod_get_index_optional() instead, which
achieves the same thing and is simpler.
Nishad confirmed the changes were only ever compile-tested.
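As an illustration of the lookup convention relied on above (a sketch,
the property name is just an example), the gpiod core appends the
"-gpios" suffix itself:

	struct gpio_desc *desc;

	/* matches a DT property written as reset-gpios = <...>; */
	desc = devm_gpiod_get_index_optional(dev, "reset", 0, GPIOD_OUT_HIGH);
	if (IS_ERR(desc))
		return PTR_ERR(desc);
	/* NULL (not an error) means the optional property is simply absent,
	 * which is why the of_find_property() pre-check can be dropped. */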
Fixes: c440eee1a7a1 ("Staging: fbtft: Switch to the gpio descriptor interface")
Reviewed-by: Nicolas Saenz Julienne <nsaenzjulienne(a)suse.de>
Tested-by: Nicolas Saenz Julienne <nsaenzjulienne(a)suse.de>
Tested-by: Jan Sebastian Götte <linux(a)jaseg.net>
Signed-off-by: Phil Reid <preid(a)electromag.com.au>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/1563236677-5045-2-git-send-email-preid@electromag…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/fbtft/fbtft-core.c | 39 ++++++++++++++----------------
1 file changed, 18 insertions(+), 21 deletions(-)
diff --git a/drivers/staging/fbtft/fbtft-core.c b/drivers/staging/fbtft/fbtft-core.c
index 7cbc1bdd2d8a..b963cccdc3f6 100644
--- a/drivers/staging/fbtft/fbtft-core.c
+++ b/drivers/staging/fbtft/fbtft-core.c
@@ -76,21 +76,18 @@ static int fbtft_request_one_gpio(struct fbtft_par *par,
struct gpio_desc **gpiop)
{
struct device *dev = par->info->device;
- struct device_node *node = dev->of_node;
int ret = 0;
- if (of_find_property(node, name, NULL)) {
- *gpiop = devm_gpiod_get_index(dev, dev->driver->name, index,
- GPIOD_OUT_HIGH);
- if (IS_ERR(*gpiop)) {
- ret = PTR_ERR(*gpiop);
- dev_err(dev,
- "Failed to request %s GPIO:%d\n", name, ret);
- return ret;
- }
- fbtft_par_dbg(DEBUG_REQUEST_GPIOS, par, "%s: '%s' GPIO\n",
- __func__, name);
+ *gpiop = devm_gpiod_get_index_optional(dev, name, index,
+ GPIOD_OUT_HIGH);
+ if (IS_ERR(*gpiop)) {
+ ret = PTR_ERR(*gpiop);
+ dev_err(dev,
+ "Failed to request %s GPIO: %d\n", name, ret);
+ return ret;
}
+ fbtft_par_dbg(DEBUG_REQUEST_GPIOS, par, "%s: '%s' GPIO\n",
+ __func__, name);
return ret;
}
@@ -103,34 +100,34 @@ static int fbtft_request_gpios_dt(struct fbtft_par *par)
if (!par->info->device->of_node)
return -EINVAL;
- ret = fbtft_request_one_gpio(par, "reset-gpios", 0, &par->gpio.reset);
+ ret = fbtft_request_one_gpio(par, "reset", 0, &par->gpio.reset);
if (ret)
return ret;
- ret = fbtft_request_one_gpio(par, "dc-gpios", 0, &par->gpio.dc);
+ ret = fbtft_request_one_gpio(par, "dc", 0, &par->gpio.dc);
if (ret)
return ret;
- ret = fbtft_request_one_gpio(par, "rd-gpios", 0, &par->gpio.rd);
+ ret = fbtft_request_one_gpio(par, "rd", 0, &par->gpio.rd);
if (ret)
return ret;
- ret = fbtft_request_one_gpio(par, "wr-gpios", 0, &par->gpio.wr);
+ ret = fbtft_request_one_gpio(par, "wr", 0, &par->gpio.wr);
if (ret)
return ret;
- ret = fbtft_request_one_gpio(par, "cs-gpios", 0, &par->gpio.cs);
+ ret = fbtft_request_one_gpio(par, "cs", 0, &par->gpio.cs);
if (ret)
return ret;
- ret = fbtft_request_one_gpio(par, "latch-gpios", 0, &par->gpio.latch);
+ ret = fbtft_request_one_gpio(par, "latch", 0, &par->gpio.latch);
if (ret)
return ret;
for (i = 0; i < 16; i++) {
- ret = fbtft_request_one_gpio(par, "db-gpios", i,
+ ret = fbtft_request_one_gpio(par, "db", i,
&par->gpio.db[i]);
if (ret)
return ret;
- ret = fbtft_request_one_gpio(par, "led-gpios", i,
+ ret = fbtft_request_one_gpio(par, "led", i,
&par->gpio.led[i]);
if (ret)
return ret;
- ret = fbtft_request_one_gpio(par, "aux-gpios", i,
+ ret = fbtft_request_one_gpio(par, "aux", i,
&par->gpio.aux[i]);
if (ret)
return ret;
--
2.22.0
This is a note to let you know that I've just added the patch titled
Staging: fbtft: Fix reset assertion when using gpio descriptor
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From b918d1c2706619cb0712a61cc8c05148b68b24b2 Mon Sep 17 00:00:00 2001
From: Phil Reid <preid(a)electromag.com.au>
Date: Tue, 16 Jul 2019 08:24:37 +0800
Subject: Staging: fbtft: Fix reset assertion when using gpio descriptor
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Typically, gpiod_set_value() calls should assert the reset line and then
release it using the semantics of:
gpiod_set_value(par->gpio.reset, 1);
... delay
gpiod_set_value(par->gpio.reset, 0);
with the GPIO binding specifying the polarity.
Prior to the conversion to gpiod calls, the polarity in the DT was
ignored and assumed to be active low. Fix it so that the DT polarity is
respected.
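For illustration (the binding fragment is hypothetical, not from the
patch), the wire level now comes from the device tree flag while the
driver only writes logical values:

	/* hypothetical DT fragment: reset-gpios = <&gpio 24 GPIO_ACTIVE_LOW>; */
	gpiod_set_value_cansleep(par->gpio.reset, 1);	/* assert: pin driven low here */
	usleep_range(20, 40);
	gpiod_set_value_cansleep(par->gpio.reset, 0);	/* release reset */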
Fixes: c440eee1a7a1 ("Staging: fbtft: Switch to the gpio descriptor interface")
Reviewed-by: Nicolas Saenz Julienne <nsaenzjulienne(a)suse.de>
Tested-by: Nicolas Saenz Julienne <nsaenzjulienne(a)suse.de>
Tested-by: Jan Sebastian Götte <linux(a)jaseg.net>
Signed-off-by: Phil Reid <preid(a)electromag.com.au>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/1563236677-5045-3-git-send-email-preid@electromag…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/fbtft/fbtft-core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/staging/fbtft/fbtft-core.c b/drivers/staging/fbtft/fbtft-core.c
index b963cccdc3f6..c3179cc847f8 100644
--- a/drivers/staging/fbtft/fbtft-core.c
+++ b/drivers/staging/fbtft/fbtft-core.c
@@ -231,9 +231,9 @@ static void fbtft_reset(struct fbtft_par *par)
if (!par->gpio.reset)
return;
fbtft_par_dbg(DEBUG_RESET, par, "%s()\n", __func__);
- gpiod_set_value_cansleep(par->gpio.reset, 0);
- usleep_range(20, 40);
gpiod_set_value_cansleep(par->gpio.reset, 1);
+ usleep_range(20, 40);
+ gpiod_set_value_cansleep(par->gpio.reset, 0);
msleep(120);
}
--
2.22.0
This Conexant codec isn't in the supported codec list yet. The HDA
generic driver can drive it well, but on a Lenovo machine with
mute/mic-mute LEDs we need to apply CXT_FIXUP_THINKPAD_ACPI to make the
LEDs work. After adding this codec to the list, patch_conexant.c will
apply the THINKPAD_ACPI fixup to this machine.
Cc: stable(a)vger.kernel.org
Signed-off-by: Hui Wang <hui.wang(a)canonical.com>
---
sound/pci/hda/patch_conexant.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/sound/pci/hda/patch_conexant.c b/sound/pci/hda/patch_conexant.c
index 4f8d0845ee1e..f299f137eaea 100644
--- a/sound/pci/hda/patch_conexant.c
+++ b/sound/pci/hda/patch_conexant.c
@@ -1083,6 +1083,7 @@ static int patch_conexant_auto(struct hda_codec *codec)
*/
static const struct hda_device_id snd_hda_id_conexant[] = {
+ HDA_CODEC_ENTRY(0x14f11f86, "CX8070", patch_conexant_auto),
HDA_CODEC_ENTRY(0x14f12008, "CX8200", patch_conexant_auto),
HDA_CODEC_ENTRY(0x14f15045, "CX20549 (Venice)", patch_conexant_auto),
HDA_CODEC_ENTRY(0x14f15047, "CX20551 (Waikiki)", patch_conexant_auto),
--
2.17.1
commit b091ac616846a1da75b1f2566b41255ce7f0e0a6 upstream.
During disk scan and revalidation done with sd_revalidate(), the zones
of a zoned disk are checked using the helper function
blk_revalidate_disk_zones() if a configuration change is detected
(change in the number of zones or zone size). The function
blk_revalidate_disk_zones() issues report_zones calls that are very
large, that is, to obtain zone information for all zones of the disk
with a single command. The report zones command buffer needed for such a
large request is generally smaller than the disk max_hw_sectors and
KMALLOC_MAX_SIZE (4 MB), so the allocation succeeds on boot (no memory
fragmentation) but often fails at run time (e.g. on a hot-plug event).
This causes the disk revalidation to fail and the disk capacity to be
changed to 0.
This problem can be avoided by using vmalloc() instead of kmalloc() for
the buffer allocation. To limit the amount of memory to be allocated,
this patch also introduces the arbitrary SD_ZBC_REPORT_MAX_ZONES
maximum number of zones to report with a single report zones command.
This limit may be lowered further to satisfy the disk max_hw_sectors
limit. Finally, to ensure that the vmalloc-ed buffer can always be
mapped in a request, the buffer size is further limited to at most
queue_max_segments() pages, allowing successful mapping of the buffer
even in the worst case scenario where none of the buffer pages are
contiguous.
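As a rough worked example of the new upper bound (before the per-device
clamping described above): with SD_ZBC_REPORT_MAX_ZONES = 8192, the
buffer size is roundup((8192 + 1) * 64, 512) = 524800 bytes, i.e. about
512 KiB of vmalloc-ed memory per report, further limited by
queue_max_hw_sectors() and queue_max_segments() * PAGE_SIZE.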
Fixes: 515ce6061312 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation")
Fixes: e76239a3748c ("block: add a report_zones method")
Cc: stable(a)vger.kernel.org # 5.1.x
Cc: stable(a)vger.kernel.org # 5.2.x
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal(a)wdc.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
---
drivers/scsi/sd_zbc.c | 104 ++++++++++++++++++++++++++++++------------
1 file changed, 75 insertions(+), 29 deletions(-)
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index a340af797a85..a9f3a8d77ee7 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -23,6 +23,8 @@
*/
#include <linux/blkdev.h>
+#include <linux/vmalloc.h>
+#include <linux/sched/mm.h>
#include <asm/unaligned.h>
@@ -64,7 +66,7 @@ static void sd_zbc_parse_report(struct scsi_disk *sdkp, u8 *buf,
/**
* sd_zbc_do_report_zones - Issue a REPORT ZONES scsi command.
* @sdkp: The target disk
- * @buf: Buffer to use for the reply
+ * @buf: vmalloc-ed buffer to use for the reply
* @buflen: the buffer size
* @lba: Start LBA of the report
* @partial: Do partial report
@@ -93,7 +95,6 @@ static int sd_zbc_do_report_zones(struct scsi_disk *sdkp, unsigned char *buf,
put_unaligned_be32(buflen, &cmd[10]);
if (partial)
cmd[14] = ZBC_REPORT_ZONE_PARTIAL;
- memset(buf, 0, buflen);
result = scsi_execute_req(sdp, cmd, DMA_FROM_DEVICE,
buf, buflen, &sshdr,
@@ -117,6 +118,53 @@ static int sd_zbc_do_report_zones(struct scsi_disk *sdkp, unsigned char *buf,
return 0;
}
+/*
+ * Maximum number of zones to get with one report zones command.
+ */
+#define SD_ZBC_REPORT_MAX_ZONES 8192U
+
+/**
+ * Allocate a buffer for report zones reply.
+ * @sdkp: The target disk
+ * @nr_zones: Maximum number of zones to report
+ * @buflen: Size of the buffer allocated
+ *
+ * Try to allocate a reply buffer for the number of requested zones.
+ * The size of the buffer allocated may be smaller than requested to
+ * satify the device constraint (max_hw_sectors, max_segments, etc).
+ *
+ * Return the address of the allocated buffer and update @buflen with
+ * the size of the allocated buffer.
+ */
+static void *sd_zbc_alloc_report_buffer(struct scsi_disk *sdkp,
+ unsigned int nr_zones, size_t *buflen)
+{
+ struct request_queue *q = sdkp->disk->queue;
+ size_t bufsize;
+ void *buf;
+
+ /*
+ * Report zone buffer size should be at most 64B times the number of
+ * zones requested plus the 64B reply header, but should be at least
+ * SECTOR_SIZE for ATA devices.
+ * Make sure that this size does not exceed the hardware capabilities.
+ * Furthermore, since the report zone command cannot be split, make
+ * sure that the allocated buffer can always be mapped by limiting the
+ * number of pages allocated to the HBA max segments limit.
+ */
+ nr_zones = min(nr_zones, SD_ZBC_REPORT_MAX_ZONES);
+ bufsize = roundup((nr_zones + 1) * 64, 512);
+ bufsize = min_t(size_t, bufsize,
+ queue_max_hw_sectors(q) << SECTOR_SHIFT);
+ bufsize = min_t(size_t, bufsize, queue_max_segments(q) << PAGE_SHIFT);
+
+ buf = vzalloc(bufsize);
+ if (buf)
+ *buflen = bufsize;
+
+ return buf;
+}
+
/**
* sd_zbc_report_zones - Disk report zones operation.
* @disk: The target disk
@@ -132,30 +180,23 @@ int sd_zbc_report_zones(struct gendisk *disk, sector_t sector,
gfp_t gfp_mask)
{
struct scsi_disk *sdkp = scsi_disk(disk);
- unsigned int i, buflen, nrz = *nr_zones;
+ unsigned int i, nrz = *nr_zones;
unsigned char *buf;
- size_t offset = 0;
+ size_t buflen = 0, offset = 0;
int ret = 0;
if (!sd_is_zoned(sdkp))
/* Not a zoned device */
return -EOPNOTSUPP;
- /*
- * Get a reply buffer for the number of requested zones plus a header,
- * without exceeding the device maximum command size. For ATA disks,
- * buffers must be aligned to 512B.
- */
- buflen = min(queue_max_hw_sectors(disk->queue) << 9,
- roundup((nrz + 1) * 64, 512));
- buf = kmalloc(buflen, gfp_mask);
+ buf = sd_zbc_alloc_report_buffer(sdkp, nrz, &buflen);
if (!buf)
return -ENOMEM;
ret = sd_zbc_do_report_zones(sdkp, buf, buflen,
sectors_to_logical(sdkp->device, sector), true);
if (ret)
- goto out_free_buf;
+ goto out;
nrz = min(nrz, get_unaligned_be32(&buf[0]) / 64);
for (i = 0; i < nrz; i++) {
@@ -166,8 +207,8 @@ int sd_zbc_report_zones(struct gendisk *disk, sector_t sector,
*nr_zones = nrz;
-out_free_buf:
- kfree(buf);
+out:
+ kvfree(buf);
return ret;
}
@@ -301,8 +342,6 @@ static int sd_zbc_check_zoned_characteristics(struct scsi_disk *sdkp,
return 0;
}
-#define SD_ZBC_BUF_SIZE 131072U
-
/**
* sd_zbc_check_zones - Check the device capacity and zone sizes
* @sdkp: Target disk
@@ -318,22 +357,28 @@ static int sd_zbc_check_zoned_characteristics(struct scsi_disk *sdkp,
*/
static int sd_zbc_check_zones(struct scsi_disk *sdkp, u32 *zblocks)
{
+ size_t bufsize, buflen;
+ unsigned int noio_flag;
u64 zone_blocks = 0;
sector_t max_lba, block = 0;
unsigned char *buf;
unsigned char *rec;
- unsigned int buf_len;
- unsigned int list_length;
int ret;
u8 same;
+ /* Do all memory allocations as if GFP_NOIO was specified */
+ noio_flag = memalloc_noio_save();
+
/* Get a buffer */
- buf = kmalloc(SD_ZBC_BUF_SIZE, GFP_KERNEL);
- if (!buf)
- return -ENOMEM;
+ buf = sd_zbc_alloc_report_buffer(sdkp, SD_ZBC_REPORT_MAX_ZONES,
+ &bufsize);
+ if (!buf) {
+ ret = -ENOMEM;
+ goto out;
+ }
/* Do a report zone to get max_lba and the same field */
- ret = sd_zbc_do_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE, 0, false);
+ ret = sd_zbc_do_report_zones(sdkp, buf, bufsize, 0, false);
if (ret)
goto out_free;
@@ -369,12 +414,12 @@ static int sd_zbc_check_zones(struct scsi_disk *sdkp, u32 *zblocks)
do {
/* Parse REPORT ZONES header */
- list_length = get_unaligned_be32(&buf[0]) + 64;
+ buflen = min_t(size_t, get_unaligned_be32(&buf[0]) + 64,
+ bufsize);
rec = buf + 64;
- buf_len = min(list_length, SD_ZBC_BUF_SIZE);
/* Parse zone descriptors */
- while (rec < buf + buf_len) {
+ while (rec < buf + buflen) {
u64 this_zone_blocks = get_unaligned_be64(&rec[8]);
if (zone_blocks == 0) {
@@ -390,8 +435,8 @@ static int sd_zbc_check_zones(struct scsi_disk *sdkp, u32 *zblocks)
}
if (block < sdkp->capacity) {
- ret = sd_zbc_do_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE,
- block, true);
+ ret = sd_zbc_do_report_zones(sdkp, buf, bufsize, block,
+ true);
if (ret)
goto out_free;
}
@@ -422,7 +467,8 @@ static int sd_zbc_check_zones(struct scsi_disk *sdkp, u32 *zblocks)
}
out_free:
- kfree(buf);
+ memalloc_noio_restore(noio_flag);
+ kvfree(buf);
return ret;
}
--
2.21.0
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 26202928fafad8bda8b478edb7e62c885be623d7 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <damien.lemoal(a)wdc.com>
Date: Mon, 1 Jul 2019 14:09:18 +0900
Subject: [PATCH] block: Limit zone array allocation size
Limit the size of the struct blk_zone array used in
blk_revalidate_disk_zones() to avoid memory allocation failures leading
to disk revalidation failure. Also further reduce the likelihood of
such failures by using kvcalloc() (that is, vmalloc()) instead of
allocating contiguous pages with alloc_pages().
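For a rough sense of scale (assuming struct blk_zone is padded to 64
bytes, as in current kernels): with the BLK_ZONED_REPORT_MAX_ZONES = 8192
cap, each kvcalloc() request stays at 8192 * 64 = 512 KiB, whereas an
unbounded nr_zones on a large SMR disk could previously ask alloc_pages()
for several megabytes of physically contiguous memory.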
Fixes: 515ce6061312 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation")
Fixes: e76239a3748c ("block: add a report_zones method")
Cc: stable(a)vger.kernel.org
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal(a)wdc.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 58ced170b424..6c503824ba3f 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -14,6 +14,8 @@
#include <linux/rbtree.h>
#include <linux/blkdev.h>
#include <linux/blk-mq.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
#include <linux/sched/mm.h>
#include "blk.h"
@@ -371,22 +373,25 @@ static inline unsigned long *blk_alloc_zone_bitmap(int node,
* Allocate an array of struct blk_zone to get nr_zones zone information.
* The allocated array may be smaller than nr_zones.
*/
-static struct blk_zone *blk_alloc_zones(int node, unsigned int *nr_zones)
+static struct blk_zone *blk_alloc_zones(unsigned int *nr_zones)
{
- size_t size = *nr_zones * sizeof(struct blk_zone);
- struct page *page;
- int order;
-
- for (order = get_order(size); order >= 0; order--) {
- page = alloc_pages_node(node, GFP_NOIO | __GFP_ZERO, order);
- if (page) {
- *nr_zones = min_t(unsigned int, *nr_zones,
- (PAGE_SIZE << order) / sizeof(struct blk_zone));
- return page_address(page);
- }
+ struct blk_zone *zones;
+ size_t nrz = min(*nr_zones, BLK_ZONED_REPORT_MAX_ZONES);
+
+ /*
+ * GFP_KERNEL here is meaningless as the caller task context has
+ * the PF_MEMALLOC_NOIO flag set in blk_revalidate_disk_zones()
+ * with memalloc_noio_save().
+ */
+ zones = kvcalloc(nrz, sizeof(struct blk_zone), GFP_KERNEL);
+ if (!zones) {
+ *nr_zones = 0;
+ return NULL;
}
- return NULL;
+ *nr_zones = nrz;
+
+ return zones;
}
void blk_queue_free_zone_bitmaps(struct request_queue *q)
@@ -448,7 +453,7 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
/* Get zone information and initialize seq_zones_bitmap */
rep_nr_zones = nr_zones;
- zones = blk_alloc_zones(q->node, &rep_nr_zones);
+ zones = blk_alloc_zones(&rep_nr_zones);
if (!zones)
goto out;
@@ -487,8 +492,7 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
out:
memalloc_noio_restore(noio_flag);
- free_pages((unsigned long)zones,
- get_order(rep_nr_zones * sizeof(struct blk_zone)));
+ kvfree(zones);
kfree(seq_zones_wlock);
kfree(seq_zones_bitmap);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 05036e3e3458..1ef375dafb1c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -344,6 +344,11 @@ struct queue_limits {
#ifdef CONFIG_BLK_DEV_ZONED
+/*
+ * Maximum number of zones to report with a single report zones command.
+ */
+#define BLK_ZONED_REPORT_MAX_ZONES 8192U
+
extern unsigned int blkdev_nr_zones(struct block_device *bdev);
extern int blkdev_report_zones(struct block_device *bdev,
sector_t sector, struct blk_zone *zones,
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b091ac616846a1da75b1f2566b41255ce7f0e0a6 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <damien.lemoal(a)wdc.com>
Date: Mon, 1 Jul 2019 14:09:17 +0900
Subject: [PATCH] sd_zbc: Fix report zones buffer allocation
During disk scan and revalidation done with sd_revalidate(), the zones
of a zoned disk are checked using the helper function
blk_revalidate_disk_zones() if a configuration change is detected
(change in the number of zones or zone size). The function
blk_revalidate_disk_zones() issues report_zones calls that are very
large, that is, to obtain zone information for all zones of the disk
with a single command. The report zones command buffer needed for such a
large request is generally smaller than the disk max_hw_sectors and
KMALLOC_MAX_SIZE (4 MB), so the allocation succeeds on boot (no memory
fragmentation) but often fails at run time (e.g. on a hot-plug event).
This causes the disk revalidation to fail and the disk capacity to be
changed to 0.
This problem can be avoided by using vmalloc() instead of kmalloc() for
the buffer allocation. To limit the amount of memory to be allocated,
this patch also introduces the arbitrary SD_ZBC_REPORT_MAX_ZONES
maximum number of zones to report with a single report zones command.
This limit may be lowered further to satisfy the disk max_hw_sectors
limit. Finally, to ensure that the vmalloc-ed buffer can always be
mapped in a request, the buffer size is further limited to at most
queue_max_segments() pages, allowing successful mapping of the buffer
even in the worst case scenario where none of the buffer pages are
contiguous.
Fixes: 515ce6061312 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation")
Fixes: e76239a3748c ("block: add a report_zones method")
Cc: stable(a)vger.kernel.org
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal(a)wdc.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index ec3764c8f3f1..db16c19e05c4 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -9,6 +9,8 @@
*/
#include <linux/blkdev.h>
+#include <linux/vmalloc.h>
+#include <linux/sched/mm.h>
#include <asm/unaligned.h>
@@ -50,7 +52,7 @@ static void sd_zbc_parse_report(struct scsi_disk *sdkp, u8 *buf,
/**
* sd_zbc_do_report_zones - Issue a REPORT ZONES scsi command.
* @sdkp: The target disk
- * @buf: Buffer to use for the reply
+ * @buf: vmalloc-ed buffer to use for the reply
* @buflen: the buffer size
* @lba: Start LBA of the report
* @partial: Do partial report
@@ -79,7 +81,6 @@ static int sd_zbc_do_report_zones(struct scsi_disk *sdkp, unsigned char *buf,
put_unaligned_be32(buflen, &cmd[10]);
if (partial)
cmd[14] = ZBC_REPORT_ZONE_PARTIAL;
- memset(buf, 0, buflen);
result = scsi_execute_req(sdp, cmd, DMA_FROM_DEVICE,
buf, buflen, &sshdr,
@@ -103,6 +104,53 @@ static int sd_zbc_do_report_zones(struct scsi_disk *sdkp, unsigned char *buf,
return 0;
}
+/*
+ * Maximum number of zones to get with one report zones command.
+ */
+#define SD_ZBC_REPORT_MAX_ZONES 8192U
+
+/**
+ * Allocate a buffer for report zones reply.
+ * @sdkp: The target disk
+ * @nr_zones: Maximum number of zones to report
+ * @buflen: Size of the buffer allocated
+ *
+ * Try to allocate a reply buffer for the number of requested zones.
+ * The size of the buffer allocated may be smaller than requested to
+ * satify the device constraint (max_hw_sectors, max_segments, etc).
+ *
+ * Return the address of the allocated buffer and update @buflen with
+ * the size of the allocated buffer.
+ */
+static void *sd_zbc_alloc_report_buffer(struct scsi_disk *sdkp,
+ unsigned int nr_zones, size_t *buflen)
+{
+ struct request_queue *q = sdkp->disk->queue;
+ size_t bufsize;
+ void *buf;
+
+ /*
+ * Report zone buffer size should be at most 64B times the number of
+ * zones requested plus the 64B reply header, but should be at least
+ * SECTOR_SIZE for ATA devices.
+ * Make sure that this size does not exceed the hardware capabilities.
+ * Furthermore, since the report zone command cannot be split, make
+ * sure that the allocated buffer can always be mapped by limiting the
+ * number of pages allocated to the HBA max segments limit.
+ */
+ nr_zones = min(nr_zones, SD_ZBC_REPORT_MAX_ZONES);
+ bufsize = roundup((nr_zones + 1) * 64, 512);
+ bufsize = min_t(size_t, bufsize,
+ queue_max_hw_sectors(q) << SECTOR_SHIFT);
+ bufsize = min_t(size_t, bufsize, queue_max_segments(q) << PAGE_SHIFT);
+
+ buf = vzalloc(bufsize);
+ if (buf)
+ *buflen = bufsize;
+
+ return buf;
+}
+
/**
* sd_zbc_report_zones - Disk report zones operation.
* @disk: The target disk
@@ -116,30 +164,23 @@ int sd_zbc_report_zones(struct gendisk *disk, sector_t sector,
struct blk_zone *zones, unsigned int *nr_zones)
{
struct scsi_disk *sdkp = scsi_disk(disk);
- unsigned int i, buflen, nrz = *nr_zones;
+ unsigned int i, nrz = *nr_zones;
unsigned char *buf;
- size_t offset = 0;
+ size_t buflen = 0, offset = 0;
int ret = 0;
if (!sd_is_zoned(sdkp))
/* Not a zoned device */
return -EOPNOTSUPP;
- /*
- * Get a reply buffer for the number of requested zones plus a header,
- * without exceeding the device maximum command size. For ATA disks,
- * buffers must be aligned to 512B.
- */
- buflen = min(queue_max_hw_sectors(disk->queue) << 9,
- roundup((nrz + 1) * 64, 512));
- buf = kmalloc(buflen, GFP_KERNEL);
+ buf = sd_zbc_alloc_report_buffer(sdkp, nrz, &buflen);
if (!buf)
return -ENOMEM;
ret = sd_zbc_do_report_zones(sdkp, buf, buflen,
sectors_to_logical(sdkp->device, sector), true);
if (ret)
- goto out_free_buf;
+ goto out;
nrz = min(nrz, get_unaligned_be32(&buf[0]) / 64);
for (i = 0; i < nrz; i++) {
@@ -150,8 +191,8 @@ int sd_zbc_report_zones(struct gendisk *disk, sector_t sector,
*nr_zones = nrz;
-out_free_buf:
- kfree(buf);
+out:
+ kvfree(buf);
return ret;
}
@@ -285,8 +326,6 @@ static int sd_zbc_check_zoned_characteristics(struct scsi_disk *sdkp,
return 0;
}
-#define SD_ZBC_BUF_SIZE 131072U
-
/**
* sd_zbc_check_zones - Check the device capacity and zone sizes
* @sdkp: Target disk
@@ -302,22 +341,28 @@ static int sd_zbc_check_zoned_characteristics(struct scsi_disk *sdkp,
*/
static int sd_zbc_check_zones(struct scsi_disk *sdkp, u32 *zblocks)
{
+ size_t bufsize, buflen;
+ unsigned int noio_flag;
u64 zone_blocks = 0;
sector_t max_lba, block = 0;
unsigned char *buf;
unsigned char *rec;
- unsigned int buf_len;
- unsigned int list_length;
int ret;
u8 same;
+ /* Do all memory allocations as if GFP_NOIO was specified */
+ noio_flag = memalloc_noio_save();
+
/* Get a buffer */
- buf = kmalloc(SD_ZBC_BUF_SIZE, GFP_KERNEL);
- if (!buf)
- return -ENOMEM;
+ buf = sd_zbc_alloc_report_buffer(sdkp, SD_ZBC_REPORT_MAX_ZONES,
+ &bufsize);
+ if (!buf) {
+ ret = -ENOMEM;
+ goto out;
+ }
/* Do a report zone to get max_lba and the same field */
- ret = sd_zbc_do_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE, 0, false);
+ ret = sd_zbc_do_report_zones(sdkp, buf, bufsize, 0, false);
if (ret)
goto out_free;
@@ -353,12 +398,12 @@ static int sd_zbc_check_zones(struct scsi_disk *sdkp, u32 *zblocks)
do {
/* Parse REPORT ZONES header */
- list_length = get_unaligned_be32(&buf[0]) + 64;
+ buflen = min_t(size_t, get_unaligned_be32(&buf[0]) + 64,
+ bufsize);
rec = buf + 64;
- buf_len = min(list_length, SD_ZBC_BUF_SIZE);
/* Parse zone descriptors */
- while (rec < buf + buf_len) {
+ while (rec < buf + buflen) {
u64 this_zone_blocks = get_unaligned_be64(&rec[8]);
if (zone_blocks == 0) {
@@ -374,8 +419,8 @@ static int sd_zbc_check_zones(struct scsi_disk *sdkp, u32 *zblocks)
}
if (block < sdkp->capacity) {
- ret = sd_zbc_do_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE,
- block, true);
+ ret = sd_zbc_do_report_zones(sdkp, buf, bufsize, block,
+ true);
if (ret)
goto out_free;
}
@@ -406,7 +451,8 @@ static int sd_zbc_check_zones(struct scsi_disk *sdkp, u32 *zblocks)
}
out_free:
- kfree(buf);
+ memalloc_noio_restore(noio_flag);
+ kvfree(buf);
return ret;
}
The patch titled
Subject: /proc/kpageflags: prevent an integer overflow in stable_page_flags()
has been added to the -mm tree. Its filename is
proc-kpageflags-prevent-an-integer-overflow-in-stable_page_flags.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/proc-kpageflags-prevent-an-integer…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/proc-kpageflags-prevent-an-integer…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Toshiki Fukasawa <t-fukasawa(a)vx.jp.nec.com>
Subject: /proc/kpageflags: prevent an integer overflow in stable_page_flags()
stable_page_flags() returns kpageflags info as a u64, but internally it
uses "1 << KPF_*", which is evaluated as an int. This type mismatch
causes no visible problem now, but it will once bit 32 or higher is set,
as done in a subsequent patch. So use BIT_ULL() in order to avoid future
overflow issues.
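A minimal illustration of the difference (sketch, not taken from the
patch):

	/* with a plain int constant the shift is done in 32 bits, so any
	 * KPF_* value of 32 or more overflows before widening to u64 */
	u64 bad  = 1 << 33;		/* undefined behaviour, bit is lost */
	u64 good = BIT_ULL(33);		/* expands to 1ULL << 33, as intended */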
Link: http://lkml.kernel.org/r/20190725023100.31141-2-t-fukasawa@vx.jp.nec.com
Signed-off-by: Toshiki Fukasawa <t-fukasawa(a)vx.jp.nec.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Alexey Dobriyan <adobriyan(a)gmail.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: Junichi Nomura <j-nomura(a)ce.jp.nec.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/page.c | 37 ++++++++++++++++++-------------------
1 file changed, 18 insertions(+), 19 deletions(-)
--- a/fs/proc/page.c~proc-kpageflags-prevent-an-integer-overflow-in-stable_page_flags
+++ a/fs/proc/page.c
@@ -95,7 +95,7 @@ u64 stable_page_flags(struct page *page)
* it differentiates a memory hole from a page with no flags
*/
if (!page)
- return 1 << KPF_NOPAGE;
+ return BIT_ULL(KPF_NOPAGE);
k = page->flags;
u = 0;
@@ -107,22 +107,22 @@ u64 stable_page_flags(struct page *page)
* simple test in page_mapped() is not enough.
*/
if (!PageSlab(page) && page_mapped(page))
- u |= 1 << KPF_MMAP;
+ u |= BIT_ULL(KPF_MMAP);
if (PageAnon(page))
- u |= 1 << KPF_ANON;
+ u |= BIT_ULL(KPF_ANON);
if (PageKsm(page))
- u |= 1 << KPF_KSM;
+ u |= BIT_ULL(KPF_KSM);
/*
* compound pages: export both head/tail info
* they together define a compound page's start/end pos and order
*/
if (PageHead(page))
- u |= 1 << KPF_COMPOUND_HEAD;
+ u |= BIT_ULL(KPF_COMPOUND_HEAD);
if (PageTail(page))
- u |= 1 << KPF_COMPOUND_TAIL;
+ u |= BIT_ULL(KPF_COMPOUND_TAIL);
if (PageHuge(page))
- u |= 1 << KPF_HUGE;
+ u |= BIT_ULL(KPF_HUGE);
/*
* PageTransCompound can be true for non-huge compound pages (slab
* pages or pages allocated by drivers with __GFP_COMP) because it
@@ -133,14 +133,13 @@ u64 stable_page_flags(struct page *page)
struct page *head = compound_head(page);
if (PageLRU(head) || PageAnon(head))
- u |= 1 << KPF_THP;
+ u |= BIT_ULL(KPF_THP);
else if (is_huge_zero_page(head)) {
- u |= 1 << KPF_ZERO_PAGE;
- u |= 1 << KPF_THP;
+ u |= BIT_ULL(KPF_ZERO_PAGE);
+ u |= BIT_ULL(KPF_THP);
}
} else if (is_zero_pfn(page_to_pfn(page)))
- u |= 1 << KPF_ZERO_PAGE;
-
+ u |= BIT_ULL(KPF_ZERO_PAGE);
/*
* Caveats on high order pages: page->_refcount will only be set
@@ -148,23 +147,23 @@ u64 stable_page_flags(struct page *page)
* SLOB won't set PG_slab at all on compound pages.
*/
if (PageBuddy(page))
- u |= 1 << KPF_BUDDY;
+ u |= BIT_ULL(KPF_BUDDY);
else if (page_count(page) == 0 && is_free_buddy_page(page))
- u |= 1 << KPF_BUDDY;
+ u |= BIT_ULL(KPF_BUDDY);
if (PageOffline(page))
- u |= 1 << KPF_OFFLINE;
+ u |= BIT_ULL(KPF_OFFLINE);
if (PageTable(page))
- u |= 1 << KPF_PGTABLE;
+ u |= BIT_ULL(KPF_PGTABLE);
if (page_is_idle(page))
- u |= 1 << KPF_IDLE;
+ u |= BIT_ULL(KPF_IDLE);
u |= kpf_copy_bit(k, KPF_LOCKED, PG_locked);
u |= kpf_copy_bit(k, KPF_SLAB, PG_slab);
if (PageTail(page) && PageSlab(compound_head(page)))
- u |= 1 << KPF_SLAB;
+ u |= BIT_ULL(KPF_SLAB);
u |= kpf_copy_bit(k, KPF_ERROR, PG_error);
u |= kpf_copy_bit(k, KPF_DIRTY, PG_dirty);
@@ -177,7 +176,7 @@ u64 stable_page_flags(struct page *page)
u |= kpf_copy_bit(k, KPF_RECLAIM, PG_reclaim);
if (PageSwapCache(page))
- u |= 1 << KPF_SWAPCACHE;
+ u |= BIT_ULL(KPF_SWAPCACHE);
u |= kpf_copy_bit(k, KPF_SWAPBACKED, PG_swapbacked);
u |= kpf_copy_bit(k, KPF_UNEVICTABLE, PG_unevictable);
_
Patches currently in -mm which might be from t-fukasawa(a)vx.jp.nec.com are
proc-kpageflags-prevent-an-integer-overflow-in-stable_page_flags.patch
proc-kpageflags-do-not-use-uninitialized-struct-pages.patch
The patch titled
Subject: mm: document zone device struct page field usage
has been removed from the -mm tree. Its filename was
mm-document-zone-device-struct-page-field-usage.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Ralph Campbell <rcampbell(a)nvidia.com>
Subject: mm: document zone device struct page field usage
Patch series "mm/hmm: fixes for device private page migration", v2.
Testing the latest Linux git tree turned up a few bugs with page migration
to and from ZONE_DEVICE private and anonymous pages. Hopefully this series
clarifies how a ZONE_DEVICE private struct page reuses the mapping and
index fields from the source anonymous page mapping.
This patch (of 3):
Struct page for ZONE_DEVICE private pages uses the page->mapping and
page->index fields while the source anonymous pages are migrated to device
private memory. This is so rmap_walk() can find the page when migrating
the ZONE_DEVICE private page back to system memory. ZONE_DEVICE pmem
backed fsdax pages also use the page->mapping and page->index fields when
files are mapped into a process address space.
Restructure struct page and add comments to make this more clear.
Link: http://lkml.kernel.org/r/20190719192955.30462-2-rcampbell@nvidia.com
Signed-off-by: Ralph Campbell <rcampbell(a)nvidia.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Jérôme Glisse <jglisse(a)redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Lai Jiangshan <jiangshanlai(a)gmail.com>
Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm_types.h | 42 +++++++++++++++++++++++++------------
1 file changed, 29 insertions(+), 13 deletions(-)
--- a/include/linux/mm_types.h~mm-document-zone-device-struct-page-field-usage
+++ a/include/linux/mm_types.h
@@ -76,13 +76,35 @@ struct page {
* avoid collision and false-positive PageTail().
*/
union {
- struct { /* Page cache and anonymous pages */
- /**
- * @lru: Pageout list, eg. active_list protected by
- * pgdat->lru_lock. Sometimes used as a generic list
- * by the page owner.
- */
- struct list_head lru;
+ struct { /* Page cache, anonymous, ZONE_DEVICE pages */
+ union {
+ /**
+ * @lru: Pageout list, e.g., active_list
+ * protected by pgdat->lru_lock. Sometimes
+ * used as a generic list by the page owner.
+ */
+ struct list_head lru;
+ /**
+ * ZONE_DEVICE pages are never on the lru
+ * list so they reuse the list space.
+ * ZONE_DEVICE private pages are counted as
+ * being mapped so the @mapping and @index
+ * fields are used while the page is migrated
+ * to device private memory.
+ * ZONE_DEVICE MEMORY_DEVICE_FS_DAX pages also
+ * use the @mapping and @index fields when pmem
+ * backed DAX files are mapped.
+ */
+ struct {
+ /**
+ * @pgmap: Points to the hosting
+ * device page map.
+ */
+ struct dev_pagemap *pgmap;
+ /** @zone_device_data: opaque data. */
+ void *zone_device_data;
+ };
+ };
/* See page-flags.h for PAGE_MAPPING_FLAGS */
struct address_space *mapping;
pgoff_t index; /* Our offset within mapping. */
@@ -155,12 +177,6 @@ struct page {
spinlock_t ptl;
#endif
};
- struct { /* ZONE_DEVICE pages */
- /** @pgmap: Points to the hosting device page map. */
- struct dev_pagemap *pgmap;
- void *zone_device_data;
- unsigned long _zd_pad_1; /* uses mapping */
- };
/** @rcu_head: You can use this to free a page by RCU. */
struct rcu_head rcu_head;
_
Patches currently in -mm which might be from rcampbell(a)nvidia.com are
mm-hmm-fix-zone_device-anon-page-mapping-reuse.patch
mm-hmm-fix-bad-subpage-pointer-in-try_to_unmap_one.patch
mm-migrate-initialize-pud_entry-in-migrate_vma.patch
When migrating an anonymous private page to a ZONE_DEVICE private page,
the source page->mapping and page->index fields are copied to the
destination ZONE_DEVICE struct page and the page_mapcount() is increased.
This is so rmap_walk() can be used to unmap and migrate the page back to
system memory. However, try_to_unmap_one() computes the subpage pointer
from a swap pte, which yields an invalid page pointer, and a kernel panic
results, such as:
BUG: unable to handle page fault for address: ffffea1fffffffc8
Currently, only single pages can be migrated to device private memory so
no subpage computation is needed and it can be set to "page".
Fixes: a5430dda8a3a1c ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
Signed-off-by: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/rmap.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/mm/rmap.c b/mm/rmap.c
index e5dfe2ae6b0d..003377e24232 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1475,7 +1475,15 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
/*
* No need to invalidate here it will synchronize on
* against the special swap migration pte.
+ *
+ * The assignment to subpage above was computed from a
+ * swap PTE which results in an invalid pointer.
+ * Since only PAGE_SIZE pages can currently be
+ * migrated, just set it to page. This will need to be
+ * changed when hugepage migrations to device private
+ * memory are supported.
*/
+ subpage = page;
goto discard;
}
--
2.20.1
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d9771f5ec46c282d518b453c793635dbdc3a2a94 Mon Sep 17 00:00:00 2001
From: Xiao Ni <xni(a)redhat.com>
Date: Fri, 14 Jun 2019 15:41:05 -0700
Subject: [PATCH] raid5-cache: Need to do start() part job after adding journal
device
commit d5d885fd514f ("md: introduce new personality funciton start()")
splits the init job into two parts. The first part, run(), does the jobs
that do not require the md threads. The second part, start(), does the
jobs that require the md threads.
Currently, adding a new journal device only does run(). It needs to do
the second part, start(), too.
Fixes: d5d885fd514f ("md: introduce new personality funciton start()")
Cc: stable(a)vger.kernel.org #v4.9+
Reported-by: Michal Soltys <soltys(a)ziu.info>
Signed-off-by: Xiao Ni <xni(a)redhat.com>
Signed-off-by: Song Liu <songliubraving(a)fb.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index b83bce2beb66..da94cbaa1a9e 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7672,7 +7672,7 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
{
struct r5conf *conf = mddev->private;
- int err = -EEXIST;
+ int ret, err = -EEXIST;
int disk;
struct disk_info *p;
int first = 0;
@@ -7687,7 +7687,14 @@ static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
* The array is in readonly mode if journal is missing, so no
* write requests running. We should be safe
*/
- log_init(conf, rdev, false);
+ ret = log_init(conf, rdev, false);
+ if (ret)
+ return ret;
+
+ ret = r5l_start(conf->log);
+ if (ret)
+ return ret;
+
return 0;
}
if (mddev->recovery_disabled == conf->recovery_disabled)
v4.9.y to v5.1.y:
fs/btrfs/file.c: In function 'btrfs_punch_hole':
fs/btrfs/file.c:2787:27: error: invalid initializer
struct timespec64 now = current_time(inode);
^~~~~~~~~~~~
fs/btrfs/file.c:2790:18: error: incompatible types when assigning to type 'struct timespec' from type 'struct timespec64'
v4.19.y, v5.1.y:
fs/btrfs/props.c: In function 'prop_compression_validate':
fs/btrfs/props.c:369:6: error: implicit declaration of function 'btrfs_compression_is_valid_type'
My apologies for the noise if this has already been reported/fixed.
Guenter
From: Francois Buergisser <fbuergisser(a)chromium.org>
The Hantro codec is typically used in platforms with an IOMMU,
so we need to set a proper DMA segment size. Devices without an
IOMMU will still fall back to the default 64KiB segments.
Cc: stable(a)vger.kernel.org
Fixes: 775fec69008d3 ("media: add Rockchip VPU JPEG encoder driver")
Signed-off-by: Francois Buergisser <fbuergisser(a)chromium.org>
Signed-off-by: Ezequiel Garcia <ezequiel(a)collabora.com>
---
drivers/staging/media/hantro/hantro_drv.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/staging/media/hantro/hantro_drv.c b/drivers/staging/media/hantro/hantro_drv.c
index b71a06e9159e..4eae1dbb1ac8 100644
--- a/drivers/staging/media/hantro/hantro_drv.c
+++ b/drivers/staging/media/hantro/hantro_drv.c
@@ -731,6 +731,7 @@ static int hantro_probe(struct platform_device *pdev)
dev_err(vpu->dev, "Could not set DMA coherent mask.\n");
return ret;
}
+ vb2_dma_contig_set_max_seg_size(&pdev->dev, DMA_BIT_MASK(32));
for (i = 0; i < vpu->variant->num_irqs; i++) {
const char *irq_name = vpu->variant->irqs[i].name;
--
2.22.0
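For context, vb2_dma_contig_set_max_seg_size() is essentially a wrapper around
dma_set_max_seg_size(); a sketch of what it does under the hood (based on the
videobuf2-dma-contig implementation of that era, not a verbatim copy):

	static int set_max_seg_size_sketch(struct device *dev, unsigned int size)
	{
		/* allocate dma_parms once so the DMA API can record the limit */
		if (!dev->dma_parms) {
			dev->dma_parms = kzalloc(sizeof(*dev->dma_parms), GFP_KERNEL);
			if (!dev->dma_parms)
				return -ENOMEM;
		}
		/* lift the default 64KiB segment limit to the requested size */
		return dma_set_max_seg_size(dev, size);
	}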
Hi,
When a request is dispatched to the LLD via dm-rq and the result is
BLK_STS_*RESOURCE, dm-rq will free the request. However, the LLD may have
allocated private data for this request, so this causes a memory leak.
Add a .cleanup_rq() callback and implement it in SCSI to fix the issue,
since SCSI is the only driver which allocates private request data in
the .queue_rq() path.
Another use case of this callback is to free the request and re-submit
bios during cpu hotplug when the hctx is dead, see the following link:
https://lore.kernel.org/linux-block/f122e8f2-5ede-2d83-9ca0-bc713ce66d01@hu…
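In sketch form, the shape of the new hook is: before dm-rq frees a cloned
request whose dispatch already allocated LLD private data, it gives the LLD a
chance to release that data (the names below are illustrative, not
necessarily the posted code):

	static inline void cleanup_cloned_rq(struct request *rq)
	{
		/* optional callback added by this series; only dm-rq needs it today */
		if (rq->q->mq_ops->cleanup_rq)
			rq->q->mq_ops->cleanup_rq(rq);
	}

	/* dm-rq side, on BLK_STS_*RESOURCE (illustrative): release the LLD
	 * private data first, then free the clone as before */
	cleanup_cloned_rq(clone);
	blk_put_request(clone);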
V3:
- run .cleanup_rq() from dm-rq because this issue is dm-rq specific,
and even in the future it should still be very unusual to free a request
in this way. If we called .cleanup_rq() in the generic request-free code
(fast path), unnecessary cost would be introduced, and we would also have
to consider the related races.
V2:
- run .cleanup_rq() in blk_mq_free_request(), as suggested by Mike
Ming Lei (2):
blk-mq: add callback of .cleanup_rq
scsi: implement .cleanup_rq callback
drivers/md/dm-rq.c | 1 +
drivers/scsi/scsi_lib.c | 13 +++++++++++++
include/linux/blk-mq.h | 13 +++++++++++++
3 files changed, 27 insertions(+)
Cc: Ewan D. Milne <emilne(a)redhat.com>
Cc: Bart Van Assche <bvanassche(a)acm.org>
Cc: Hannes Reinecke <hare(a)suse.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Mike Snitzer <snitzer(a)redhat.com>
Cc: dm-devel(a)redhat.com
Cc: <stable(a)vger.kernel.org>
Fixes: 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
--
2.20.1
The patch below does not apply to the 5.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 13b82b746310b51b064bc855993a1c84bf862726 Mon Sep 17 00:00:00 2001
From: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Date: Wed, 22 May 2019 14:34:00 +0300
Subject: [PATCH] xhci: Fix immediate data transfer if buffer is already DMA
mapped
xhci immediate data transfer (IDT) support in 5.2-rc1 caused a regression
on various Samsung Exynos boards with an ASIX USB 2.0 ethernet dongle.
If the transfer buffer in the URB is already DMA mapped then IDT should
not be used. urb->transfer_dma will already contain a valid dma address,
and there is no guarantee the data in urb->transfer_buffer is valid.
The IDT support patch used urb->transfer_dma as temporary storage,
copying data from urb->transfer_buffer into it.
The issue is solved by preventing IDT if the transfer buffer is already
DMA mapped, and by not using urb->transfer_dma as temporary storage.
Fixes: 33e39350ebd2 ("usb: xhci: add Immediate Data Transfer support")
Reported-by: Marek Szyprowski <m.szyprowski(a)samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski(a)samsung.com>
CC: Nicolas Saenz Julienne <nsaenzjulienne(a)suse.de>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index ef7c8698772e..88392aa65722 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -3432,11 +3432,14 @@ int xhci_queue_ctrl_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
if (urb->transfer_buffer_length > 0) {
u32 length_field, remainder;
+ u64 addr;
if (xhci_urb_suitable_for_idt(urb)) {
- memcpy(&urb->transfer_dma, urb->transfer_buffer,
+ memcpy(&addr, urb->transfer_buffer,
urb->transfer_buffer_length);
field |= TRB_IDT;
+ } else {
+ addr = (u64) urb->transfer_dma;
}
remainder = xhci_td_remainder(xhci, 0,
@@ -3449,8 +3452,8 @@ int xhci_queue_ctrl_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
if (setup->bRequestType & USB_DIR_IN)
field |= TRB_DIR_IN;
queue_trb(xhci, ep_ring, true,
- lower_32_bits(urb->transfer_dma),
- upper_32_bits(urb->transfer_dma),
+ lower_32_bits(addr),
+ upper_32_bits(addr),
length_field,
field | ep_ring->cycle_state);
}
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index a450a99e90eb..7f8b950d1a73 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -2160,7 +2160,8 @@ static inline bool xhci_urb_suitable_for_idt(struct urb *urb)
{
if (!usb_endpoint_xfer_isoc(&urb->ep->desc) && usb_urb_dir_out(urb) &&
usb_endpoint_maxp(&urb->ep->desc) >= TRB_IDT_MAX_SIZE &&
- urb->transfer_buffer_length <= TRB_IDT_MAX_SIZE)
+ urb->transfer_buffer_length <= TRB_IDT_MAX_SIZE &&
+ !(urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP))
return true;
return false;
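Condensed, the rule the fix enforces is roughly the following (a sketch, not
the driver code): IDT is only suitable for small, non-isochronous OUT
transfers whose buffer is not already DMA mapped.

	bool ok_for_idt = usb_urb_dir_out(urb) &&
			  !usb_endpoint_xfer_isoc(&urb->ep->desc) &&
			  urb->transfer_buffer_length <= TRB_IDT_MAX_SIZE &&
			  !(urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP);

	/* if ok_for_idt: copy the payload into a local u64 and set TRB_IDT;
	 * otherwise: use urb->transfer_dma as the TRB address, as before */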
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4c6d80e1144bdf48cae6b602ae30d41f3e5c76a9 Mon Sep 17 00:00:00 2001
From: Norbert Manthey <nmanthey(a)amazon.de>
Date: Fri, 5 Jul 2019 15:06:00 +0200
Subject: [PATCH] pstore: Fix double-free in pstore_mkfile() failure path
The pstore_mkfile() function is passed a pointer to a struct
pstore_record. On success it consumes this 'record' pointer and
references it from the created inode.
On failure, however, it may or may not free the record. There are even
two different code paths which return -ENOMEM -- one of which does and
the other doesn't free the record.
Make the behaviour deterministic by never consuming and freeing the
record when returning failure, allowing the caller to do the cleanup
consistently.
Signed-off-by: Norbert Manthey <nmanthey(a)amazon.de>
Link: https://lore.kernel.org/r/1562331960-26198-1-git-send-email-nmanthey@amazon…
Fixes: 83f70f0769ddd ("pstore: Do not duplicate record metadata")
Fixes: 1dfff7dd67d1a ("pstore: Pass record contents instead of copying")
Cc: stable(a)vger.kernel.org
[kees: also move "private" allocation location, rename inode cleanup label]
Signed-off-by: Kees Cook <keescook(a)chromium.org>
diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
index 89a80b568a17..7fbe8f058220 100644
--- a/fs/pstore/inode.c
+++ b/fs/pstore/inode.c
@@ -318,22 +318,21 @@ int pstore_mkfile(struct dentry *root, struct pstore_record *record)
goto fail;
inode->i_mode = S_IFREG | 0444;
inode->i_fop = &pstore_file_operations;
- private = kzalloc(sizeof(*private), GFP_KERNEL);
- if (!private)
- goto fail_alloc;
- private->record = record;
-
scnprintf(name, sizeof(name), "%s-%s-%llu%s",
pstore_type_to_name(record->type),
record->psi->name, record->id,
record->compressed ? ".enc.z" : "");
+ private = kzalloc(sizeof(*private), GFP_KERNEL);
+ if (!private)
+ goto fail_inode;
+
dentry = d_alloc_name(root, name);
if (!dentry)
goto fail_private;
+ private->record = record;
inode->i_size = private->total_size = size;
-
inode->i_private = private;
if (record->time.tv_sec)
@@ -349,7 +348,7 @@ int pstore_mkfile(struct dentry *root, struct pstore_record *record)
fail_private:
free_pstore_private(private);
-fail_alloc:
+fail_inode:
iput(inode);
fail:
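The ownership rule the fix establishes can be sketched from the caller's side
(the cleanup shown is illustrative; the actual pstore caller may differ):

	rc = pstore_mkfile(root, record);
	if (rc) {
		/* pstore_mkfile() no longer consumed the record on failure,
		 * so the caller owns it and frees it exactly once */
		kfree(record->buf);
		kfree(record);
	}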
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4c6d80e1144bdf48cae6b602ae30d41f3e5c76a9 Mon Sep 17 00:00:00 2001
From: Norbert Manthey <nmanthey(a)amazon.de>
Date: Fri, 5 Jul 2019 15:06:00 +0200
Subject: [PATCH] pstore: Fix double-free in pstore_mkfile() failure path
The pstore_mkfile() function is passed a pointer to a struct
pstore_record. On success it consumes this 'record' pointer and
references it from the created inode.
On failure, however, it may or may not free the record. There are even
two different code paths which return -ENOMEM -- one of which does and
the other doesn't free the record.
Make the behaviour deterministic by never consuming and freeing the
record when returning failure, allowing the caller to do the cleanup
consistently.
Signed-off-by: Norbert Manthey <nmanthey(a)amazon.de>
Link: https://lore.kernel.org/r/1562331960-26198-1-git-send-email-nmanthey@amazon…
Fixes: 83f70f0769ddd ("pstore: Do not duplicate record metadata")
Fixes: 1dfff7dd67d1a ("pstore: Pass record contents instead of copying")
Cc: stable(a)vger.kernel.org
[kees: also move "private" allocation location, rename inode cleanup label]
Signed-off-by: Kees Cook <keescook(a)chromium.org>
diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
index 89a80b568a17..7fbe8f058220 100644
--- a/fs/pstore/inode.c
+++ b/fs/pstore/inode.c
@@ -318,22 +318,21 @@ int pstore_mkfile(struct dentry *root, struct pstore_record *record)
goto fail;
inode->i_mode = S_IFREG | 0444;
inode->i_fop = &pstore_file_operations;
- private = kzalloc(sizeof(*private), GFP_KERNEL);
- if (!private)
- goto fail_alloc;
- private->record = record;
-
scnprintf(name, sizeof(name), "%s-%s-%llu%s",
pstore_type_to_name(record->type),
record->psi->name, record->id,
record->compressed ? ".enc.z" : "");
+ private = kzalloc(sizeof(*private), GFP_KERNEL);
+ if (!private)
+ goto fail_inode;
+
dentry = d_alloc_name(root, name);
if (!dentry)
goto fail_private;
+ private->record = record;
inode->i_size = private->total_size = size;
-
inode->i_private = private;
if (record->time.tv_sec)
@@ -349,7 +348,7 @@ int pstore_mkfile(struct dentry *root, struct pstore_record *record)
fail_private:
free_pstore_private(private);
-fail_alloc:
+fail_inode:
iput(inode);
fail:
The patch below does not apply to the 5.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b091ac616846a1da75b1f2566b41255ce7f0e0a6 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <damien.lemoal(a)wdc.com>
Date: Mon, 1 Jul 2019 14:09:17 +0900
Subject: [PATCH] sd_zbc: Fix report zones buffer allocation
During disk scan and revalidation done with sd_revalidate(), the zones
of a zoned disk are checked using the helper function
blk_revalidate_disk_zones() if a configuration change is detected
(change in the number of zones or zone size). The function
blk_revalidate_disk_zones() issues report_zones calls that are very
large, that is, to obtain zone information for all zones of the disk
with a single command. The size of the report zones command buffer
needed for such a large request is generally lower than the disk
max_hw_sectors and KMALLOC_MAX_SIZE (4MB), and the allocation succeeds on
boot (no memory fragmentation), but it often fails at run time (e.g. on a
hot-plug event). This causes the disk revalidation to fail and the disk
capacity to be changed to 0.
This problem can be avoided by using vmalloc() instead of kmalloc() for
the buffer allocation. To limit the amount of memory to be allocated,
this patch also introduces the arbitrary SD_ZBC_REPORT_MAX_ZONES
maximum number of zones to report with a single report zones command.
This limit may be lowered further to satisfy the disk max_hw_sectors
limit. Finally, to ensure that the vmalloc-ed buffer can always be
mapped in a request, the buffer size is further limited to at most
queue_max_segments() pages, allowing successful mapping of the buffer
even in the worst case scenario where none of the buffer pages are
contiguous.
Fixes: 515ce6061312 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation")
Fixes: e76239a3748c ("block: add a report_zones method")
Cc: stable(a)vger.kernel.org
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal(a)wdc.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index ec3764c8f3f1..db16c19e05c4 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -9,6 +9,8 @@
*/
#include <linux/blkdev.h>
+#include <linux/vmalloc.h>
+#include <linux/sched/mm.h>
#include <asm/unaligned.h>
@@ -50,7 +52,7 @@ static void sd_zbc_parse_report(struct scsi_disk *sdkp, u8 *buf,
/**
* sd_zbc_do_report_zones - Issue a REPORT ZONES scsi command.
* @sdkp: The target disk
- * @buf: Buffer to use for the reply
+ * @buf: vmalloc-ed buffer to use for the reply
* @buflen: the buffer size
* @lba: Start LBA of the report
* @partial: Do partial report
@@ -79,7 +81,6 @@ static int sd_zbc_do_report_zones(struct scsi_disk *sdkp, unsigned char *buf,
put_unaligned_be32(buflen, &cmd[10]);
if (partial)
cmd[14] = ZBC_REPORT_ZONE_PARTIAL;
- memset(buf, 0, buflen);
result = scsi_execute_req(sdp, cmd, DMA_FROM_DEVICE,
buf, buflen, &sshdr,
@@ -103,6 +104,53 @@ static int sd_zbc_do_report_zones(struct scsi_disk *sdkp, unsigned char *buf,
return 0;
}
+/*
+ * Maximum number of zones to get with one report zones command.
+ */
+#define SD_ZBC_REPORT_MAX_ZONES 8192U
+
+/**
+ * Allocate a buffer for report zones reply.
+ * @sdkp: The target disk
+ * @nr_zones: Maximum number of zones to report
+ * @buflen: Size of the buffer allocated
+ *
+ * Try to allocate a reply buffer for the number of requested zones.
+ * The size of the buffer allocated may be smaller than requested to
+ * satify the device constraint (max_hw_sectors, max_segments, etc).
+ *
+ * Return the address of the allocated buffer and update @buflen with
+ * the size of the allocated buffer.
+ */
+static void *sd_zbc_alloc_report_buffer(struct scsi_disk *sdkp,
+ unsigned int nr_zones, size_t *buflen)
+{
+ struct request_queue *q = sdkp->disk->queue;
+ size_t bufsize;
+ void *buf;
+
+ /*
+ * Report zone buffer size should be at most 64B times the number of
+ * zones requested plus the 64B reply header, but should be at least
+ * SECTOR_SIZE for ATA devices.
+ * Make sure that this size does not exceed the hardware capabilities.
+ * Furthermore, since the report zone command cannot be split, make
+ * sure that the allocated buffer can always be mapped by limiting the
+ * number of pages allocated to the HBA max segments limit.
+ */
+ nr_zones = min(nr_zones, SD_ZBC_REPORT_MAX_ZONES);
+ bufsize = roundup((nr_zones + 1) * 64, 512);
+ bufsize = min_t(size_t, bufsize,
+ queue_max_hw_sectors(q) << SECTOR_SHIFT);
+ bufsize = min_t(size_t, bufsize, queue_max_segments(q) << PAGE_SHIFT);
+
+ buf = vzalloc(bufsize);
+ if (buf)
+ *buflen = bufsize;
+
+ return buf;
+}
+
/**
* sd_zbc_report_zones - Disk report zones operation.
* @disk: The target disk
@@ -116,30 +164,23 @@ int sd_zbc_report_zones(struct gendisk *disk, sector_t sector,
struct blk_zone *zones, unsigned int *nr_zones)
{
struct scsi_disk *sdkp = scsi_disk(disk);
- unsigned int i, buflen, nrz = *nr_zones;
+ unsigned int i, nrz = *nr_zones;
unsigned char *buf;
- size_t offset = 0;
+ size_t buflen = 0, offset = 0;
int ret = 0;
if (!sd_is_zoned(sdkp))
/* Not a zoned device */
return -EOPNOTSUPP;
- /*
- * Get a reply buffer for the number of requested zones plus a header,
- * without exceeding the device maximum command size. For ATA disks,
- * buffers must be aligned to 512B.
- */
- buflen = min(queue_max_hw_sectors(disk->queue) << 9,
- roundup((nrz + 1) * 64, 512));
- buf = kmalloc(buflen, GFP_KERNEL);
+ buf = sd_zbc_alloc_report_buffer(sdkp, nrz, &buflen);
if (!buf)
return -ENOMEM;
ret = sd_zbc_do_report_zones(sdkp, buf, buflen,
sectors_to_logical(sdkp->device, sector), true);
if (ret)
- goto out_free_buf;
+ goto out;
nrz = min(nrz, get_unaligned_be32(&buf[0]) / 64);
for (i = 0; i < nrz; i++) {
@@ -150,8 +191,8 @@ int sd_zbc_report_zones(struct gendisk *disk, sector_t sector,
*nr_zones = nrz;
-out_free_buf:
- kfree(buf);
+out:
+ kvfree(buf);
return ret;
}
@@ -285,8 +326,6 @@ static int sd_zbc_check_zoned_characteristics(struct scsi_disk *sdkp,
return 0;
}
-#define SD_ZBC_BUF_SIZE 131072U
-
/**
* sd_zbc_check_zones - Check the device capacity and zone sizes
* @sdkp: Target disk
@@ -302,22 +341,28 @@ static int sd_zbc_check_zoned_characteristics(struct scsi_disk *sdkp,
*/
static int sd_zbc_check_zones(struct scsi_disk *sdkp, u32 *zblocks)
{
+ size_t bufsize, buflen;
+ unsigned int noio_flag;
u64 zone_blocks = 0;
sector_t max_lba, block = 0;
unsigned char *buf;
unsigned char *rec;
- unsigned int buf_len;
- unsigned int list_length;
int ret;
u8 same;
+ /* Do all memory allocations as if GFP_NOIO was specified */
+ noio_flag = memalloc_noio_save();
+
/* Get a buffer */
- buf = kmalloc(SD_ZBC_BUF_SIZE, GFP_KERNEL);
- if (!buf)
- return -ENOMEM;
+ buf = sd_zbc_alloc_report_buffer(sdkp, SD_ZBC_REPORT_MAX_ZONES,
+ &bufsize);
+ if (!buf) {
+ ret = -ENOMEM;
+ goto out;
+ }
/* Do a report zone to get max_lba and the same field */
- ret = sd_zbc_do_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE, 0, false);
+ ret = sd_zbc_do_report_zones(sdkp, buf, bufsize, 0, false);
if (ret)
goto out_free;
@@ -353,12 +398,12 @@ static int sd_zbc_check_zones(struct scsi_disk *sdkp, u32 *zblocks)
do {
/* Parse REPORT ZONES header */
- list_length = get_unaligned_be32(&buf[0]) + 64;
+ buflen = min_t(size_t, get_unaligned_be32(&buf[0]) + 64,
+ bufsize);
rec = buf + 64;
- buf_len = min(list_length, SD_ZBC_BUF_SIZE);
/* Parse zone descriptors */
- while (rec < buf + buf_len) {
+ while (rec < buf + buflen) {
u64 this_zone_blocks = get_unaligned_be64(&rec[8]);
if (zone_blocks == 0) {
@@ -374,8 +419,8 @@ static int sd_zbc_check_zones(struct scsi_disk *sdkp, u32 *zblocks)
}
if (block < sdkp->capacity) {
- ret = sd_zbc_do_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE,
- block, true);
+ ret = sd_zbc_do_report_zones(sdkp, buf, bufsize, block,
+ true);
if (ret)
goto out_free;
}
@@ -406,7 +451,8 @@ static int sd_zbc_check_zones(struct scsi_disk *sdkp, u32 *zblocks)
}
out_free:
- kfree(buf);
+ memalloc_noio_restore(noio_flag);
+ kvfree(buf);
return ret;
}
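As a worked example of the sizing rule above (illustrative helper, not the
driver code): for the 8192-zone cap, (8192 + 1) * 64 = 524352 bytes, rounded
up to 524800, which is then clamped by the queue limits:

	static size_t report_bufsize_sketch(struct request_queue *q,
					    unsigned int nr_zones)
	{
		/* 64B per zone descriptor plus a 64B header, 512B-aligned for ATA */
		size_t bufsize = roundup((nr_zones + 1) * 64, 512);

		bufsize = min_t(size_t, bufsize,
				queue_max_hw_sectors(q) << SECTOR_SHIFT);
		return min_t(size_t, bufsize,
			     queue_max_segments(q) << PAGE_SHIFT);
	}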
From: Stephane Grosjean <s.grosjean(a)peak-system.com>
When closing the CAN device while tx skbs are in flight, an echo skb could
be released twice. By calling close_candev() before unlinking all
pending tx urbs, the internal echo_skb[] array is fully and
correctly cleared before the USB write callback, and therefore
can_get_echo_skb(), is called for each aborted URB.
Fixes: bb4785551f64 ("can: usb: PEAK-System Technik USB adapters driver core")
Signed-off-by: Stephane Grosjean <s.grosjean(a)peak-system.com>
Cc: linux-stable <stable(a)vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/usb/peak_usb/pcan_usb_core.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_core.c b/drivers/net/can/usb/peak_usb/pcan_usb_core.c
index 458154c9b482..22b9c8e6d040 100644
--- a/drivers/net/can/usb/peak_usb/pcan_usb_core.c
+++ b/drivers/net/can/usb/peak_usb/pcan_usb_core.c
@@ -568,16 +568,16 @@ static int peak_usb_ndo_stop(struct net_device *netdev)
dev->state &= ~PCAN_USB_STATE_STARTED;
netif_stop_queue(netdev);
+ close_candev(netdev);
+
+ dev->can.state = CAN_STATE_STOPPED;
+
/* unlink all pending urbs and free used memory */
peak_usb_unlink_all_urbs(dev);
if (dev->adapter->dev_stop)
dev->adapter->dev_stop(dev);
- close_candev(netdev);
-
- dev->can.state = CAN_STATE_STOPPED;
-
/* can set bus off now */
if (dev->adapter->dev_set_bus) {
int err = dev->adapter->dev_set_bus(dev, 0);
--
2.20.1
From: Nikita Yushchenko <nikita.yoush(a)cogentembedded.com>
We have observed the rcar_canfd driver entering an IRQ storm under high load,
with the following scenario:
- rcar_canfd_global_interrupt() is entered due to Rx data being available,
- napi_schedule_prep() is called, and sets NAPIF_STATE_SCHED in state
- Rx fifo interrupts are masked,
- rcar_canfd_global_interrupt() is entered again, this time due to
error interrupt (e.g. due to overflow),
- since scheduled napi poller has not yet executed, condition for calling
napi_schedule_prep() from rcar_canfd_global_interrupt() remains true,
thus napi_schedule_prep() gets called and sets NAPIF_STATE_MISSED flag
in state,
- later, napi poller function rcar_canfd_rx_poll() gets executed, and
calls napi_complete_done(),
- due to NAPIF_STATE_MISSED flag in state, this call does not clear
NAPIF_STATE_SCHED flag from state,
- on return from napi_complete_done(), rcar_canfd_rx_poll() unmasks Rx
interrupts,
- Rx interrupt happens, rcar_canfd_global_interrupt() gets called
and calls napi_schedule_prep(),
- since NAPIF_STATE_SCHED is set in state at this time, this call
returns false,
- due to that false return, rcar_canfd_global_interrupt() returns
without masking Rx interrupt
- and this results in an IRQ storm: the unmasked Rx interrupt happens again
and is again misprocessed in the same way.
This patch fixes that scenario by unmasking Rx interrupts only when
napi_complete_done() returns true, which means it has cleared
NAPIF_STATE_SCHED in state.
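In sketch form, the resulting poll pattern is (process_rx() and
enable_rx_fifo_irqs() are illustrative stand-ins for the driver's own code):

	static int rx_poll_sketch(struct napi_struct *napi, int budget)
	{
		int done = process_rx(napi, budget);

		if (done < budget) {
			/* re-enable Rx FIFO interrupts only if NAPI really left
			 * the scheduled state; otherwise the poller runs again */
			if (napi_complete_done(napi, done))
				enable_rx_fifo_irqs(napi);
		}
		return done;
	}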
Fixes: dd3bd23eb438 ("can: rcar_canfd: Add Renesas R-Car CAN FD driver")
Signed-off-by: Nikita Yushchenko <nikita.yoush(a)cogentembedded.com>
Cc: linux-stable <stable(a)vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/rcar/rcar_canfd.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/net/can/rcar/rcar_canfd.c b/drivers/net/can/rcar/rcar_canfd.c
index 05410008aa6b..de34a4b82d4a 100644
--- a/drivers/net/can/rcar/rcar_canfd.c
+++ b/drivers/net/can/rcar/rcar_canfd.c
@@ -1508,10 +1508,11 @@ static int rcar_canfd_rx_poll(struct napi_struct *napi, int quota)
/* All packets processed */
if (num_pkts < quota) {
- napi_complete_done(napi, num_pkts);
- /* Enable Rx FIFO interrupts */
- rcar_canfd_set_bit(priv->base, RCANFD_RFCC(ridx),
- RCANFD_RFCC_RFIE);
+ if (napi_complete_done(napi, num_pkts)) {
+ /* Enable Rx FIFO interrupts */
+ rcar_canfd_set_bit(priv->base, RCANFD_RFCC(ridx),
+ RCANFD_RFCC_RFIE);
+ }
}
return num_pkts;
}
--
2.20.1
I'm announcing the release of the 3.16.71 kernel.
All users of the 3.16 kernel series should upgrade.
The updated 3.16.y git tree can be found at:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-3.16.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git
The diff from 3.16.70 is attached to this message.
Ben.
------------
Makefile | 2 +-
kernel/ptrace.c | 4 +---
2 files changed, 2 insertions(+), 4 deletions(-)
Ben Hutchings (1):
Linux 3.16.71
Jann Horn (1):
ptrace: Fix ->ptracer_cred handling for PTRACE_TRACEME
--
Ben Hutchings
compatible: Gracefully accepts erroneous data from any source
From: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
commit ed527b13d800dd515a9e6c582f0a73eca65b2e1b upstream.
The CAAM driver currently violates an undocumented and slightly
controversial requirement imposed by the crypto stack that a buffer
referred to by the request structure via its virtual address may not
be modified while any scatterlists passed via the same request
structure are mapped for inbound DMA.
This may result in errors like
alg: aead: decryption failed on test 1 for gcm_base(ctr-aes-caam,ghash-generic): ret=74
alg: aead: Failed to load transform for gcm(aes): -2
on non-cache coherent systems, due to the fact that the GCM driver
passes an IV buffer by virtual address which shares a cacheline with
the auth_tag buffer passed via a scatterlist, resulting in corruption
of the auth_tag when the IV is updated while the DMA mapping is live.
Since the IV that is returned to the caller is only valid for CBC mode,
and given that the in-kernel users of CBC (such as CTS) don't trigger the
same issue as the GCM driver, let's just disable the output IV generation
for all modes except CBC for the time being.
Fixes: 854b06f76879 ("crypto: caam - properly set IV after {en,de}crypt")
Cc: Horia Geanta <horia.geanta(a)nxp.com>
Cc: Iuliana Prodan <iuliana.prodan(a)nxp.com>
Reported-by: Sascha Hauer <s.hauer(a)pengutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Reviewed-by: Horia Geanta <horia.geanta(a)nxp.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
[ Horia: backported to 4.9 ]
Signed-off-by: Horia Geantă <horia.geanta(a)nxp.com>
---
drivers/crypto/caam/caamalg.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 88caca3370f2..f8ac768ed5d7 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -2015,6 +2015,7 @@ static void ablkcipher_encrypt_done(struct device *jrdev, u32 *desc, u32 err,
struct ablkcipher_request *req = context;
struct ablkcipher_edesc *edesc;
struct crypto_ablkcipher *ablkcipher = crypto_ablkcipher_reqtfm(req);
+ struct caam_ctx *ctx = crypto_ablkcipher_ctx(ablkcipher);
int ivsize = crypto_ablkcipher_ivsize(ablkcipher);
#ifdef DEBUG
@@ -2040,10 +2041,11 @@ static void ablkcipher_encrypt_done(struct device *jrdev, u32 *desc, u32 err,
/*
* The crypto API expects us to set the IV (req->info) to the last
- * ciphertext block. This is used e.g. by the CTS mode.
+ * ciphertext block when running in CBC mode.
*/
- scatterwalk_map_and_copy(req->info, req->dst, req->nbytes - ivsize,
- ivsize, 0);
+ if ((ctx->class1_alg_type & OP_ALG_AAI_MASK) == OP_ALG_AAI_CBC)
+ scatterwalk_map_and_copy(req->info, req->dst, req->nbytes -
+ ivsize, ivsize, 0);
kfree(edesc);
@@ -2056,6 +2058,7 @@ static void ablkcipher_decrypt_done(struct device *jrdev, u32 *desc, u32 err,
struct ablkcipher_request *req = context;
struct ablkcipher_edesc *edesc;
struct crypto_ablkcipher *ablkcipher = crypto_ablkcipher_reqtfm(req);
+ struct caam_ctx *ctx = crypto_ablkcipher_ctx(ablkcipher);
int ivsize = crypto_ablkcipher_ivsize(ablkcipher);
#ifdef DEBUG
@@ -2080,10 +2083,11 @@ static void ablkcipher_decrypt_done(struct device *jrdev, u32 *desc, u32 err,
/*
* The crypto API expects us to set the IV (req->info) to the last
- * ciphertext block.
+ * ciphertext block when running in CBC mode.
*/
- scatterwalk_map_and_copy(req->info, req->src, req->nbytes - ivsize,
- ivsize, 0);
+ if ((ctx->class1_alg_type & OP_ALG_AAI_MASK) == OP_ALG_AAI_CBC)
+ scatterwalk_map_and_copy(req->info, req->src, req->nbytes -
+ ivsize, ivsize, 0);
kfree(edesc);
--
2.17.1
On Wed, 24 Jul 2019 08:44:19 +0200
Christian Borntraeger <borntraeger(a)de.ibm.com> wrote:
>
>
> On 24.07.19 00:58, Halil Pasic wrote:
> > The access to airq_areas was racy ever since the adapter interrupts got
> > introduced to virtio-ccw, but since commit 39c7dcb15892 ("virtio/s390:
> > make airq summary indicators DMA") this became an issue in practice as
> > well. Namely before that commit the airq_info that got overwritten was
> > still functional. After that commit however the two infos share a
> > summary_indicator, which aggravates the situation. Which means
> > auto-online mechanism occasionally hangs the boot with virtio_blk.
> >
> > Signed-off-by: Halil Pasic <pasic(a)linux.ibm.com>
> > Reported-by: Marc Hartmayer <mhartmay(a)linux.ibm.com>
> > Fixes: 96b14536d935 ("virtio-ccw: virtio-ccw adapter interrupt support.")
> > ---
> > * We need definitely this fixed for 5.3. For older stable kernels it is
> > to be discussed. @Connie what do you think: do we need a cc stable?
>
> Unless you can prove that the problem could never happen on old version
> we absolutely do need cc stable.
No I would not like to make an attempt at proving that. I prefer code
race free anyway. CC-ing stable.
>
> >
> > * I have a variant that does not need the extra mutex but uses cmpxchg().
> > Decided to post this one because that one is more complex. But if there
> > is interest we can have a look at it as well.
>
> This is slow path (startup) and never called in hot path. Correct? Mutex should be
> fine.
Right, this is only relevant during device initialization, which is an
infrequent operation.
Thanks,
Halil
> > ---
> > drivers/s390/virtio/virtio_ccw.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > index 1a55e5942d36..d97742662755 100644
> > --- a/drivers/s390/virtio/virtio_ccw.c
> > +++ b/drivers/s390/virtio/virtio_ccw.c
> > @@ -145,6 +145,8 @@ struct airq_info {
> > struct airq_iv *aiv;
> > };
> > static struct airq_info *airq_areas[MAX_AIRQ_AREAS];
> > +DEFINE_MUTEX(airq_areas_lock);
> > +
> > static u8 *summary_indicators;
> >
> > static inline u8 *get_summary_indicator(struct airq_info *info)
> > @@ -265,9 +267,11 @@ static unsigned long get_airq_indicator(struct virtqueue *vqs[], int nvqs,
> > unsigned long bit, flags;
> >
> > for (i = 0; i < MAX_AIRQ_AREAS && !indicator_addr; i++) {
> > + mutex_lock(&airq_areas_lock);
> > if (!airq_areas[i])
> > airq_areas[i] = new_airq_info(i);
> > info = airq_areas[i];
> > + mutex_unlock(&airq_areas_lock);
> > if (!info)
> > return 0;
> > write_lock_irqsave(&info->lock, flags);
> >
>
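For reference, the cmpxchg()-based variant mentioned above might look roughly
like this (illustrative sketch only, not the posted alternative;
new_airq_info_free() is a hypothetical helper):

	info = READ_ONCE(airq_areas[i]);
	if (!info) {
		struct airq_info *new = new_airq_info(i);

		if (new) {
			/* publish the new info unless another CPU beat us to it */
			info = cmpxchg(&airq_areas[i], NULL, new);
			if (info)
				new_airq_info_free(new); /* hypothetical: drop the loser */
			else
				info = new;
		}
	}
	if (!info)
		return 0;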
This is a note to let you know that I've just added the patch titled
fpga-manager: altera-ps-spi: Fix build error
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 3d139703d397f6281368047ba7ad1c8bf95aa8ab Mon Sep 17 00:00:00 2001
From: YueHaibing <yuehaibing(a)huawei.com>
Date: Mon, 8 Jul 2019 15:13:56 +0800
Subject: fpga-manager: altera-ps-spi: Fix build error
If BITREVERSE is m and FPGA_MGR_ALTERA_PS_SPI is y,
build fails:
drivers/fpga/altera-ps-spi.o: In function `altera_ps_write':
altera-ps-spi.c:(.text+0x4ec): undefined reference to `byte_rev_table'
Select BITREVERSE to fix this.
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Fixes: fcfe18f885f6 ("fpga-manager: altera-ps-spi: use bitrev8x4")
Signed-off-by: YueHaibing <yuehaibing(a)huawei.com>
Cc: stable <stable(a)vger.kernel.org>
Acked-by: Moritz Fischer <mdf(a)kernel.org>
Link: https://lore.kernel.org/r/20190708071356.50928-1-yuehaibing@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/fpga/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/fpga/Kconfig b/drivers/fpga/Kconfig
index 474f304ec109..cdd4f73b4869 100644
--- a/drivers/fpga/Kconfig
+++ b/drivers/fpga/Kconfig
@@ -40,6 +40,7 @@ config ALTERA_PR_IP_CORE_PLAT
config FPGA_MGR_ALTERA_PS_SPI
tristate "Altera FPGA Passive Serial over SPI"
depends on SPI
+ select BITREVERSE
help
FPGA manager driver support for Altera Arria/Cyclone/Stratix
using the passive serial interface over SPI.
--
2.22.0
This is a note to let you know that I've just added the patch titled
mei: me: add mule creek canyon (EHL) device ids
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 1be8624a0cbef720e8da39a15971e01abffc865b Mon Sep 17 00:00:00 2001
From: Alexander Usyskin <alexander.usyskin(a)intel.com>
Date: Fri, 12 Jul 2019 12:58:14 +0300
Subject: mei: me: add mule creek canyon (EHL) device ids
Add Mule Creek Canyon (PCH) MEI device ids for Elkhart Lake (EHL) Platform.
Signed-off-by: Alexander Usyskin <alexander.usyskin(a)intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler(a)intel.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20190712095814.20746-1-tomas.winkler@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/misc/mei/hw-me-regs.h | 3 +++
drivers/misc/mei/pci-me.c | 3 +++
2 files changed, 6 insertions(+)
diff --git a/drivers/misc/mei/hw-me-regs.h b/drivers/misc/mei/hw-me-regs.h
index d74b182e19f3..6c0173772162 100644
--- a/drivers/misc/mei/hw-me-regs.h
+++ b/drivers/misc/mei/hw-me-regs.h
@@ -81,6 +81,9 @@
#define MEI_DEV_ID_ICP_LP 0x34E0 /* Ice Lake Point LP */
+#define MEI_DEV_ID_MCC 0x4B70 /* Mule Creek Canyon (EHL) */
+#define MEI_DEV_ID_MCC_4 0x4B75 /* Mule Creek Canyon 4 (EHL) */
+
/*
* MEI HW Section
*/
diff --git a/drivers/misc/mei/pci-me.c b/drivers/misc/mei/pci-me.c
index 7a2b3545a7f9..57cb68f5cc64 100644
--- a/drivers/misc/mei/pci-me.c
+++ b/drivers/misc/mei/pci-me.c
@@ -98,6 +98,9 @@ static const struct pci_device_id mei_me_pci_tbl[] = {
{MEI_PCI_DEVICE(MEI_DEV_ID_ICP_LP, MEI_ME_PCH12_CFG)},
+ {MEI_PCI_DEVICE(MEI_DEV_ID_MCC, MEI_ME_PCH12_CFG)},
+ {MEI_PCI_DEVICE(MEI_DEV_ID_MCC_4, MEI_ME_PCH8_CFG)},
+
/* required last entry */
{0, }
};
--
2.22.0
The stable-rc 4.14 i386 and armv7 builds failed due to the error below.
fs/btrfs/file.c: In function 'btrfs_punch_hole':
fs/btrfs/file.c:2787:27: error: invalid initializer
struct timespec64 now = current_time(inode);
^~~~~~~~~~~~
fs/btrfs/file.c:2790:18: error: incompatible types when assigning to
type 'struct timespec' from type 'struct timespec64'
inode->i_mtime = now;
^
fs/btrfs/file.c:2791:18: error: incompatible types when assigning to
type 'struct timespec' from type 'struct timespec64'
inode->i_ctime = now;
^
Best regards
Naresh Kamboju
This is a note to let you know that I've just added the patch titled
binder: prevent transactions to context manager from its own process.
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 49ed96943a8e0c62cc5a9b0a6cfc88be87d1fcec Mon Sep 17 00:00:00 2001
From: Hridya Valsaraju <hridya(a)google.com>
Date: Mon, 15 Jul 2019 12:18:04 -0700
Subject: binder: prevent transactions to context manager from its own process.
Currently, a transaction to the context manager from its own process
is prevented by checking if its binder_proc struct is the same as
that of the sender. However, this does not catch cases where the
process opens the binder device again and uses the new fd to send
a transaction to the context manager.
Reported-by: syzbot+8b3c354d33c4ac78bfad(a)syzkaller.appspotmail.com
Signed-off-by: Hridya Valsaraju <hridya(a)google.com>
Acked-by: Todd Kjos <tkjos(a)google.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20190715191804.112933-1-hridya@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/android/binder.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 5bde08603fbc..dc1c83eafc22 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -2988,7 +2988,7 @@ static void binder_transaction(struct binder_proc *proc,
else
return_error = BR_DEAD_REPLY;
mutex_unlock(&context->context_mgr_node_lock);
- if (target_node && target_proc == proc) {
+ if (target_node && target_proc->pid == proc->pid) {
binder_user_error("%d:%d got transaction to context manager from process owning it\n",
proc->pid, thread->pid);
return_error = BR_FAILED_REPLY;
--
2.22.0
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From aa53e3bfac7205fb3a8815ac1c937fd6ed01b41e Mon Sep 17 00:00:00 2001
From: Johannes Thumshirn <jthumshirn(a)suse.de>
Date: Thu, 6 Jun 2019 12:07:15 +0200
Subject: [PATCH] btrfs: correctly validate compression type
Nikolay reported the following KASAN splat when running btrfs/048:
[ 1843.470920] ==================================================================
[ 1843.471971] BUG: KASAN: slab-out-of-bounds in strncmp+0x66/0xb0
[ 1843.472775] Read of size 1 at addr ffff888111e369e2 by task btrfs/3979
[ 1843.473904] CPU: 3 PID: 3979 Comm: btrfs Not tainted 5.2.0-rc3-default #536
[ 1843.475009] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 1843.476322] Call Trace:
[ 1843.476674] dump_stack+0x7c/0xbb
[ 1843.477132] ? strncmp+0x66/0xb0
[ 1843.477587] print_address_description+0x114/0x320
[ 1843.478256] ? strncmp+0x66/0xb0
[ 1843.478740] ? strncmp+0x66/0xb0
[ 1843.479185] __kasan_report+0x14e/0x192
[ 1843.479759] ? strncmp+0x66/0xb0
[ 1843.480209] kasan_report+0xe/0x20
[ 1843.480679] strncmp+0x66/0xb0
[ 1843.481105] prop_compression_validate+0x24/0x70
[ 1843.481798] btrfs_xattr_handler_set_prop+0x65/0x160
[ 1843.482509] __vfs_setxattr+0x71/0x90
[ 1843.483012] __vfs_setxattr_noperm+0x84/0x130
[ 1843.483606] vfs_setxattr+0xac/0xb0
[ 1843.484085] setxattr+0x18c/0x230
[ 1843.484546] ? vfs_setxattr+0xb0/0xb0
[ 1843.485048] ? __mod_node_page_state+0x1f/0xa0
[ 1843.485672] ? _raw_spin_unlock+0x24/0x40
[ 1843.486233] ? __handle_mm_fault+0x988/0x1290
[ 1843.486823] ? lock_acquire+0xb4/0x1e0
[ 1843.487330] ? lock_acquire+0xb4/0x1e0
[ 1843.487842] ? mnt_want_write_file+0x3c/0x80
[ 1843.488442] ? debug_lockdep_rcu_enabled+0x22/0x40
[ 1843.489089] ? rcu_sync_lockdep_assert+0xe/0x70
[ 1843.489707] ? __sb_start_write+0x158/0x200
[ 1843.490278] ? mnt_want_write_file+0x3c/0x80
[ 1843.490855] ? __mnt_want_write+0x98/0xe0
[ 1843.491397] __x64_sys_fsetxattr+0xba/0xe0
[ 1843.492201] ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 1843.493201] do_syscall_64+0x6c/0x230
[ 1843.493988] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1843.495041] RIP: 0033:0x7fa7a8a7707a
[ 1843.495819] Code: 48 8b 0d 21 de 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 be 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ee dd 2b 00 f7 d8 64 89 01 48
[ 1843.499203] RSP: 002b:00007ffcb73bca38 EFLAGS: 00000202 ORIG_RAX: 00000000000000be
[ 1843.500210] RAX: ffffffffffffffda RBX: 00007ffcb73bda9d RCX: 00007fa7a8a7707a
[ 1843.501170] RDX: 00007ffcb73bda9d RSI: 00000000006dc050 RDI: 0000000000000003
[ 1843.502152] RBP: 00000000006dc050 R08: 0000000000000000 R09: 0000000000000000
[ 1843.503109] R10: 0000000000000002 R11: 0000000000000202 R12: 00007ffcb73bda91
[ 1843.504055] R13: 0000000000000003 R14: 00007ffcb73bda82 R15: ffffffffffffffff
[ 1843.505268] Allocated by task 3979:
[ 1843.505771] save_stack+0x19/0x80
[ 1843.506211] __kasan_kmalloc.constprop.5+0xa0/0xd0
[ 1843.506836] setxattr+0xeb/0x230
[ 1843.507264] __x64_sys_fsetxattr+0xba/0xe0
[ 1843.507886] do_syscall_64+0x6c/0x230
[ 1843.508429] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1843.509558] Freed by task 0:
[ 1843.510188] (stack is not available)
[ 1843.511309] The buggy address belongs to the object at ffff888111e369e0
which belongs to the cache kmalloc-8 of size 8
[ 1843.514095] The buggy address is located 2 bytes inside of
8-byte region [ffff888111e369e0, ffff888111e369e8)
[ 1843.516524] The buggy address belongs to the page:
[ 1843.517561] page:ffff88813f478d80 refcount:1 mapcount:0 mapping:ffff88811940c300 index:0xffff888111e373b8 compound_mapcount: 0
[ 1843.519993] flags: 0x4404000010200(slab|head)
[ 1843.520951] raw: 0004404000010200 ffff88813f48b008 ffff888119403d50 ffff88811940c300
[ 1843.522616] raw: ffff888111e373b8 000000000016000f 00000001ffffffff 0000000000000000
[ 1843.524281] page dumped because: kasan: bad access detected
[ 1843.525936] Memory state around the buggy address:
[ 1843.526975] ffff888111e36880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 1843.528479] ffff888111e36900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 1843.530138] >ffff888111e36980: fc fc fc fc fc fc fc fc fc fc fc fc 02 fc fc fc
[ 1843.531877] ^
[ 1843.533287] ffff888111e36a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 1843.534874] ffff888111e36a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 1843.536468] ==================================================================
This is caused by supplying a too short compression value ('lz') in the
test-case and comparing it to 'lzo' with strncmp() and a length of 3.
strncmp() read past the 'lz' when looking for the 'o' and thus caused an
out-of-bounds read.
Introduce a new check 'btrfs_compress_is_valid_type()' which not only
checks the user-supplied value against known compression types, but also
employs checks for too short values.
Reported-by: Nikolay Borisov <nborisov(a)suse.com>
Fixes: 272e5326c783 ("btrfs: prop: fix vanished compression property after failed set")
CC: stable(a)vger.kernel.org # 5.1+
Reviewed-by: Nikolay Borisov <nborisov(a)suse.com>
Signed-off-by: Johannes Thumshirn <jthumshirn(a)suse.de>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 66e21a4e9ea2..db41315f11eb 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -43,6 +43,22 @@ const char* btrfs_compress_type2str(enum btrfs_compression_type type)
return NULL;
}
+bool btrfs_compress_is_valid_type(const char *str, size_t len)
+{
+ int i;
+
+ for (i = 1; i < ARRAY_SIZE(btrfs_compress_types); i++) {
+ size_t comp_len = strlen(btrfs_compress_types[i]);
+
+ if (len < comp_len)
+ continue;
+
+ if (!strncmp(btrfs_compress_types[i], str, comp_len))
+ return true;
+ }
+ return false;
+}
+
static int btrfs_decompress_bio(struct compressed_bio *cb);
static inline int compressed_bio_size(struct btrfs_fs_info *fs_info,
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index 191e5f4e3523..2035b8eb1290 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -173,6 +173,7 @@ extern const struct btrfs_compress_op btrfs_lzo_compress;
extern const struct btrfs_compress_op btrfs_zstd_compress;
const char* btrfs_compress_type2str(enum btrfs_compression_type type);
+bool btrfs_compress_is_valid_type(const char *str, size_t len);
int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end);
diff --git a/fs/btrfs/props.c b/fs/btrfs/props.c
index a9e2e66152ee..af109c0ba720 100644
--- a/fs/btrfs/props.c
+++ b/fs/btrfs/props.c
@@ -257,11 +257,7 @@ static int prop_compression_validate(const char *value, size_t len)
if (!value)
return 0;
- if (!strncmp("lzo", value, 3))
- return 0;
- else if (!strncmp("zlib", value, 4))
- return 0;
- else if (!strncmp("zstd", value, 4))
+ if (btrfs_compress_is_valid_type(value, len))
return 0;
return -EINVAL;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From aa53e3bfac7205fb3a8815ac1c937fd6ed01b41e Mon Sep 17 00:00:00 2001
From: Johannes Thumshirn <jthumshirn(a)suse.de>
Date: Thu, 6 Jun 2019 12:07:15 +0200
Subject: [PATCH] btrfs: correctly validate compression type
Nikolay reported the following KASAN splat when running btrfs/048:
[ 1843.470920] ==================================================================
[ 1843.471971] BUG: KASAN: slab-out-of-bounds in strncmp+0x66/0xb0
[ 1843.472775] Read of size 1 at addr ffff888111e369e2 by task btrfs/3979
[ 1843.473904] CPU: 3 PID: 3979 Comm: btrfs Not tainted 5.2.0-rc3-default #536
[ 1843.475009] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 1843.476322] Call Trace:
[ 1843.476674] dump_stack+0x7c/0xbb
[ 1843.477132] ? strncmp+0x66/0xb0
[ 1843.477587] print_address_description+0x114/0x320
[ 1843.478256] ? strncmp+0x66/0xb0
[ 1843.478740] ? strncmp+0x66/0xb0
[ 1843.479185] __kasan_report+0x14e/0x192
[ 1843.479759] ? strncmp+0x66/0xb0
[ 1843.480209] kasan_report+0xe/0x20
[ 1843.480679] strncmp+0x66/0xb0
[ 1843.481105] prop_compression_validate+0x24/0x70
[ 1843.481798] btrfs_xattr_handler_set_prop+0x65/0x160
[ 1843.482509] __vfs_setxattr+0x71/0x90
[ 1843.483012] __vfs_setxattr_noperm+0x84/0x130
[ 1843.483606] vfs_setxattr+0xac/0xb0
[ 1843.484085] setxattr+0x18c/0x230
[ 1843.484546] ? vfs_setxattr+0xb0/0xb0
[ 1843.485048] ? __mod_node_page_state+0x1f/0xa0
[ 1843.485672] ? _raw_spin_unlock+0x24/0x40
[ 1843.486233] ? __handle_mm_fault+0x988/0x1290
[ 1843.486823] ? lock_acquire+0xb4/0x1e0
[ 1843.487330] ? lock_acquire+0xb4/0x1e0
[ 1843.487842] ? mnt_want_write_file+0x3c/0x80
[ 1843.488442] ? debug_lockdep_rcu_enabled+0x22/0x40
[ 1843.489089] ? rcu_sync_lockdep_assert+0xe/0x70
[ 1843.489707] ? __sb_start_write+0x158/0x200
[ 1843.490278] ? mnt_want_write_file+0x3c/0x80
[ 1843.490855] ? __mnt_want_write+0x98/0xe0
[ 1843.491397] __x64_sys_fsetxattr+0xba/0xe0
[ 1843.492201] ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 1843.493201] do_syscall_64+0x6c/0x230
[ 1843.493988] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1843.495041] RIP: 0033:0x7fa7a8a7707a
[ 1843.495819] Code: 48 8b 0d 21 de 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 be 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ee dd 2b 00 f7 d8 64 89 01 48
[ 1843.499203] RSP: 002b:00007ffcb73bca38 EFLAGS: 00000202 ORIG_RAX: 00000000000000be
[ 1843.500210] RAX: ffffffffffffffda RBX: 00007ffcb73bda9d RCX: 00007fa7a8a7707a
[ 1843.501170] RDX: 00007ffcb73bda9d RSI: 00000000006dc050 RDI: 0000000000000003
[ 1843.502152] RBP: 00000000006dc050 R08: 0000000000000000 R09: 0000000000000000
[ 1843.503109] R10: 0000000000000002 R11: 0000000000000202 R12: 00007ffcb73bda91
[ 1843.504055] R13: 0000000000000003 R14: 00007ffcb73bda82 R15: ffffffffffffffff
[ 1843.505268] Allocated by task 3979:
[ 1843.505771] save_stack+0x19/0x80
[ 1843.506211] __kasan_kmalloc.constprop.5+0xa0/0xd0
[ 1843.506836] setxattr+0xeb/0x230
[ 1843.507264] __x64_sys_fsetxattr+0xba/0xe0
[ 1843.507886] do_syscall_64+0x6c/0x230
[ 1843.508429] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1843.509558] Freed by task 0:
[ 1843.510188] (stack is not available)
[ 1843.511309] The buggy address belongs to the object at ffff888111e369e0
which belongs to the cache kmalloc-8 of size 8
[ 1843.514095] The buggy address is located 2 bytes inside of
8-byte region [ffff888111e369e0, ffff888111e369e8)
[ 1843.516524] The buggy address belongs to the page:
[ 1843.517561] page:ffff88813f478d80 refcount:1 mapcount:0 mapping:ffff88811940c300 index:0xffff888111e373b8 compound_mapcount: 0
[ 1843.519993] flags: 0x4404000010200(slab|head)
[ 1843.520951] raw: 0004404000010200 ffff88813f48b008 ffff888119403d50 ffff88811940c300
[ 1843.522616] raw: ffff888111e373b8 000000000016000f 00000001ffffffff 0000000000000000
[ 1843.524281] page dumped because: kasan: bad access detected
[ 1843.525936] Memory state around the buggy address:
[ 1843.526975] ffff888111e36880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 1843.528479] ffff888111e36900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 1843.530138] >ffff888111e36980: fc fc fc fc fc fc fc fc fc fc fc fc 02 fc fc fc
[ 1843.531877] ^
[ 1843.533287] ffff888111e36a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 1843.534874] ffff888111e36a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 1843.536468] ==================================================================
This is caused by supplying a too short compression value ('lz') in the
test-case and comparing it to 'lzo' with strncmp() and a length of 3.
strncmp() read past the 'lz' when looking for the 'o' and thus caused an
out-of-bounds read.
Introduce a new check 'btrfs_compress_is_valid_type()' which not only
checks the user-supplied value against known compression types, but also
employs checks for too short values.
Reported-by: Nikolay Borisov <nborisov(a)suse.com>
Fixes: 272e5326c783 ("btrfs: prop: fix vanished compression property after failed set")
CC: stable(a)vger.kernel.org # 5.1+
Reviewed-by: Nikolay Borisov <nborisov(a)suse.com>
Signed-off-by: Johannes Thumshirn <jthumshirn(a)suse.de>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 66e21a4e9ea2..db41315f11eb 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -43,6 +43,22 @@ const char* btrfs_compress_type2str(enum btrfs_compression_type type)
return NULL;
}
+bool btrfs_compress_is_valid_type(const char *str, size_t len)
+{
+ int i;
+
+ for (i = 1; i < ARRAY_SIZE(btrfs_compress_types); i++) {
+ size_t comp_len = strlen(btrfs_compress_types[i]);
+
+ if (len < comp_len)
+ continue;
+
+ if (!strncmp(btrfs_compress_types[i], str, comp_len))
+ return true;
+ }
+ return false;
+}
+
static int btrfs_decompress_bio(struct compressed_bio *cb);
static inline int compressed_bio_size(struct btrfs_fs_info *fs_info,
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index 191e5f4e3523..2035b8eb1290 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -173,6 +173,7 @@ extern const struct btrfs_compress_op btrfs_lzo_compress;
extern const struct btrfs_compress_op btrfs_zstd_compress;
const char* btrfs_compress_type2str(enum btrfs_compression_type type);
+bool btrfs_compress_is_valid_type(const char *str, size_t len);
int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end);
diff --git a/fs/btrfs/props.c b/fs/btrfs/props.c
index a9e2e66152ee..af109c0ba720 100644
--- a/fs/btrfs/props.c
+++ b/fs/btrfs/props.c
@@ -257,11 +257,7 @@ static int prop_compression_validate(const char *value, size_t len)
if (!value)
return 0;
- if (!strncmp("lzo", value, 3))
- return 0;
- else if (!strncmp("zlib", value, 4))
- return 0;
- else if (!strncmp("zstd", value, 4))
+ if (btrfs_compress_is_valid_type(value, len))
return 0;
return -EINVAL;
From: Yingying Tang <yintang(a)codeaurora.org>
[ Upstream commit 9e7251fa38978b85108c44743e1436d48e8d0d76 ]
tx_stats will be freed and set to NULL before the debugfs_sta node is
removed in the station disconnection process. So if the debugfs_sta
node is read there may be a NULL pointer dereference. Add a check for
tx_stats before using it to resolve this issue.
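As an illustration only (the exact debugfs path and file name are
assumptions based on the usual mac80211 per-station debugfs layout), the
race can be hit by reading the node while the peer is disconnecting:
# Read the per-station tx stats while the station entry is being torn down;
# without the new check this dereferences the already-freed arsta->tx_stats.
$ cat /sys/kernel/debug/ieee80211/phy0/netdev:wlan0/stations/<sta-mac>/tx_stats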
Signed-off-by: Yingying Tang <yintang(a)codeaurora.org>
Signed-off-by: Kalle Valo <kvalo(a)codeaurora.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/wireless/ath/ath10k/debugfs_sta.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/net/wireless/ath/ath10k/debugfs_sta.c b/drivers/net/wireless/ath/ath10k/debugfs_sta.c
index c704ae371c4d..42931a669b02 100644
--- a/drivers/net/wireless/ath/ath10k/debugfs_sta.c
+++ b/drivers/net/wireless/ath/ath10k/debugfs_sta.c
@@ -663,6 +663,13 @@ static ssize_t ath10k_dbg_sta_dump_tx_stats(struct file *file,
mutex_lock(&ar->conf_mutex);
+ if (!arsta->tx_stats) {
+ ath10k_warn(ar, "failed to get tx stats");
+ mutex_unlock(&ar->conf_mutex);
+ kfree(buf);
+ return 0;
+ }
+
spin_lock_bh(&ar->data_lock);
for (k = 0; k < ATH10K_STATS_TYPE_MAX; k++) {
for (j = 0; j < ATH10K_COUNTER_TYPE_MAX; j++) {
--
2.20.1
"Oyez Oyez..." its time for a stable update of fixes for XFS. 4 out of the
9 fixes here were recommended by Amir, and tested by both Amir and Sasha.
I've found a few other fixes, and have tested all these changes with
fstests against the following configurations in fstests sections as per
oscheck [0] and found no regressions in comparsin to v4.19.58 and by
running the full set of tests 3 times completely:
* xfs
* xfs_nocrc
* xfs_nocrc_512
* xfs_reflink
* xfs_reflink_1024
* xfs_logdev
* xfs_realtimedev
Known issues are listed in the expunge files, but it's no different from
the current baseline.
Worth noting is a now-known generic/388 crash on xfs_nocrc, xfs_reflink,
and what may be a new section we should consider tracking:
"xfs_reflink_normapbt", with the following resulting filesystem:
# xfs_info /dev/loop5
meta-data=/dev/loop5 isize=512 agcount=4, agsize=1310720 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1
data = bsize=4096 blocks=5242880, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Do we want to create a baseline and track this configuration for stable
as well?
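If we do decide to track it, here is a minimal sketch of what such a
section might look like in plain fstests local.config terms (oscheck's own
config format may differ; the devices and mount points are assumptions):
[xfs_reflink_normapbt]
FSTYP=xfs
TEST_DEV=/dev/loop5
TEST_DIR=/media/test
SCRATCH_DEV=/dev/loop6
SCRATCH_MNT=/media/scratch
MKFS_OPTIONS='-f -m reflink=1,rmapbt=0'
A run limited to that section would then be something like
"./check -s xfs_reflink_normapbt -g auto" from the fstests directory.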
There is a stable bug tracking this, kz#204223 [1], and a corresponding
upstream bug, kz#204049 [2], which Zorro reported. But, again, nothing
changes from the baseline.
I'd appreciate further reviews of the patches.
I have some other fixes in mind as well, but I'd rather not delay this
set and think this is a first good batch.
This also goes out as the first set of stable fixes using oscheck's
new devops infrastructure built on ansible / vagrant / terraform [3].
For this release I've used vagrant with KVM; perhaps for the next one
I'll try terraform on whatever cloud solution someone is willing
to let me use.
You can also find these changes on my 20190718-linux-xfs-4.19.y-v1
branch on kernel.org [4].
Lemme know if you see any issues or have any questions.
[0] https://gitlab.com/mcgrof/oscheck/blob/master/fstests-configs/xfs.config
[1] https://bugzilla.kernel.org/show_bug.cgi?id=204223
[2] https://bugzilla.kernel.org/show_bug.cgi?id=204049
[3] https://gitlab.com/mcgrof/kdevops
[4] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-stable.git/log…
Brian Foster (1):
xfs: serialize unaligned dio writes against all other dio writes
Darrick J. Wong (6):
xfs: fix pagecache truncation prior to reflink
xfs: don't overflow xattr listent buffer
xfs: rename m_inotbt_nores to m_finobt_nores
xfs: don't ever put nlink > 0 inodes on the unlinked list
xfs: reserve blocks for ifree transaction during log recovery
xfs: abort unaligned nowait directio early
Dave Chinner (1):
xfs: flush removing page cache in xfs_reflink_remap_prep
Luis R. Rodriguez (1):
xfs: fix reporting supported extra file attributes for statx()
fs/xfs/libxfs/xfs_ag_resv.c | 2 +-
fs/xfs/libxfs/xfs_ialloc_btree.c | 4 ++--
fs/xfs/xfs_attr_list.c | 1 +
fs/xfs/xfs_bmap_util.c | 2 +-
fs/xfs/xfs_bmap_util.h | 2 ++
fs/xfs/xfs_file.c | 27 +++++++++++++++++----------
fs/xfs/xfs_fsops.c | 1 +
fs/xfs/xfs_inode.c | 18 +++++++-----------
fs/xfs/xfs_iops.c | 21 +++++++++++++++++++--
fs/xfs/xfs_mount.h | 2 +-
fs/xfs/xfs_reflink.c | 16 +++++++++++++---
fs/xfs/xfs_super.c | 7 +++++++
fs/xfs/xfs_xattr.c | 3 +++
13 files changed, 75 insertions(+), 31 deletions(-)
--
2.20.1
The patch titled
Subject: cgroup: kselftest: relax fs_spec checks
has been added to the -mm tree. Its filename is
cgroup-kselftest-relax-fs_spec-checks.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/cgroup-kselftest-relax-fs_spec-che…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/cgroup-kselftest-relax-fs_spec-che…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Chris Down <chris(a)chrisdown.name>
Subject: cgroup: kselftest: relax fs_spec checks
On my laptop most memcg kselftests were being skipped because they claimed
the cgroup v2 hierarchy wasn't mounted, but this isn't correct. Instead, it
seems current systemd HEAD mounts it with the name "cgroup2" instead of
"cgroup":
% grep cgroup /proc/mounts
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate 0 0
I can't think of a reason to need to check fs_spec explicitly
since it's arbitrary, so we can just rely on fs_vfstype.
After these changes, `make TARGETS=cgroup kselftest` actually runs the
cgroup v2 tests in more cases.
Link: http://lkml.kernel.org/r/20190723210737.GA487@chrisdown.name
Signed-off-by: Chris Down <chris(a)chrisdown.name>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/cgroup/cgroup_util.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/tools/testing/selftests/cgroup/cgroup_util.c~cgroup-kselftest-relax-fs_spec-checks
+++ a/tools/testing/selftests/cgroup/cgroup_util.c
@@ -191,8 +191,7 @@ int cg_find_unified_root(char *root, siz
strtok(NULL, delim);
strtok(NULL, delim);
- if (strcmp(fs, "cgroup") == 0 &&
- strcmp(type, "cgroup2") == 0) {
+ if (strcmp(type, "cgroup2") == 0) {
strncpy(root, mount, len);
return 0;
}
_
Patches currently in -mm which might be from chris(a)chrisdown.name are
cgroup-kselftest-relax-fs_spec-checks.patch
mm-throttle-allocators-when-failing-reclaim-over-memoryhigh.patch
mm-proportional-memorylowmin-reclaim.patch
mm-make-memoryemin-the-baseline-for-utilisation-determination.patch
mm-make-memoryemin-the-baseline-for-utilisation-determination-fix.patch
From: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
commit ed527b13d800dd515a9e6c582f0a73eca65b2e1b upstream.
The CAAM driver currently violates an undocumented and slightly
controversial requirement imposed by the crypto stack that a buffer
referred to by the request structure via its virtual address may not
be modified while any scatterlists passed via the same request
structure are mapped for inbound DMA.
This may result in errors like
alg: aead: decryption failed on test 1 for gcm_base(ctr-aes-caam,ghash-generic): ret=74
alg: aead: Failed to load transform for gcm(aes): -2
on non-cache coherent systems, due to the fact that the GCM driver
passes an IV buffer by virtual address which shares a cacheline with
the auth_tag buffer passed via a scatterlist, resulting in corruption
of the auth_tag when the IV is updated while the DMA mapping is live.
Since the IV that is returned to the caller is only valid for CBC mode,
and given that the in-kernel users of CBC (such as CTS) don't trigger the
same issue as the GCM driver, let's just disable the output IV generation
for all modes except CBC for the time being.
Fixes: 854b06f76879 ("crypto: caam - properly set IV after {en,de}crypt")
Cc: Horia Geanta <horia.geanta(a)nxp.com>
Cc: Iuliana Prodan <iuliana.prodan(a)nxp.com>
Reported-by: Sascha Hauer <s.hauer(a)pengutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Reviewed-by: Horia Geanta <horia.geanta(a)nxp.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
[ Horia: backported to 4.14, 4.19 ]
Signed-off-by: Horia Geantă <horia.geanta(a)nxp.com>
---
drivers/crypto/caam/caamalg.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 9bc54c3c2cb9..1907945f82b7 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -887,6 +887,7 @@ static void ablkcipher_encrypt_done(struct device *jrdev, u32 *desc, u32 err,
struct ablkcipher_request *req = context;
struct ablkcipher_edesc *edesc;
struct crypto_ablkcipher *ablkcipher = crypto_ablkcipher_reqtfm(req);
+ struct caam_ctx *ctx = crypto_ablkcipher_ctx(ablkcipher);
int ivsize = crypto_ablkcipher_ivsize(ablkcipher);
#ifdef DEBUG
@@ -911,10 +912,11 @@ static void ablkcipher_encrypt_done(struct device *jrdev, u32 *desc, u32 err,
/*
* The crypto API expects us to set the IV (req->info) to the last
- * ciphertext block. This is used e.g. by the CTS mode.
+ * ciphertext block when running in CBC mode.
*/
- scatterwalk_map_and_copy(req->info, req->dst, req->nbytes - ivsize,
- ivsize, 0);
+ if ((ctx->cdata.algtype & OP_ALG_AAI_MASK) == OP_ALG_AAI_CBC)
+ scatterwalk_map_and_copy(req->info, req->dst, req->nbytes -
+ ivsize, ivsize, 0);
/* In case initial IV was generated, copy it in GIVCIPHER request */
if (edesc->iv_dir == DMA_FROM_DEVICE) {
@@ -1651,10 +1653,11 @@ static int ablkcipher_decrypt(struct ablkcipher_request *req)
/*
* The crypto API expects us to set the IV (req->info) to the last
- * ciphertext block.
+ * ciphertext block when running in CBC mode.
*/
- scatterwalk_map_and_copy(req->info, req->src, req->nbytes - ivsize,
- ivsize, 0);
+ if ((ctx->cdata.algtype & OP_ALG_AAI_MASK) == OP_ALG_AAI_CBC)
+ scatterwalk_map_and_copy(req->info, req->src, req->nbytes -
+ ivsize, ivsize, 0);
/* Create and submit job descriptor*/
init_ablkcipher_job(ctx->sh_desc_dec, ctx->sh_desc_dec_dma, edesc, req);
--
2.17.1
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 179006688a7e888cbff39577189f2e034786d06a Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Wed, 19 Jun 2019 13:05:50 +0100
Subject: [PATCH] Btrfs: add missing inode version, ctime and mtime updates
when punching hole
If the range for which we are punching a hole covers only part of a page,
we end up updating the inode item but we skip the update of the inode's
iversion, mtime and ctime. Fix that by ensuring we update those properties
of the inode.
A patch for fstests test case generic/059 that tests this has been sent
along with this fix.
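A quick way to observe the effect (a hedged sketch, not the generic/059
test itself; the file name and offsets are arbitrary):
$ xfs_io -f -c "pwrite 0 8K" /mnt/foo
$ stat -c 'mtime: %y ctime: %z' /mnt/foo
# Punch a hole covering only part of a page; before this fix the inode's
# iversion, mtime and ctime were left unchanged by the operation.
$ xfs_io -c "fpunch 1K 2K" /mnt/foo
$ stat -c 'mtime: %y ctime: %z' /mnt/foo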
Fixes: 2aaa66558172b0 ("Btrfs: add hole punching")
Fixes: e8c1c76e804b18 ("Btrfs: add missing inode update when punching hole")
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 5370152ea7e3..b455bdf46faa 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2711,6 +2711,11 @@ out_only_mutex:
* for detecting, at fsync time, if the inode isn't yet in the
* log tree or it's there but not up to date.
*/
+ struct timespec64 now = current_time(inode);
+
+ inode_inc_iversion(inode);
+ inode->i_mtime = now;
+ inode->i_ctime = now;
trans = btrfs_start_transaction(root, 1);
if (IS_ERR(trans)) {
err = PTR_ERR(trans);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 49f17c26c123b60fd1c74629eef077740d16ffc2 Mon Sep 17 00:00:00 2001
From: Nadav Amit <namit(a)vmware.com>
Date: Thu, 18 Jul 2019 15:57:31 -0700
Subject: [PATCH] resource: fix locking in find_next_iomem_res()
Since resources can be removed, locking should ensure that the resource
is not removed while accessing it. However, find_next_iomem_res() does
not hold the lock while copying the data of the resource.
Keep holding the lock while the data is copied. While at it, change the
return value to a more informative value. It is disregarded by the
callers.
[akpm(a)linux-foundation.org: fix find_next_iomem_res() documentation]
Link: http://lkml.kernel.org/r/20190613045903.4922-2-namit@vmware.com
Fixes: ff3cc952d3f00 ("resource: Add remove_resource interface")
Signed-off-by: Nadav Amit <namit(a)vmware.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Reviewed-by: Dan Williams <dan.j.williams(a)intel.com>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: Toshi Kani <toshi.kani(a)hpe.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/kernel/resource.c b/kernel/resource.c
index d22423e85cf8..3ced0cd45bdd 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -326,7 +326,7 @@ EXPORT_SYMBOL(release_resource);
*
* If a resource is found, returns 0 and @*res is overwritten with the part
* of the resource that's within [@start..@end]; if none is found, returns
- * -1 or -EINVAL for other invalid parameters.
+ * -ENODEV. Returns -EINVAL for invalid parameters.
*
* This function walks the whole tree and not just first level children
* unless @first_lvl is true.
@@ -365,16 +365,16 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
break;
}
+ if (p) {
+ /* copy data */
+ res->start = max(start, p->start);
+ res->end = min(end, p->end);
+ res->flags = p->flags;
+ res->desc = p->desc;
+ }
+
read_unlock(&resource_lock);
- if (!p)
- return -1;
-
- /* copy data */
- res->start = max(start, p->start);
- res->end = min(end, p->end);
- res->flags = p->flags;
- res->desc = p->desc;
- return 0;
+ return p ? 0 : -ENODEV;
}
static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 49f17c26c123b60fd1c74629eef077740d16ffc2 Mon Sep 17 00:00:00 2001
From: Nadav Amit <namit(a)vmware.com>
Date: Thu, 18 Jul 2019 15:57:31 -0700
Subject: [PATCH] resource: fix locking in find_next_iomem_res()
Since resources can be removed, locking should ensure that the resource
is not removed while accessing it. However, find_next_iomem_res() does
not hold the lock while copying the data of the resource.
Keep holding the lock while the data is copied. While at it, change the
return value to a more informative value. It is disregarded by the
callers.
[akpm(a)linux-foundation.org: fix find_next_iomem_res() documentation]
Link: http://lkml.kernel.org/r/20190613045903.4922-2-namit@vmware.com
Fixes: ff3cc952d3f00 ("resource: Add remove_resource interface")
Signed-off-by: Nadav Amit <namit(a)vmware.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Reviewed-by: Dan Williams <dan.j.williams(a)intel.com>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: Toshi Kani <toshi.kani(a)hpe.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/kernel/resource.c b/kernel/resource.c
index d22423e85cf8..3ced0cd45bdd 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -326,7 +326,7 @@ EXPORT_SYMBOL(release_resource);
*
* If a resource is found, returns 0 and @*res is overwritten with the part
* of the resource that's within [@start..@end]; if none is found, returns
- * -1 or -EINVAL for other invalid parameters.
+ * -ENODEV. Returns -EINVAL for invalid parameters.
*
* This function walks the whole tree and not just first level children
* unless @first_lvl is true.
@@ -365,16 +365,16 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
break;
}
+ if (p) {
+ /* copy data */
+ res->start = max(start, p->start);
+ res->end = min(end, p->end);
+ res->flags = p->flags;
+ res->desc = p->desc;
+ }
+
read_unlock(&resource_lock);
- if (!p)
- return -1;
-
- /* copy data */
- res->start = max(start, p->start);
- res->end = min(end, p->end);
- res->flags = p->flags;
- res->desc = p->desc;
- return 0;
+ return p ? 0 : -ENODEV;
}
static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 49f17c26c123b60fd1c74629eef077740d16ffc2 Mon Sep 17 00:00:00 2001
From: Nadav Amit <namit(a)vmware.com>
Date: Thu, 18 Jul 2019 15:57:31 -0700
Subject: [PATCH] resource: fix locking in find_next_iomem_res()
Since resources can be removed, locking should ensure that the resource
is not removed while accessing it. However, find_next_iomem_res() does
not hold the lock while copying the data of the resource.
Keep holding the lock while the data is copied. While at it, change the
return value to a more informative value. It is disregarded by the
callers.
[akpm(a)linux-foundation.org: fix find_next_iomem_res() documentation]
Link: http://lkml.kernel.org/r/20190613045903.4922-2-namit@vmware.com
Fixes: ff3cc952d3f00 ("resource: Add remove_resource interface")
Signed-off-by: Nadav Amit <namit(a)vmware.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Reviewed-by: Dan Williams <dan.j.williams(a)intel.com>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: Toshi Kani <toshi.kani(a)hpe.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/kernel/resource.c b/kernel/resource.c
index d22423e85cf8..3ced0cd45bdd 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -326,7 +326,7 @@ EXPORT_SYMBOL(release_resource);
*
* If a resource is found, returns 0 and @*res is overwritten with the part
* of the resource that's within [@start..@end]; if none is found, returns
- * -1 or -EINVAL for other invalid parameters.
+ * -ENODEV. Returns -EINVAL for invalid parameters.
*
* This function walks the whole tree and not just first level children
* unless @first_lvl is true.
@@ -365,16 +365,16 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
break;
}
+ if (p) {
+ /* copy data */
+ res->start = max(start, p->start);
+ res->end = min(end, p->end);
+ res->flags = p->flags;
+ res->desc = p->desc;
+ }
+
read_unlock(&resource_lock);
- if (!p)
- return -1;
-
- /* copy data */
- res->start = max(start, p->start);
- res->end = min(end, p->end);
- res->flags = p->flags;
- res->desc = p->desc;
- return 0;
+ return p ? 0 : -ENODEV;
}
static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 89705e92700170888236555fe91b45e4c1bb0985 Mon Sep 17 00:00:00 2001
From: Danit Goldberg <danitg(a)mellanox.com>
Date: Fri, 5 Jul 2019 19:21:57 +0300
Subject: [PATCH] IB/mlx5: Report correctly tag matching rendezvous capability
Userspace expects the IB_TM_CAP_RC bit to indicate that the device
supports RC transport tag matching with rendezvous offload. However the
firmware splits this into two capabilities for eager and rendezvous tag
matching.
Only if the FW supports both modes should userspace be told the tag
matching capability is available.
Cc: <stable(a)vger.kernel.org> # 4.13
Fixes: eb761894351d ("IB/mlx5: Fill XRQ capabilities")
Signed-off-by: Danit Goldberg <danitg(a)mellanox.com>
Reviewed-by: Yishai Hadas <yishaih(a)mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leonro(a)mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 7581571bd9cd..56d4b1e9dd23 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1046,15 +1046,19 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
}
if (MLX5_CAP_GEN(mdev, tag_matching)) {
- props->tm_caps.max_rndv_hdr_size = MLX5_TM_MAX_RNDV_MSG_SIZE;
props->tm_caps.max_num_tags =
(1 << MLX5_CAP_GEN(mdev, log_tag_matching_list_sz)) - 1;
- props->tm_caps.flags = IB_TM_CAP_RC;
props->tm_caps.max_ops =
1 << MLX5_CAP_GEN(mdev, log_max_qp_sz);
props->tm_caps.max_sge = MLX5_TM_MAX_SGE;
}
+ if (MLX5_CAP_GEN(mdev, tag_matching) &&
+ MLX5_CAP_GEN(mdev, rndv_offload_rc)) {
+ props->tm_caps.flags = IB_TM_CAP_RNDV_RC;
+ props->tm_caps.max_rndv_hdr_size = MLX5_TM_MAX_RNDV_MSG_SIZE;
+ }
+
if (MLX5_CAP_GEN(dev->mdev, cq_moderation)) {
props->cq_caps.max_cq_moderation_count =
MLX5_MAX_CQ_COUNT;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 50806bef9f20..4053be51b7fa 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -307,8 +307,8 @@ struct ib_rss_caps {
};
enum ib_tm_cap_flags {
- /* Support tag matching on RC transport */
- IB_TM_CAP_RC = 1 << 0,
+ /* Support tag matching with rendezvous offload for RC transport */
+ IB_TM_CAP_RNDV_RC = 1 << 0,
};
struct ib_tm_caps {
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 803f0f64d17769071d7287d9e3e3b79a3e1ae937 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Wed, 19 Jun 2019 13:05:39 +0100
Subject: [PATCH] Btrfs: fix fsync not persisting dentry deletions due to inode
evictions
In order to avoid searches on a log tree when unlinking an inode, we check
if the inode being unlinked was logged in the current transaction, as well
as the inode of its parent directory. When any of the inodes are logged,
we proceed to delete directory items and inode reference items from the
log, to ensure that if a subsequent fsync of only the inode being unlinked
or only of the parent directory when the other is not fsync'ed as well,
does not result in the entry still existing after a power failure.
That check however is not reliable when one of the inodes involved (the
one being unlinked or its parent directory's inode) is evicted, since the
logged_trans field is transient, that is, it is not stored on disk, so it
is lost when the inode is evicted and loaded into memory again (which is
set to zero on load). As a consequence the checks currently being done by
btrfs_del_dir_entries_in_log() and btrfs_del_inode_ref_in_log() always
return true if the inode was evicted before, regardless of the inode
having been logged or not before (and in the current transaction); this
results in the unlinked dentry still existing after a log replay
if, after the unlink operation, only one of the inodes involved is fsync'ed.
Example:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ mkdir /mnt/dir
$ touch /mnt/dir/foo
$ xfs_io -c fsync /mnt/dir/foo
# Keep an open file descriptor on our directory while we evict inodes.
# We just want to evict the file's inode, the directory's inode must not
# be evicted.
$ ( cd /mnt/dir; while true; do :; done ) &
$ pid=$!
# Wait a bit to give time to background process to chdir to our test
# directory.
$ sleep 0.5
# Trigger eviction of the file's inode.
$ echo 2 > /proc/sys/vm/drop_caches
# Unlink our file and fsync the parent directory. After a power failure
# we don't expect to see the file anymore, since we fsync'ed the parent
# directory.
$ rm -f $SCRATCH_MNT/dir/foo
$ xfs_io -c fsync /mnt/dir
<power failure>
$ mount /dev/sdb /mnt
$ ls /mnt/dir
foo
$
--> file still there, unlink not persisted despite explicit fsync on dir
Fix this by checking if the inode has the full_sync bit set in its runtime
flags as well, since that bit is set every time an inode is loaded from
disk, or for other less common cases such as after a shrinking truncate
or a failure to allocate extent maps for holes, and gets cleared after the
first fsync. Also consider the inode as possibly logged only if it was
last modified in the current transaction (besides having the full_sync
flag set).
Fixes: 3a5f1d458ad161 ("Btrfs: Optimize btree walking while logging inodes")
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 4a04659fded7..6c8297bcfeb7 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3322,6 +3322,30 @@ int btrfs_free_log_root_tree(struct btrfs_trans_handle *trans,
return 0;
}
+/*
+ * Check if an inode was logged in the current transaction. We can't always rely
+ * on an inode's logged_trans value, because it's an in-memory only field and
+ * therefore not persisted. This means that its value is lost if the inode gets
+ * evicted and loaded again from disk (in which case it has a value of 0, and
+ * certainly it is smaller then any possible transaction ID), when that happens
+ * the full_sync flag is set in the inode's runtime flags, so on that case we
+ * assume eviction happened and ignore the logged_trans value, assuming the
+ * worst case, that the inode was logged before in the current transaction.
+ */
+static bool inode_logged(struct btrfs_trans_handle *trans,
+ struct btrfs_inode *inode)
+{
+ if (inode->logged_trans == trans->transid)
+ return true;
+
+ if (inode->last_trans == trans->transid &&
+ test_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags) &&
+ !test_bit(BTRFS_FS_LOG_RECOVERING, &trans->fs_info->flags))
+ return true;
+
+ return false;
+}
+
/*
* If both a file and directory are logged, and unlinks or renames are
* mixed in, we have a few interesting corners:
@@ -3356,7 +3380,7 @@ int btrfs_del_dir_entries_in_log(struct btrfs_trans_handle *trans,
int bytes_del = 0;
u64 dir_ino = btrfs_ino(dir);
- if (dir->logged_trans < trans->transid)
+ if (!inode_logged(trans, dir))
return 0;
ret = join_running_log_trans(root);
@@ -3460,7 +3484,7 @@ int btrfs_del_inode_ref_in_log(struct btrfs_trans_handle *trans,
u64 index;
int ret;
- if (inode->logged_trans < trans->transid)
+ if (!inode_logged(trans, inode))
return 0;
ret = join_running_log_trans(root);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 803f0f64d17769071d7287d9e3e3b79a3e1ae937 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Wed, 19 Jun 2019 13:05:39 +0100
Subject: [PATCH] Btrfs: fix fsync not persisting dentry deletions due to inode
evictions
In order to avoid searches on a log tree when unlinking an inode, we check
if the inode being unlinked was logged in the current transaction, as well
as the inode of its parent directory. When any of the inodes are logged,
we proceed to delete directory items and inode reference items from the
log, to ensure that if a subsequent fsync of only the inode being unlinked
or only of the parent directory when the other is not fsync'ed as well,
does not result in the entry still existing after a power failure.
That check however is not reliable when one of the inodes involved (the
one being unlinked or its parent directory's inode) is evicted, since the
logged_trans field is transient, that is, it is not stored on disk, so it
is lost when the inode is evicted and loaded into memory again (which is
set to zero on load). As a consequence the checks currently being done by
btrfs_del_dir_entries_in_log() and btrfs_del_inode_ref_in_log() always
return true if the inode was evicted before, regardless of the inode
having been logged or not before (and in the current transaction); this
results in the unlinked dentry still existing after a log replay
if, after the unlink operation, only one of the inodes involved is fsync'ed.
Example:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ mkdir /mnt/dir
$ touch /mnt/dir/foo
$ xfs_io -c fsync /mnt/dir/foo
# Keep an open file descriptor on our directory while we evict inodes.
# We just want to evict the file's inode, the directory's inode must not
# be evicted.
$ ( cd /mnt/dir; while true; do :; done ) &
$ pid=$!
# Wait a bit to give time to background process to chdir to our test
# directory.
$ sleep 0.5
# Trigger eviction of the file's inode.
$ echo 2 > /proc/sys/vm/drop_caches
# Unlink our file and fsync the parent directory. After a power failure
# we don't expect to see the file anymore, since we fsync'ed the parent
# directory.
$ rm -f $SCRATCH_MNT/dir/foo
$ xfs_io -c fsync /mnt/dir
<power failure>
$ mount /dev/sdb /mnt
$ ls /mnt/dir
foo
$
--> file still there, unlink not persisted despite explicit fsync on dir
Fix this by checking if the inode has the full_sync bit set in its runtime
flags as well, since that bit is set every time an inode is loaded from
disk, or for other less common cases such as after a shrinking truncate
or a failure to allocate extent maps for holes, and gets cleared after the
first fsync. Also consider the inode as possibly logged only if it was
last modified in the current transaction (besides having the full_sync
flag set).
Fixes: 3a5f1d458ad161 ("Btrfs: Optimize btree walking while logging inodes")
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 4a04659fded7..6c8297bcfeb7 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3322,6 +3322,30 @@ int btrfs_free_log_root_tree(struct btrfs_trans_handle *trans,
return 0;
}
+/*
+ * Check if an inode was logged in the current transaction. We can't always rely
+ * on an inode's logged_trans value, because it's an in-memory only field and
+ * therefore not persisted. This means that its value is lost if the inode gets
+ * evicted and loaded again from disk (in which case it has a value of 0, and
+ * certainly it is smaller then any possible transaction ID), when that happens
+ * the full_sync flag is set in the inode's runtime flags, so on that case we
+ * assume eviction happened and ignore the logged_trans value, assuming the
+ * worst case, that the inode was logged before in the current transaction.
+ */
+static bool inode_logged(struct btrfs_trans_handle *trans,
+ struct btrfs_inode *inode)
+{
+ if (inode->logged_trans == trans->transid)
+ return true;
+
+ if (inode->last_trans == trans->transid &&
+ test_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags) &&
+ !test_bit(BTRFS_FS_LOG_RECOVERING, &trans->fs_info->flags))
+ return true;
+
+ return false;
+}
+
/*
* If both a file and directory are logged, and unlinks or renames are
* mixed in, we have a few interesting corners:
@@ -3356,7 +3380,7 @@ int btrfs_del_dir_entries_in_log(struct btrfs_trans_handle *trans,
int bytes_del = 0;
u64 dir_ino = btrfs_ino(dir);
- if (dir->logged_trans < trans->transid)
+ if (!inode_logged(trans, dir))
return 0;
ret = join_running_log_trans(root);
@@ -3460,7 +3484,7 @@ int btrfs_del_inode_ref_in_log(struct btrfs_trans_handle *trans,
u64 index;
int ret;
- if (inode->logged_trans < trans->transid)
+ if (!inode_logged(trans, inode))
return 0;
ret = join_running_log_trans(root);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d1d832a0b51dd9570429bb4b81b2a6c1759e681a Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Fri, 7 Jun 2019 11:25:24 +0100
Subject: [PATCH] Btrfs: fix data loss after inode eviction, renaming it, and
fsync it
When we log an inode, regardless of logging it completely or only that it
exists, we always update it as logged (logged_trans and last_log_commit
fields of the inode are updated). This is generally fine and avoids future
attempts to log it from having to do repeated work that brings no value.
However, if we write data to a file, then evict its inode after all the
dealloc was flushed (and ordered extents completed), rename the file and
fsync it, we end up not logging the new extents, since the rename may
result in logging that the inode exists in case the parent directory was
logged before. The following reproducer shows and explains how this can
happen:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ mkdir /mnt/dir
$ touch /mnt/dir/foo
$ touch /mnt/dir/bar
# Do a direct IO write instead of a buffered write because with a
# buffered write we would need to make sure dealloc gets flushed and
# complete before we do the inode eviction later, and we can not do that
# from user space with call to things such as sync(2) since that results
# in a transaction commit as well.
$ xfs_io -d -c "pwrite -S 0xd3 0 4K" /mnt/dir/bar
# Keep the directory dir in use while we evict inodes. We want our file
# bar's inode to be evicted but we don't want our directory's inode to
# be evicted (if it were evicted too, we would not be able to reproduce
# the issue since the first fsync below, of file foo, would result in a
# transaction commit.
$ ( cd /mnt/dir; while true; do :; done ) &
$ pid=$!
# Wait a bit to give time for the background process to chdir.
$ sleep 0.1
# Evict all inodes, except the inode for the directory dir because it is
# currently in use by our background process.
$ echo 2 > /proc/sys/vm/drop_caches
# fsync file foo, which ends up persisting information about the parent
# directory because it is a new inode.
$ xfs_io -c fsync /mnt/dir/foo
# Rename bar, this results in logging that this inode exists (inode item,
# names, xattrs) because the parent directory is in the log.
$ mv /mnt/dir/bar /mnt/dir/baz
# Now fsync baz, which ends up doing absolutely nothing because of the
# rename operation which logged that the inode exists only.
$ xfs_io -c fsync /mnt/dir/baz
<power failure>
$ mount /dev/sdb /mnt
$ od -t x1 -A d /mnt/dir/baz
0000000
--> Empty file, data we wrote is missing.
Fix this by not updating last_log_commit of an inode when we are logging
only that it exists and the inode was not yet logged since it was loaded
from disk (full_sync bit set); this is enough to make btrfs_inode_in_log()
return false for this scenario and make us log the inode. The logged_trans
of the inode is still always set, since that alone is used to track whether
names need to be deleted as part of unlink operations.
Fixes: 257c62e1bce03e ("Btrfs: avoid tree log commit when there are no changes")
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 3fc8d854d7fb..4a04659fded7 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -5420,9 +5420,19 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
}
}
+ /*
+ * Don't update last_log_commit if we logged that an inode exists after
+ * it was loaded to memory (full_sync bit set).
+ * This is to prevent data loss when we do a write to the inode, then
+ * the inode gets evicted after all delalloc was flushed, then we log
+ * it exists (due to a rename for example) and then fsync it. This last
+ * fsync would do nothing (not logging the extents previously written).
+ */
spin_lock(&inode->lock);
inode->logged_trans = trans->transid;
- inode->last_log_commit = inode->last_sub_trans;
+ if (inode_only != LOG_INODE_EXISTS ||
+ !test_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags))
+ inode->last_log_commit = inode->last_sub_trans;
spin_unlock(&inode->lock);
out_unlock:
mutex_unlock(&inode->log_mutex);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d1d832a0b51dd9570429bb4b81b2a6c1759e681a Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Fri, 7 Jun 2019 11:25:24 +0100
Subject: [PATCH] Btrfs: fix data loss after inode eviction, renaming it, and
fsync it
When we log an inode, regardless of logging it completely or only that it
exists, we always update it as logged (logged_trans and last_log_commit
fields of the inode are updated). This is generally fine and avoids future
attempts to log it from having to do repeated work that brings no value.
However, if we write data to a file, then evict its inode after all the
dealloc was flushed (and ordered extents completed), rename the file and
fsync it, we end up not logging the new extents, since the rename may
result in logging that the inode exists in case the parent directory was
logged before. The following reproducer shows and explains how this can
happen:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ mkdir /mnt/dir
$ touch /mnt/dir/foo
$ touch /mnt/dir/bar
# Do a direct IO write instead of a buffered write because with a
# buffered write we would need to make sure dealloc gets flushed and
# complete before we do the inode eviction later, and we can not do that
# from user space with call to things such as sync(2) since that results
# in a transaction commit as well.
$ xfs_io -d -c "pwrite -S 0xd3 0 4K" /mnt/dir/bar
# Keep the directory dir in use while we evict inodes. We want our file
# bar's inode to be evicted but we don't want our directory's inode to
# be evicted (if it were evicted too, we would not be able to reproduce
# the issue since the first fsync below, of file foo, would result in a
# transaction commit.
$ ( cd /mnt/dir; while true; do :; done ) &
$ pid=$!
# Wait a bit to give time for the background process to chdir.
$ sleep 0.1
# Evict all inodes, except the inode for the directory dir because it is
# currently in use by our background process.
$ echo 2 > /proc/sys/vm/drop_caches
# fsync file foo, which ends up persisting information about the parent
# directory because it is a new inode.
$ xfs_io -c fsync /mnt/dir/foo
# Rename bar, this results in logging that this inode exists (inode item,
# names, xattrs) because the parent directory is in the log.
$ mv /mnt/dir/bar /mnt/dir/baz
# Now fsync baz, which ends up doing absolutely nothing because of the
# rename operation which logged that the inode exists only.
$ xfs_io -c fsync /mnt/dir/baz
<power failure>
$ mount /dev/sdb /mnt
$ od -t x1 -A d /mnt/dir/baz
0000000
--> Empty file, data we wrote is missing.
Fix this by not updating last_log_commit of an inode when we are logging
only that it exists and the inode was not yet logged since it was loaded
from disk (full_sync bit set); this is enough to make btrfs_inode_in_log()
return false for this scenario and make us log the inode. The logged_trans
of the inode is still always set, since that alone is used to track whether
names need to be deleted as part of unlink operations.
Fixes: 257c62e1bce03e ("Btrfs: avoid tree log commit when there are no changes")
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 3fc8d854d7fb..4a04659fded7 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -5420,9 +5420,19 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
}
}
+ /*
+ * Don't update last_log_commit if we logged that an inode exists after
+ * it was loaded to memory (full_sync bit set).
+ * This is to prevent data loss when we do a write to the inode, then
+ * the inode gets evicted after all delalloc was flushed, then we log
+ * it exists (due to a rename for example) and then fsync it. This last
+ * fsync would do nothing (not logging the extents previously written).
+ */
spin_lock(&inode->lock);
inode->logged_trans = trans->transid;
- inode->last_log_commit = inode->last_sub_trans;
+ if (inode_only != LOG_INODE_EXISTS ||
+ !test_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags))
+ inode->last_log_commit = inode->last_sub_trans;
spin_unlock(&inode->lock);
out_unlock:
mutex_unlock(&inode->log_mutex);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 64adde31c8e996a6db6f7a1a4131180e363aa9f2 Mon Sep 17 00:00:00 2001
From: Niklas Cassel <niklas.cassel(a)linaro.org>
Date: Wed, 29 May 2019 11:43:52 +0200
Subject: [PATCH] PCI: qcom: Ensure that PERST is asserted for at least 100 ms
Currently, there is only a 1 ms sleep after asserting PERST.
Reading the datasheets for different endpoints, some require PERST to be
asserted for 10 ms in order for the endpoint to perform a reset, while
others require it to be asserted for 50 ms.
Several SoCs using this driver use PCIe Mini Card slots, where we don't know
what endpoint will be plugged in.
The PCI Express Card Electromechanical Specification r2.0, section
2.2, "PERST# Signal" specifies:
"On power up, the deassertion of PERST# is delayed 100 ms (TPVPERL) from
the power rails achieving specified operating limits."
Add a sleep of 100 ms before deasserting PERST, in order to ensure that
we are compliant with the spec.
Fixes: 82a823833f4e ("PCI: qcom: Add Qualcomm PCIe controller driver")
Signed-off-by: Niklas Cassel <niklas.cassel(a)linaro.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Acked-by: Stanimir Varbanov <svarbanov(a)mm-sol.com>
Cc: stable(a)vger.kernel.org # 4.5+
diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index da5dd3639a49..7e581748ee9f 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -178,6 +178,8 @@ static void qcom_ep_reset_assert(struct qcom_pcie *pcie)
static void qcom_ep_reset_deassert(struct qcom_pcie *pcie)
{
+ /* Ensure that PERST has been asserted for at least 100 ms */
+ msleep(100);
gpiod_set_value_cansleep(pcie->reset, 0);
usleep_range(PERST_DELAY_US, PERST_DELAY_US + 500);
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 64adde31c8e996a6db6f7a1a4131180e363aa9f2 Mon Sep 17 00:00:00 2001
From: Niklas Cassel <niklas.cassel(a)linaro.org>
Date: Wed, 29 May 2019 11:43:52 +0200
Subject: [PATCH] PCI: qcom: Ensure that PERST is asserted for at least 100 ms
Currently, there is only a 1 ms sleep after asserting PERST.
Reading the datasheets for different endpoints, some require PERST to be
asserted for 10 ms in order for the endpoint to perform a reset, while
others require it to be asserted for 50 ms.
Several SoCs using this driver use PCIe Mini Card slots, where we don't know
what endpoint will be plugged in.
The PCI Express Card Electromechanical Specification r2.0, section
2.2, "PERST# Signal" specifies:
"On power up, the deassertion of PERST# is delayed 100 ms (TPVPERL) from
the power rails achieving specified operating limits."
Add a sleep of 100 ms before deasserting PERST, in order to ensure that
we are compliant with the spec.
Fixes: 82a823833f4e ("PCI: qcom: Add Qualcomm PCIe controller driver")
Signed-off-by: Niklas Cassel <niklas.cassel(a)linaro.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Acked-by: Stanimir Varbanov <svarbanov(a)mm-sol.com>
Cc: stable(a)vger.kernel.org # 4.5+
diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index da5dd3639a49..7e581748ee9f 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -178,6 +178,8 @@ static void qcom_ep_reset_assert(struct qcom_pcie *pcie)
static void qcom_ep_reset_deassert(struct qcom_pcie *pcie)
{
+ /* Ensure that PERST has been asserted for at least 100 ms */
+ msleep(100);
gpiod_set_value_cansleep(pcie->reset, 0);
usleep_range(PERST_DELAY_US, PERST_DELAY_US + 500);
}
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7608bf40cf2480057ec0da31456cc428791c32ef Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgg(a)mellanox.com>
Date: Tue, 11 Jun 2019 13:09:51 -0300
Subject: [PATCH] RDMA/odp: Fix missed unlock in non-blocking invalidate_start
If invalidate_start returns with EAGAIN then the umem_rwsem needs to be
unlocked as no invalidate_end will be called.
Cc: <stable(a)vger.kernel.org>
Fixes: ca748c39ea3f ("RDMA/umem: Get rid of per_mm->notifier_count")
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
Reviewed-by: Leon Romanovsky <leonro(a)mellanox.com>
Signed-off-by: Doug Ledford <dledford(a)redhat.com>
diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index 9001cc10770a..eb9939d52818 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -149,6 +149,7 @@ static int ib_umem_notifier_invalidate_range_start(struct mmu_notifier *mn,
{
struct ib_ucontext_per_mm *per_mm =
container_of(mn, struct ib_ucontext_per_mm, mn);
+ int rc;
if (mmu_notifier_range_blockable(range))
down_read(&per_mm->umem_rwsem);
@@ -165,11 +166,14 @@ static int ib_umem_notifier_invalidate_range_start(struct mmu_notifier *mn,
return 0;
}
- return rbt_ib_umem_for_each_in_range(&per_mm->umem_tree, range->start,
- range->end,
- invalidate_range_start_trampoline,
- mmu_notifier_range_blockable(range),
- NULL);
+ rc = rbt_ib_umem_for_each_in_range(&per_mm->umem_tree, range->start,
+ range->end,
+ invalidate_range_start_trampoline,
+ mmu_notifier_range_blockable(range),
+ NULL);
+ if (rc)
+ up_read(&per_mm->umem_rwsem);
+ return rc;
}
static int invalidate_range_end_trampoline(struct ib_umem_odp *item, u64 start,
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7608bf40cf2480057ec0da31456cc428791c32ef Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgg(a)mellanox.com>
Date: Tue, 11 Jun 2019 13:09:51 -0300
Subject: [PATCH] RDMA/odp: Fix missed unlock in non-blocking invalidate_start
If invalidate_start returns with EAGAIN then the umem_rwsem needs to be
unlocked as no invalidate_end will be called.
Cc: <stable(a)vger.kernel.org>
Fixes: ca748c39ea3f ("RDMA/umem: Get rid of per_mm->notifier_count")
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
Reviewed-by: Leon Romanovsky <leonro(a)mellanox.com>
Signed-off-by: Doug Ledford <dledford(a)redhat.com>
diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index 9001cc10770a..eb9939d52818 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -149,6 +149,7 @@ static int ib_umem_notifier_invalidate_range_start(struct mmu_notifier *mn,
{
struct ib_ucontext_per_mm *per_mm =
container_of(mn, struct ib_ucontext_per_mm, mn);
+ int rc;
if (mmu_notifier_range_blockable(range))
down_read(&per_mm->umem_rwsem);
@@ -165,11 +166,14 @@ static int ib_umem_notifier_invalidate_range_start(struct mmu_notifier *mn,
return 0;
}
- return rbt_ib_umem_for_each_in_range(&per_mm->umem_tree, range->start,
- range->end,
- invalidate_range_start_trampoline,
- mmu_notifier_range_blockable(range),
- NULL);
+ rc = rbt_ib_umem_for_each_in_range(&per_mm->umem_tree, range->start,
+ range->end,
+ invalidate_range_start_trampoline,
+ mmu_notifier_range_blockable(range),
+ NULL);
+ if (rc)
+ up_read(&per_mm->umem_rwsem);
+ return rc;
}
static int invalidate_range_end_trampoline(struct ib_umem_odp *item, u64 start,
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From bcef5b7215681250c4bf8961dfe15e9e4fef97d0 Mon Sep 17 00:00:00 2001
From: Bart Van Assche <bvanassche(a)acm.org>
Date: Wed, 29 May 2019 09:38:31 -0700
Subject: [PATCH] RDMA/srp: Accept again source addresses that do not have a
port number
The function srp_parse_in() is used both for parsing source address
specifications and for target address specifications. Target addresses
must have a port number. Having to specify a port number for source
addresses is inconvenient. Make sure that srp_parse_in() again supports
parsing addresses with no port number.
Cc: <stable(a)vger.kernel.org>
Fixes: c62adb7def71 ("IB/srp: Fix IPv6 address parsing")
Signed-off-by: Bart Van Assche <bvanassche(a)acm.org>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index be9ddcad8f28..87848faa7502 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -3481,13 +3481,14 @@ static const match_table_t srp_opt_tokens = {
* @net: [in] Network namespace.
* @sa: [out] Address family, IP address and port number.
* @addr_port_str: [in] IP address and port number.
+ * @has_port: [out] Whether or not @addr_port_str includes a port number.
*
* Parse the following address formats:
* - IPv4: <ip_address>:<port>, e.g. 1.2.3.4:5.
* - IPv6: \[<ipv6_address>\]:<port>, e.g. [1::2:3%4]:5.
*/
static int srp_parse_in(struct net *net, struct sockaddr_storage *sa,
- const char *addr_port_str)
+ const char *addr_port_str, bool *has_port)
{
char *addr_end, *addr = kstrdup(addr_port_str, GFP_KERNEL);
char *port_str;
@@ -3496,9 +3497,12 @@ static int srp_parse_in(struct net *net, struct sockaddr_storage *sa,
if (!addr)
return -ENOMEM;
port_str = strrchr(addr, ':');
- if (!port_str)
- return -EINVAL;
- *port_str++ = '\0';
+ if (port_str && strchr(port_str, ']'))
+ port_str = NULL;
+ if (port_str)
+ *port_str++ = '\0';
+ if (has_port)
+ *has_port = port_str != NULL;
ret = inet_pton_with_scope(net, AF_INET, addr, port_str, sa);
if (ret && addr[0]) {
addr_end = addr + strlen(addr) - 1;
@@ -3520,6 +3524,7 @@ static int srp_parse_options(struct net *net, const char *buf,
char *p;
substring_t args[MAX_OPT_ARGS];
unsigned long long ull;
+ bool has_port;
int opt_mask = 0;
int token;
int ret = -EINVAL;
@@ -3618,7 +3623,8 @@ static int srp_parse_options(struct net *net, const char *buf,
ret = -ENOMEM;
goto out;
}
- ret = srp_parse_in(net, &target->rdma_cm.src.ss, p);
+ ret = srp_parse_in(net, &target->rdma_cm.src.ss, p,
+ NULL);
if (ret < 0) {
pr_warn("bad source parameter '%s'\n", p);
kfree(p);
@@ -3634,7 +3640,10 @@ static int srp_parse_options(struct net *net, const char *buf,
ret = -ENOMEM;
goto out;
}
- ret = srp_parse_in(net, &target->rdma_cm.dst.ss, p);
+ ret = srp_parse_in(net, &target->rdma_cm.dst.ss, p,
+ &has_port);
+ if (!has_port)
+ ret = -EINVAL;
if (ret < 0) {
pr_warn("bad dest parameter '%s'\n", p);
kfree(p);
The patch below does not apply to the 5.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 26202928fafad8bda8b478edb7e62c885be623d7 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <damien.lemoal(a)wdc.com>
Date: Mon, 1 Jul 2019 14:09:18 +0900
Subject: [PATCH] block: Limit zone array allocation size
Limit the size of the struct blk_zone array used in
blk_revalidate_disk_zones() to avoid memory allocation failures leading
to disk revalidation failure. Also further reduce the likelihood of
such failures by using kvcalloc() (that is, vmalloc()) instead of
allocating contiguous pages with alloc_pages().
Fixes: 515ce6061312 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation")
Fixes: e76239a3748c ("block: add a report_zones method")
Cc: stable(a)vger.kernel.org
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal(a)wdc.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 58ced170b424..6c503824ba3f 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -14,6 +14,8 @@
#include <linux/rbtree.h>
#include <linux/blkdev.h>
#include <linux/blk-mq.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
#include <linux/sched/mm.h>
#include "blk.h"
@@ -371,22 +373,25 @@ static inline unsigned long *blk_alloc_zone_bitmap(int node,
* Allocate an array of struct blk_zone to get nr_zones zone information.
* The allocated array may be smaller than nr_zones.
*/
-static struct blk_zone *blk_alloc_zones(int node, unsigned int *nr_zones)
+static struct blk_zone *blk_alloc_zones(unsigned int *nr_zones)
{
- size_t size = *nr_zones * sizeof(struct blk_zone);
- struct page *page;
- int order;
-
- for (order = get_order(size); order >= 0; order--) {
- page = alloc_pages_node(node, GFP_NOIO | __GFP_ZERO, order);
- if (page) {
- *nr_zones = min_t(unsigned int, *nr_zones,
- (PAGE_SIZE << order) / sizeof(struct blk_zone));
- return page_address(page);
- }
+ struct blk_zone *zones;
+ size_t nrz = min(*nr_zones, BLK_ZONED_REPORT_MAX_ZONES);
+
+ /*
+ * GFP_KERNEL here is meaningless as the caller task context has
+ * the PF_MEMALLOC_NOIO flag set in blk_revalidate_disk_zones()
+ * with memalloc_noio_save().
+ */
+ zones = kvcalloc(nrz, sizeof(struct blk_zone), GFP_KERNEL);
+ if (!zones) {
+ *nr_zones = 0;
+ return NULL;
}
- return NULL;
+ *nr_zones = nrz;
+
+ return zones;
}
void blk_queue_free_zone_bitmaps(struct request_queue *q)
@@ -448,7 +453,7 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
/* Get zone information and initialize seq_zones_bitmap */
rep_nr_zones = nr_zones;
- zones = blk_alloc_zones(q->node, &rep_nr_zones);
+ zones = blk_alloc_zones(&rep_nr_zones);
if (!zones)
goto out;
@@ -487,8 +492,7 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
out:
memalloc_noio_restore(noio_flag);
- free_pages((unsigned long)zones,
- get_order(rep_nr_zones * sizeof(struct blk_zone)));
+ kvfree(zones);
kfree(seq_zones_wlock);
kfree(seq_zones_bitmap);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 05036e3e3458..1ef375dafb1c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -344,6 +344,11 @@ struct queue_limits {
#ifdef CONFIG_BLK_DEV_ZONED
+/*
+ * Maximum number of zones to report with a single report zones command.
+ */
+#define BLK_ZONED_REPORT_MAX_ZONES 8192U
+
extern unsigned int blkdev_nr_zones(struct block_device *bdev);
extern int blkdev_report_zones(struct block_device *bdev,
sector_t sector, struct blk_zone *zones,
commit 8e2442a5f86e1f77b86401fce274a7f622740bc4 upstream.
Since commit 00c864f8903d ("kconfig: allow all config targets to write
auto.conf if missing"), Kconfig creates include/config/auto.conf in the
defconfig stage when it is missing.
Joonas Kylmälä reported incorrect auto.conf generation under some
circumstances.
To reproduce it, apply the following diff:
| --- a/arch/arm/configs/imx_v6_v7_defconfig
| +++ b/arch/arm/configs/imx_v6_v7_defconfig
| @@ -345,14 +345,7 @@ CONFIG_USB_CONFIGFS_F_MIDI=y
| CONFIG_USB_CONFIGFS_F_HID=y
| CONFIG_USB_CONFIGFS_F_UVC=y
| CONFIG_USB_CONFIGFS_F_PRINTER=y
| -CONFIG_USB_ZERO=m
| -CONFIG_USB_AUDIO=m
| -CONFIG_USB_ETH=m
| -CONFIG_USB_G_NCM=m
| -CONFIG_USB_GADGETFS=m
| -CONFIG_USB_FUNCTIONFS=m
| -CONFIG_USB_MASS_STORAGE=m
| -CONFIG_USB_G_SERIAL=m
| +CONFIG_USB_FUNCTIONFS=y
| CONFIG_MMC=y
| CONFIG_MMC_SDHCI=y
| CONFIG_MMC_SDHCI_PLTFM=y
And then, run:
$ make ARCH=arm mrproper imx_v6_v7_defconfig
You will see CONFIG_USB_FUNCTIONFS=y is correctly contained in the
.config, but not in the auto.conf.
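One quick way to observe it (illustrative output; only the .config hit shows
up because auto.conf is missing the symbol):

  $ grep 'CONFIG_USB_FUNCTIONFS=' .config include/config/auto.conf
  .config:CONFIG_USB_FUNCTIONFS=y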
Please note drivers/usb/gadget/legacy/Kconfig is included from a choice
block in drivers/usb/gadget/Kconfig. So USB_FUNCTIONFS is a choice value.
This is probably similar to the situation described in commit beaaddb62540
("kconfig: tests: test defconfig when two choices interact").
When sym_calc_choice() is called, the choice symbol forgets the
SYMBOL_DEF_USER unless all of its choice values are explicitly set by
the user.
The choice symbol is given just one chance to recall it because
set_all_choice_values() is called if SYMBOL_NEED_SET_CHOICE_VALUES
is set.
When sym_calc_choice() is called again, the choice symbol forgets it
forever, since SYMBOL_NEED_SET_CHOICE_VALUES is a one-time aid.
Hence, we cannot call sym_clear_all_valid() again and again.
It is crazy to repeat set and unset of internal flags. However, we
cannot simply get rid of "sym->flags &= flags | ~SYMBOL_DEF_USER;"
Doing so would re-introduce the problem solved by commit 5d09598d488f
("kconfig: fix new choices being skipped upon config update").
To work around the issue, stop calling sym_clear_all_valid() from
conf_write_autoconf().
conf_write() must be changed accordingly. Currently, it clears
SYMBOL_WRITE after the symbol is written into the .config file. This
is needed to prevent it from writing the same symbol multiple times in
case the symbol is declared in two or more locations. I added the new
flag SYMBOL_WRITTEN, to track the symbols that have been written.
Anyway, this is a cheesy workaround in order to suppress the issue
as far as defconfig is concerned.
Handling of choices is totally broken. sym_clear_all_valid() is called
every time a user touches a symbol from the GUI interface. To reproduce
it, just add a new symbol to drivers/usb/gadget/legacy/Kconfig, then touch
around unrelated symbols from menuconfig. USB_FUNCTIONFS will disappear
from the .config file.
I added the Fixes tag since the issue is more fatal than before. But this
has been broken since long before that commit, and it still is.
We should take a closer look to fix this correctly somehow.
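In code terms, the per-symbol handling in conf_write() after this change can
be sketched as follows (simplified pseudocode of the hunk below, not the
literal control flow):

  if (!(sym->flags & SYMBOL_CHOICE) && !(sym->flags & SYMBOL_WRITTEN)) {
          sym_calc_value(sym);
          if (sym->flags & SYMBOL_WRITE) {
                  /* remember that it was emitted, but keep SYMBOL_WRITE set */
                  sym->flags |= SYMBOL_WRITTEN;
                  conf_write_symbol(out, sym, &kconfig_printer_cb, NULL);
          }
  }
  /* a symbol declared in several Kconfig files is visited more than once but
   * written only once, and conf_write_autoconf() still sees correct
   * SYMBOL_WRITE flags without calling sym_clear_all_valid() */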
Fixes: 00c864f8903d ("kconfig: allow all config targets to write auto.conf if missing")
Cc: linux-stable <stable(a)vger.kernel.org> # 4.19+
Reported-by: Joonas Kylmälä <joonas.kylmala(a)iki.fi>
Signed-off-by: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Tested-by: Joonas Kylmälä <joonas.kylmala(a)iki.fi>
---
scripts/kconfig/confdata.c | 7 +++----
scripts/kconfig/expr.h | 1 +
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/scripts/kconfig/confdata.c b/scripts/kconfig/confdata.c
index 91d0a5c014ac..fd99ae90a618 100644
--- a/scripts/kconfig/confdata.c
+++ b/scripts/kconfig/confdata.c
@@ -834,11 +834,12 @@ int conf_write(const char *name)
"#\n"
"# %s\n"
"#\n", str);
- } else if (!(sym->flags & SYMBOL_CHOICE)) {
+ } else if (!(sym->flags & SYMBOL_CHOICE) &&
+ !(sym->flags & SYMBOL_WRITTEN)) {
sym_calc_value(sym);
if (!(sym->flags & SYMBOL_WRITE))
goto next;
- sym->flags &= ~SYMBOL_WRITE;
+ sym->flags |= SYMBOL_WRITTEN;
conf_write_symbol(out, sym, &kconfig_printer_cb, NULL);
}
@@ -1024,8 +1025,6 @@ int conf_write_autoconf(int overwrite)
if (!overwrite && is_present(autoconf_name))
return 0;
- sym_clear_all_valid();
-
conf_write_dep("include/config/auto.conf.cmd");
if (conf_split_config())
diff --git a/scripts/kconfig/expr.h b/scripts/kconfig/expr.h
index 7c329e179007..43a87f8ea738 100644
--- a/scripts/kconfig/expr.h
+++ b/scripts/kconfig/expr.h
@@ -141,6 +141,7 @@ struct symbol {
#define SYMBOL_OPTIONAL 0x0100 /* choice is optional - values can be 'n' */
#define SYMBOL_WRITE 0x0200 /* write symbol to file (KCONFIG_CONFIG) */
#define SYMBOL_CHANGED 0x0400 /* ? */
+#define SYMBOL_WRITTEN 0x0800 /* track info to avoid double-write to .config */
#define SYMBOL_NO_WRITE 0x1000 /* Symbol for internal use only; it will not be written */
#define SYMBOL_CHECKED 0x2000 /* used during dependency checking */
#define SYMBOL_WARNED 0x8000 /* warning has been issued */
--
2.17.1
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 82e10af2248d2d09c99834613f1b47d5002dc379 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Thu, 16 May 2019 10:55:21 -0500
Subject: [PATCH] signal/arm64: Use force_sig not force_sig_fault for SIGKILL
I don't think this is userspace visible but SIGKILL does not have
any si_codes that use the fault member of the siginfo union. Correct
this the simple way and call force_sig instead of force_sig_fault when
the signal is SIGKILL.
The two known places where a synchronous SIGKILL is generated are
do_bad_area and fpsimd_save. The call paths to force_sig_fault are:
do_bad_area
arm64_force_sig_fault
force_sig_fault
force_signal_inject
arm64_notify_die
arm64_force_sig_fault
force_sig_fault
Correcting this in arm64_force_sig_fault is therefore enough
to ensure the arm64 code is not misusing the generic code, which
could lead to maintenance problems later.
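Put differently (an illustrative note, not part of the change): SIGKILL is
delivered with the plain kill layout of siginfo (si_pid/si_uid), and none of
its si_codes select the _sigfault layout, so the fix simply routes it around
force_sig_fault():

  if (signo == SIGKILL)
          force_sig(SIGKILL, current);
  else
          force_sig_fault(signo, code, addr, current);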
Cc: stable(a)vger.kernel.org
Cc: Dave Martin <Dave.Martin(a)arm.com>
Cc: James Morse <james.morse(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Acked-by: Will Deacon <will.deacon(a)arm.com>
Fixes: af40ff687bc9 ("arm64: signal: Ensure si_code is valid for all fault signals")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index ade32046f3fe..e45d5b440fb1 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -256,7 +256,10 @@ void arm64_force_sig_fault(int signo, int code, void __user *addr,
const char *str)
{
arm64_show_signal(signo, str);
- force_sig_fault(signo, code, addr, current);
+ if (signo == SIGKILL)
+ force_sig(SIGKILL, current);
+ else
+ force_sig_fault(signo, code, addr, current);
}
void arm64_force_sig_mceerr(int code, void __user *addr, short lsb,
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 82e10af2248d2d09c99834613f1b47d5002dc379 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Thu, 16 May 2019 10:55:21 -0500
Subject: [PATCH] signal/arm64: Use force_sig not force_sig_fault for SIGKILL
I don't think this is userspace visible but SIGKILL does not have
any si_codes that use the fault member of the siginfo union. Correct
this the simple way and call force_sig instead of force_sig_fault when
the signal is SIGKILL.
The two known places where a synchronous SIGKILL is generated are
do_bad_area and fpsimd_save. The call paths to force_sig_fault are:
do_bad_area
arm64_force_sig_fault
force_sig_fault
force_signal_inject
arm64_notify_die
arm64_force_sig_fault
force_sig_fault
Correcting this in arm64_force_sig_fault is therefore enough
to ensure the arm64 code is not misusing the generic code, which
could lead to maintenance problems later.
Cc: stable(a)vger.kernel.org
Cc: Dave Martin <Dave.Martin(a)arm.com>
Cc: James Morse <james.morse(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Acked-by: Will Deacon <will.deacon(a)arm.com>
Fixes: af40ff687bc9 ("arm64: signal: Ensure si_code is valid for all fault signals")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index ade32046f3fe..e45d5b440fb1 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -256,7 +256,10 @@ void arm64_force_sig_fault(int signo, int code, void __user *addr,
const char *str)
{
arm64_show_signal(signo, str);
- force_sig_fault(signo, code, addr, current);
+ if (signo == SIGKILL)
+ force_sig(SIGKILL, current);
+ else
+ force_sig_fault(signo, code, addr, current);
}
void arm64_force_sig_mceerr(int code, void __user *addr, short lsb,
The patch below does not apply to the 5.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 82e10af2248d2d09c99834613f1b47d5002dc379 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Thu, 16 May 2019 10:55:21 -0500
Subject: [PATCH] signal/arm64: Use force_sig not force_sig_fault for SIGKILL
I don't think this is userspace visible but SIGKILL does not have
any si_codes that use the fault member of the siginfo union. Correct
this the simple way and call force_sig instead of force_sig_fault when
the signal is SIGKILL.
The two known places where a synchronous SIGKILL is generated are
do_bad_area and fpsimd_save. The call paths to force_sig_fault are:
do_bad_area
arm64_force_sig_fault
force_sig_fault
force_signal_inject
arm64_notify_die
arm64_force_sig_fault
force_sig_fault
Correcting this in arm64_force_sig_fault is therefore enough
to ensure the arm64 code is not misusing the generic code, which
could lead to maintenance problems later.
Cc: stable(a)vger.kernel.org
Cc: Dave Martin <Dave.Martin(a)arm.com>
Cc: James Morse <james.morse(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Acked-by: Will Deacon <will.deacon(a)arm.com>
Fixes: af40ff687bc9 ("arm64: signal: Ensure si_code is valid for all fault signals")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index ade32046f3fe..e45d5b440fb1 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -256,7 +256,10 @@ void arm64_force_sig_fault(int signo, int code, void __user *addr,
const char *str)
{
arm64_show_signal(signo, str);
- force_sig_fault(signo, code, addr, current);
+ if (signo == SIGKILL)
+ force_sig(SIGKILL, current);
+ else
+ force_sig_fault(signo, code, addr, current);
}
void arm64_force_sig_mceerr(int code, void __user *addr, short lsb,
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7a0cf094944e2540758b7f957eb6846d5126f535 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Wed, 15 May 2019 22:54:56 -0500
Subject: [PATCH] signal: Correct namespace fixups of si_pid and si_uid
The function send_signal was split from __send_signal so that it would
be possible to bypass the namespace logic based upon current[1]. As it
turns out the si_pid and the si_uid fixup are both inappropriate in
the case of kill_pid_usb_asyncio so move that logic into send_signal.
It is difficult to arrange but possible for a signal with an si_code
of SI_TIMER or SI_SIGIO to be sent across namespace boundaries, in
which case the tests for when it is ok to change si_pid and si_uid based
on SI_FROMUSER are incorrect. Replace the use of SI_FROMUSER with a
new test, has_si_pid_and_uid(), based on siginfo_layout().
Now that the uid fixup is no longer present, properly calculate the
si_uid that the target task needs to read when expanding
SEND_SIG_NOINFO.
[1] 7978b567d315 ("signals: add from_ancestor_ns parameter to send_signal()")
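A small worked example of the translation that send_signal() now performs
(the helper calls are the ones from the patch; the numeric ids are made up):

  /* sender's uid 1000 in its namespace maps to kuid 101000 on the host;
   * the target's user namespace maps kuid 101000 back to uid 0 */
  kuid_t uid = make_kuid(current_user_ns(), info->si_uid); /* 1000 -> kuid 101000 */
  info->si_uid = from_kuid_munged(t_user_ns, uid);         /* kuid 101000 -> 0 */

  /* and if the sender has no pid in the target's pid namespace: */
  if (!task_pid_nr_ns(current, task_active_pid_ns(t)))
          info->si_pid = 0;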
Cc: stable(a)vger.kernel.org
Fixes: 6588c1e3ff01 ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary")
Fixes: 6b550f949594 ("user namespace: make signal.c respect user namespaces")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/kernel/signal.c b/kernel/signal.c
index 18040d6bd63a..39a3eca5ce22 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1056,27 +1056,6 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
}
-#ifdef CONFIG_USER_NS
-static inline void userns_fixup_signal_uid(struct kernel_siginfo *info, struct task_struct *t)
-{
- if (current_user_ns() == task_cred_xxx(t, user_ns))
- return;
-
- if (SI_FROMKERNEL(info))
- return;
-
- rcu_read_lock();
- info->si_uid = from_kuid_munged(task_cred_xxx(t, user_ns),
- make_kuid(current_user_ns(), info->si_uid));
- rcu_read_unlock();
-}
-#else
-static inline void userns_fixup_signal_uid(struct kernel_siginfo *info, struct task_struct *t)
-{
- return;
-}
-#endif
-
static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
enum pid_type type, int from_ancestor_ns)
{
@@ -1134,7 +1113,11 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
q->info.si_code = SI_USER;
q->info.si_pid = task_tgid_nr_ns(current,
task_active_pid_ns(t));
- q->info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
+ rcu_read_lock();
+ q->info.si_uid =
+ from_kuid_munged(task_cred_xxx(t, user_ns),
+ current_uid());
+ rcu_read_unlock();
break;
case (unsigned long) SEND_SIG_PRIV:
clear_siginfo(&q->info);
@@ -1146,13 +1129,8 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
break;
default:
copy_siginfo(&q->info, info);
- if (from_ancestor_ns)
- q->info.si_pid = 0;
break;
}
-
- userns_fixup_signal_uid(&q->info, t);
-
} else if (!is_si_special(info)) {
if (sig >= SIGRTMIN && info->si_code != SI_USER) {
/*
@@ -1196,6 +1174,28 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
return ret;
}
+static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
+{
+ bool ret = false;
+ switch (siginfo_layout(info->si_signo, info->si_code)) {
+ case SIL_KILL:
+ case SIL_CHLD:
+ case SIL_RT:
+ ret = true;
+ break;
+ case SIL_TIMER:
+ case SIL_POLL:
+ case SIL_FAULT:
+ case SIL_FAULT_MCEERR:
+ case SIL_FAULT_BNDERR:
+ case SIL_FAULT_PKUERR:
+ case SIL_SYS:
+ ret = false;
+ break;
+ }
+ return ret;
+}
+
static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
enum pid_type type)
{
@@ -1205,7 +1205,20 @@ static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct
from_ancestor_ns = si_fromuser(info) &&
!task_pid_nr_ns(current, task_active_pid_ns(t));
#endif
+ if (!is_si_special(info) && has_si_pid_and_uid(info)) {
+ struct user_namespace *t_user_ns;
+ rcu_read_lock();
+ t_user_ns = task_cred_xxx(t, user_ns);
+ if (current_user_ns() != t_user_ns) {
+ kuid_t uid = make_kuid(current_user_ns(), info->si_uid);
+ info->si_uid = from_kuid_munged(t_user_ns, uid);
+ }
+ rcu_read_unlock();
+
+ if (!task_pid_nr_ns(current, task_active_pid_ns(t)))
+ info->si_pid = 0;
+ }
return __send_signal(sig, info, t, type, from_ancestor_ns);
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7a0cf094944e2540758b7f957eb6846d5126f535 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Wed, 15 May 2019 22:54:56 -0500
Subject: [PATCH] signal: Correct namespace fixups of si_pid and si_uid
The function send_signal was split from __send_signal so that it would
be possible to bypass the namespace logic based upon current[1]. As it
turns out the si_pid and the si_uid fixup are both inappropriate in
the case of kill_pid_usb_asyncio so move that logic into send_signal.
It is difficult to arrange but possible for a signal with an si_code
of SI_TIMER or SI_SIGIO to be sent across namespace boundaries, in
which case the tests for when it is ok to change si_pid and si_uid based
on SI_FROMUSER are incorrect. Replace the use of SI_FROMUSER with a
new test, has_si_pid_and_uid(), based on siginfo_layout().
Now that the uid fixup is no longer present, properly calculate the
si_uid that the target task needs to read when expanding
SEND_SIG_NOINFO.
[1] 7978b567d315 ("signals: add from_ancestor_ns parameter to send_signal()")
Cc: stable(a)vger.kernel.org
Fixes: 6588c1e3ff01 ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary")
Fixes: 6b550f949594 ("user namespace: make signal.c respect user namespaces")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/kernel/signal.c b/kernel/signal.c
index 18040d6bd63a..39a3eca5ce22 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1056,27 +1056,6 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
}
-#ifdef CONFIG_USER_NS
-static inline void userns_fixup_signal_uid(struct kernel_siginfo *info, struct task_struct *t)
-{
- if (current_user_ns() == task_cred_xxx(t, user_ns))
- return;
-
- if (SI_FROMKERNEL(info))
- return;
-
- rcu_read_lock();
- info->si_uid = from_kuid_munged(task_cred_xxx(t, user_ns),
- make_kuid(current_user_ns(), info->si_uid));
- rcu_read_unlock();
-}
-#else
-static inline void userns_fixup_signal_uid(struct kernel_siginfo *info, struct task_struct *t)
-{
- return;
-}
-#endif
-
static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
enum pid_type type, int from_ancestor_ns)
{
@@ -1134,7 +1113,11 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
q->info.si_code = SI_USER;
q->info.si_pid = task_tgid_nr_ns(current,
task_active_pid_ns(t));
- q->info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
+ rcu_read_lock();
+ q->info.si_uid =
+ from_kuid_munged(task_cred_xxx(t, user_ns),
+ current_uid());
+ rcu_read_unlock();
break;
case (unsigned long) SEND_SIG_PRIV:
clear_siginfo(&q->info);
@@ -1146,13 +1129,8 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
break;
default:
copy_siginfo(&q->info, info);
- if (from_ancestor_ns)
- q->info.si_pid = 0;
break;
}
-
- userns_fixup_signal_uid(&q->info, t);
-
} else if (!is_si_special(info)) {
if (sig >= SIGRTMIN && info->si_code != SI_USER) {
/*
@@ -1196,6 +1174,28 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
return ret;
}
+static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
+{
+ bool ret = false;
+ switch (siginfo_layout(info->si_signo, info->si_code)) {
+ case SIL_KILL:
+ case SIL_CHLD:
+ case SIL_RT:
+ ret = true;
+ break;
+ case SIL_TIMER:
+ case SIL_POLL:
+ case SIL_FAULT:
+ case SIL_FAULT_MCEERR:
+ case SIL_FAULT_BNDERR:
+ case SIL_FAULT_PKUERR:
+ case SIL_SYS:
+ ret = false;
+ break;
+ }
+ return ret;
+}
+
static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
enum pid_type type)
{
@@ -1205,7 +1205,20 @@ static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct
from_ancestor_ns = si_fromuser(info) &&
!task_pid_nr_ns(current, task_active_pid_ns(t));
#endif
+ if (!is_si_special(info) && has_si_pid_and_uid(info)) {
+ struct user_namespace *t_user_ns;
+ rcu_read_lock();
+ t_user_ns = task_cred_xxx(t, user_ns);
+ if (current_user_ns() != t_user_ns) {
+ kuid_t uid = make_kuid(current_user_ns(), info->si_uid);
+ info->si_uid = from_kuid_munged(t_user_ns, uid);
+ }
+ rcu_read_unlock();
+
+ if (!task_pid_nr_ns(current, task_active_pid_ns(t)))
+ info->si_pid = 0;
+ }
return __send_signal(sig, info, t, type, from_ancestor_ns);
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7a0cf094944e2540758b7f957eb6846d5126f535 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Wed, 15 May 2019 22:54:56 -0500
Subject: [PATCH] signal: Correct namespace fixups of si_pid and si_uid
The function send_signal was split from __send_signal so that it would
be possible to bypass the namespace logic based upon current[1]. As it
turns out the si_pid and the si_uid fixup are both inappropriate in
the case of kill_pid_usb_asyncio so move that logic into send_signal.
It is difficult to arrange but possible for a signal with an si_code
of SI_TIMER or SI_SIGIO to be sent across namespace boundaries, in
which case the tests for when it is ok to change si_pid and si_uid based
on SI_FROMUSER are incorrect. Replace the use of SI_FROMUSER with a
new test, has_si_pid_and_uid(), based on siginfo_layout().
Now that the uid fixup is no longer present, properly calculate the
si_uid that the target task needs to read when expanding
SEND_SIG_NOINFO.
[1] 7978b567d315 ("signals: add from_ancestor_ns parameter to send_signal()")
Cc: stable(a)vger.kernel.org
Fixes: 6588c1e3ff01 ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary")
Fixes: 6b550f949594 ("user namespace: make signal.c respect user namespaces")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/kernel/signal.c b/kernel/signal.c
index 18040d6bd63a..39a3eca5ce22 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1056,27 +1056,6 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
}
-#ifdef CONFIG_USER_NS
-static inline void userns_fixup_signal_uid(struct kernel_siginfo *info, struct task_struct *t)
-{
- if (current_user_ns() == task_cred_xxx(t, user_ns))
- return;
-
- if (SI_FROMKERNEL(info))
- return;
-
- rcu_read_lock();
- info->si_uid = from_kuid_munged(task_cred_xxx(t, user_ns),
- make_kuid(current_user_ns(), info->si_uid));
- rcu_read_unlock();
-}
-#else
-static inline void userns_fixup_signal_uid(struct kernel_siginfo *info, struct task_struct *t)
-{
- return;
-}
-#endif
-
static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
enum pid_type type, int from_ancestor_ns)
{
@@ -1134,7 +1113,11 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
q->info.si_code = SI_USER;
q->info.si_pid = task_tgid_nr_ns(current,
task_active_pid_ns(t));
- q->info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
+ rcu_read_lock();
+ q->info.si_uid =
+ from_kuid_munged(task_cred_xxx(t, user_ns),
+ current_uid());
+ rcu_read_unlock();
break;
case (unsigned long) SEND_SIG_PRIV:
clear_siginfo(&q->info);
@@ -1146,13 +1129,8 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
break;
default:
copy_siginfo(&q->info, info);
- if (from_ancestor_ns)
- q->info.si_pid = 0;
break;
}
-
- userns_fixup_signal_uid(&q->info, t);
-
} else if (!is_si_special(info)) {
if (sig >= SIGRTMIN && info->si_code != SI_USER) {
/*
@@ -1196,6 +1174,28 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
return ret;
}
+static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
+{
+ bool ret = false;
+ switch (siginfo_layout(info->si_signo, info->si_code)) {
+ case SIL_KILL:
+ case SIL_CHLD:
+ case SIL_RT:
+ ret = true;
+ break;
+ case SIL_TIMER:
+ case SIL_POLL:
+ case SIL_FAULT:
+ case SIL_FAULT_MCEERR:
+ case SIL_FAULT_BNDERR:
+ case SIL_FAULT_PKUERR:
+ case SIL_SYS:
+ ret = false;
+ break;
+ }
+ return ret;
+}
+
static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
enum pid_type type)
{
@@ -1205,7 +1205,20 @@ static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct
from_ancestor_ns = si_fromuser(info) &&
!task_pid_nr_ns(current, task_active_pid_ns(t));
#endif
+ if (!is_si_special(info) && has_si_pid_and_uid(info)) {
+ struct user_namespace *t_user_ns;
+ rcu_read_lock();
+ t_user_ns = task_cred_xxx(t, user_ns);
+ if (current_user_ns() != t_user_ns) {
+ kuid_t uid = make_kuid(current_user_ns(), info->si_uid);
+ info->si_uid = from_kuid_munged(t_user_ns, uid);
+ }
+ rcu_read_unlock();
+
+ if (!task_pid_nr_ns(current, task_active_pid_ns(t)))
+ info->si_pid = 0;
+ }
return __send_signal(sig, info, t, type, from_ancestor_ns);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7a0cf094944e2540758b7f957eb6846d5126f535 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Wed, 15 May 2019 22:54:56 -0500
Subject: [PATCH] signal: Correct namespace fixups of si_pid and si_uid
The function send_signal was split from __send_signal so that it would
be possible to bypass the namespace logic based upon current[1]. As it
turns out the si_pid and the si_uid fixup are both inappropriate in
the case of kill_pid_usb_asyncio so move that logic into send_signal.
It is difficult to arrange but possible for a signal with an si_code
of SI_TIMER or SI_SIGIO to be sent across namespace boundaries, in
which case the tests for when it is ok to change si_pid and si_uid based
on SI_FROMUSER are incorrect. Replace the use of SI_FROMUSER with a
new test, has_si_pid_and_uid(), based on siginfo_layout().
Now that the uid fixup is no longer present, properly calculate the
si_uid that the target task needs to read when expanding
SEND_SIG_NOINFO.
[1] 7978b567d315 ("signals: add from_ancestor_ns parameter to send_signal()")
Cc: stable(a)vger.kernel.org
Fixes: 6588c1e3ff01 ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary")
Fixes: 6b550f949594 ("user namespace: make signal.c respect user namespaces")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/kernel/signal.c b/kernel/signal.c
index 18040d6bd63a..39a3eca5ce22 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1056,27 +1056,6 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
}
-#ifdef CONFIG_USER_NS
-static inline void userns_fixup_signal_uid(struct kernel_siginfo *info, struct task_struct *t)
-{
- if (current_user_ns() == task_cred_xxx(t, user_ns))
- return;
-
- if (SI_FROMKERNEL(info))
- return;
-
- rcu_read_lock();
- info->si_uid = from_kuid_munged(task_cred_xxx(t, user_ns),
- make_kuid(current_user_ns(), info->si_uid));
- rcu_read_unlock();
-}
-#else
-static inline void userns_fixup_signal_uid(struct kernel_siginfo *info, struct task_struct *t)
-{
- return;
-}
-#endif
-
static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
enum pid_type type, int from_ancestor_ns)
{
@@ -1134,7 +1113,11 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
q->info.si_code = SI_USER;
q->info.si_pid = task_tgid_nr_ns(current,
task_active_pid_ns(t));
- q->info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
+ rcu_read_lock();
+ q->info.si_uid =
+ from_kuid_munged(task_cred_xxx(t, user_ns),
+ current_uid());
+ rcu_read_unlock();
break;
case (unsigned long) SEND_SIG_PRIV:
clear_siginfo(&q->info);
@@ -1146,13 +1129,8 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
break;
default:
copy_siginfo(&q->info, info);
- if (from_ancestor_ns)
- q->info.si_pid = 0;
break;
}
-
- userns_fixup_signal_uid(&q->info, t);
-
} else if (!is_si_special(info)) {
if (sig >= SIGRTMIN && info->si_code != SI_USER) {
/*
@@ -1196,6 +1174,28 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
return ret;
}
+static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
+{
+ bool ret = false;
+ switch (siginfo_layout(info->si_signo, info->si_code)) {
+ case SIL_KILL:
+ case SIL_CHLD:
+ case SIL_RT:
+ ret = true;
+ break;
+ case SIL_TIMER:
+ case SIL_POLL:
+ case SIL_FAULT:
+ case SIL_FAULT_MCEERR:
+ case SIL_FAULT_BNDERR:
+ case SIL_FAULT_PKUERR:
+ case SIL_SYS:
+ ret = false;
+ break;
+ }
+ return ret;
+}
+
static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
enum pid_type type)
{
@@ -1205,7 +1205,20 @@ static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct
from_ancestor_ns = si_fromuser(info) &&
!task_pid_nr_ns(current, task_active_pid_ns(t));
#endif
+ if (!is_si_special(info) && has_si_pid_and_uid(info)) {
+ struct user_namespace *t_user_ns;
+ rcu_read_lock();
+ t_user_ns = task_cred_xxx(t, user_ns);
+ if (current_user_ns() != t_user_ns) {
+ kuid_t uid = make_kuid(current_user_ns(), info->si_uid);
+ info->si_uid = from_kuid_munged(t_user_ns, uid);
+ }
+ rcu_read_unlock();
+
+ if (!task_pid_nr_ns(current, task_active_pid_ns(t)))
+ info->si_pid = 0;
+ }
return __send_signal(sig, info, t, type, from_ancestor_ns);
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 70f1b0d34bdf03065fe869e93cc17cad1ea20c4a Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Thu, 7 Feb 2019 19:44:12 -0600
Subject: [PATCH] signal/usb: Replace kill_pid_info_as_cred with
kill_pid_usb_asyncio
The usb support for asyncio encoded one of its values in the wrong
field. It should have used si_value but instead used si_addr, which is
not present in the _rt union member of struct siginfo.
The practical result of this is that on a 64bit big endian kernel
when delivering a signal to a 32bit process the si_addr field
is set to NULL, instead of the expected pointer value.
This issue cannot be fixed in copy_siginfo_to_user32 as the usb
usage of the _sigfault (aka si_addr) member of the siginfo
union when SI_ASYNCIO is set is incompatible with the POSIX and
glibc usage of the _rt member of the siginfo union.
Therefore replace kill_pid_info_as_cred with kill_pid_usb_asyncio, a
dedicated function for this one specific case. There are no other
users of kill_pid_info_as_cred so this specialization should have no
impact on the amount of code in the kernel. Instead of a siginfo_t,
which is difficult and error prone, have kill_pid_usb_asyncio take
three arguments: a signal number, an errno value, and an address
encoded as a sigval_t. The encoding of the address as a sigval_t allows the
code that reads the userspace request for a signal to handle this
compat issue along with all of the other compat issues.
Add BUILD_BUG_ONs in kernel/signal.c to ensure that we can now place
the pointer value in si_pid (instead of si_addr). That is, the
code now verifies that si_pid and si_addr always occur at the same
location. Further, the code verifies that for native structures a value
placed in si_pid and spilling into si_uid will appear in userspace in
si_addr (on a byte by byte copy of siginfo or a field by field copy of
siginfo). The code also verifies that for a 64bit kernel and a 32bit
userspace the 32bit pointer will fit in si_pid.
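To make the endianness problem concrete (an illustrative sketch; the byte
values are made up):

  /*
   * 64-bit big-endian kernel storing a 32-bit user pointer, say 0xb6f01234,
   * into the 64-bit si_addr slot:
   *
   *   bytes in memory:  00 00 00 00 b6 f0 12 34
   *                                 ^^^^^^^^^^^ the value ends up in the
   *                                 high-address half on big endian
   *
   * A 32-bit userspace reads its 32-bit si_addr from the start of that slot
   * and sees 0x00000000, i.e. NULL.  Encoding the address as a sigval_t and
   * having the 32-bit ioctl paths store it via sival_int avoids the mismatch.
   */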
I have used the usbsig.c program below written by Alan Stern and
slightly tweaked by me to run on a big endian machine to verify the
issue exists (on sparc64) and to confirm the patch below fixes the issue.
/* usbsig.c -- test USB async signal delivery */
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <endian.h>
#include <linux/usb/ch9.h>
#include <linux/usbdevice_fs.h>
static struct usbdevfs_urb urb;
static struct usbdevfs_disconnectsignal ds;
static volatile sig_atomic_t done = 0;
void urb_handler(int sig, siginfo_t *info , void *ucontext)
{
printf("Got signal %d, signo %d errno %d code %d addr: %p urb: %p\n",
sig, info->si_signo, info->si_errno, info->si_code,
info->si_addr, &urb);
printf("%s\n", (info->si_addr == &urb) ? "Good" : "Bad");
}
void ds_handler(int sig, siginfo_t *info , void *ucontext)
{
printf("Got signal %d, signo %d errno %d code %d addr: %p ds: %p\n",
sig, info->si_signo, info->si_errno, info->si_code,
info->si_addr, &ds);
printf("%s\n", (info->si_addr == &ds) ? "Good" : "Bad");
done = 1;
}
int main(int argc, char **argv)
{
char *devfilename;
int fd;
int rc;
struct sigaction act;
struct usb_ctrlrequest *req;
void *ptr;
char buf[80];
if (argc != 2) {
fprintf(stderr, "Usage: usbsig device-file-name\n");
return 1;
}
devfilename = argv[1];
fd = open(devfilename, O_RDWR);
if (fd == -1) {
perror("Error opening device file");
return 1;
}
act.sa_sigaction = urb_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
rc = sigaction(SIGUSR1, &act, NULL);
if (rc == -1) {
perror("Error in sigaction");
return 1;
}
act.sa_sigaction = ds_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
rc = sigaction(SIGUSR2, &act, NULL);
if (rc == -1) {
perror("Error in sigaction");
return 1;
}
memset(&urb, 0, sizeof(urb));
urb.type = USBDEVFS_URB_TYPE_CONTROL;
urb.endpoint = USB_DIR_IN | 0;
urb.buffer = buf;
urb.buffer_length = sizeof(buf);
urb.signr = SIGUSR1;
req = (struct usb_ctrlrequest *) buf;
req->bRequestType = USB_DIR_IN | USB_TYPE_STANDARD | USB_RECIP_DEVICE;
req->bRequest = USB_REQ_GET_DESCRIPTOR;
req->wValue = htole16(USB_DT_DEVICE << 8);
req->wIndex = htole16(0);
req->wLength = htole16(sizeof(buf) - sizeof(*req));
rc = ioctl(fd, USBDEVFS_SUBMITURB, &urb);
if (rc == -1) {
perror("Error in SUBMITURB ioctl");
return 1;
}
rc = ioctl(fd, USBDEVFS_REAPURB, &ptr);
if (rc == -1) {
perror("Error in REAPURB ioctl");
return 1;
}
memset(&ds, 0, sizeof(ds));
ds.signr = SIGUSR2;
ds.context = &ds;
rc = ioctl(fd, USBDEVFS_DISCSIGNAL, &ds);
if (rc == -1) {
perror("Error in DISCSIGNAL ioctl");
return 1;
}
printf("Waiting for usb disconnect\n");
while (!done) {
sleep(1);
}
close(fd);
return 0;
}
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: linux-usb(a)vger.kernel.org
Cc: Alan Stern <stern(a)rowland.harvard.edu>
Cc: Oliver Neukum <oneukum(a)suse.com>
Fixes: v2.3.39
Cc: stable(a)vger.kernel.org
Acked-by: Alan Stern <stern(a)rowland.harvard.edu>
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index fa783531ee88..a02448105527 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -63,7 +63,7 @@ struct usb_dev_state {
unsigned int discsignr;
struct pid *disc_pid;
const struct cred *cred;
- void __user *disccontext;
+ sigval_t disccontext;
unsigned long ifclaimed;
u32 disabled_bulk_eps;
bool privileges_dropped;
@@ -90,6 +90,7 @@ struct async {
unsigned int ifnum;
void __user *userbuffer;
void __user *userurb;
+ sigval_t userurb_sigval;
struct urb *urb;
struct usb_memory *usbm;
unsigned int mem_usage;
@@ -582,22 +583,19 @@ static void async_completed(struct urb *urb)
{
struct async *as = urb->context;
struct usb_dev_state *ps = as->ps;
- struct kernel_siginfo sinfo;
struct pid *pid = NULL;
const struct cred *cred = NULL;
unsigned long flags;
- int signr;
+ sigval_t addr;
+ int signr, errno;
spin_lock_irqsave(&ps->lock, flags);
list_move_tail(&as->asynclist, &ps->async_completed);
as->status = urb->status;
signr = as->signr;
if (signr) {
- clear_siginfo(&sinfo);
- sinfo.si_signo = as->signr;
- sinfo.si_errno = as->status;
- sinfo.si_code = SI_ASYNCIO;
- sinfo.si_addr = as->userurb;
+ errno = as->status;
+ addr = as->userurb_sigval;
pid = get_pid(as->pid);
cred = get_cred(as->cred);
}
@@ -615,7 +613,7 @@ static void async_completed(struct urb *urb)
spin_unlock_irqrestore(&ps->lock, flags);
if (signr) {
- kill_pid_info_as_cred(sinfo.si_signo, &sinfo, pid, cred);
+ kill_pid_usb_asyncio(signr, errno, addr, pid, cred);
put_pid(pid);
put_cred(cred);
}
@@ -1427,7 +1425,7 @@ find_memory_area(struct usb_dev_state *ps, const struct usbdevfs_urb *uurb)
static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb,
struct usbdevfs_iso_packet_desc __user *iso_frame_desc,
- void __user *arg)
+ void __user *arg, sigval_t userurb_sigval)
{
struct usbdevfs_iso_packet_desc *isopkt = NULL;
struct usb_host_endpoint *ep;
@@ -1727,6 +1725,7 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
isopkt = NULL;
as->ps = ps;
as->userurb = arg;
+ as->userurb_sigval = userurb_sigval;
if (as->usbm) {
unsigned long uurb_start = (unsigned long)uurb->buffer;
@@ -1801,13 +1800,17 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
static int proc_submiturb(struct usb_dev_state *ps, void __user *arg)
{
struct usbdevfs_urb uurb;
+ sigval_t userurb_sigval;
if (copy_from_user(&uurb, arg, sizeof(uurb)))
return -EFAULT;
+ memset(&userurb_sigval, 0, sizeof(userurb_sigval));
+ userurb_sigval.sival_ptr = arg;
+
return proc_do_submiturb(ps, &uurb,
(((struct usbdevfs_urb __user *)arg)->iso_frame_desc),
- arg);
+ arg, userurb_sigval);
}
static int proc_unlinkurb(struct usb_dev_state *ps, void __user *arg)
@@ -1977,7 +1980,7 @@ static int proc_disconnectsignal_compat(struct usb_dev_state *ps, void __user *a
if (copy_from_user(&ds, arg, sizeof(ds)))
return -EFAULT;
ps->discsignr = ds.signr;
- ps->disccontext = compat_ptr(ds.context);
+ ps->disccontext.sival_int = ds.context;
return 0;
}
@@ -2005,13 +2008,17 @@ static int get_urb32(struct usbdevfs_urb *kurb,
static int proc_submiturb_compat(struct usb_dev_state *ps, void __user *arg)
{
struct usbdevfs_urb uurb;
+ sigval_t userurb_sigval;
if (get_urb32(&uurb, (struct usbdevfs_urb32 __user *)arg))
return -EFAULT;
+ memset(&userurb_sigval, 0, sizeof(userurb_sigval));
+ userurb_sigval.sival_int = ptr_to_compat(arg);
+
return proc_do_submiturb(ps, &uurb,
((struct usbdevfs_urb32 __user *)arg)->iso_frame_desc,
- arg);
+ arg, userurb_sigval);
}
static int processcompl_compat(struct async *as, void __user * __user *arg)
@@ -2092,7 +2099,7 @@ static int proc_disconnectsignal(struct usb_dev_state *ps, void __user *arg)
if (copy_from_user(&ds, arg, sizeof(ds)))
return -EFAULT;
ps->discsignr = ds.signr;
- ps->disccontext = ds.context;
+ ps->disccontext.sival_ptr = ds.context;
return 0;
}
@@ -2614,22 +2621,15 @@ const struct file_operations usbdev_file_operations = {
static void usbdev_remove(struct usb_device *udev)
{
struct usb_dev_state *ps;
- struct kernel_siginfo sinfo;
while (!list_empty(&udev->filelist)) {
ps = list_entry(udev->filelist.next, struct usb_dev_state, list);
destroy_all_async(ps);
wake_up_all(&ps->wait);
list_del_init(&ps->list);
- if (ps->discsignr) {
- clear_siginfo(&sinfo);
- sinfo.si_signo = ps->discsignr;
- sinfo.si_errno = EPIPE;
- sinfo.si_code = SI_ASYNCIO;
- sinfo.si_addr = ps->disccontext;
- kill_pid_info_as_cred(ps->discsignr, &sinfo,
- ps->disc_pid, ps->cred);
- }
+ if (ps->discsignr)
+ kill_pid_usb_asyncio(ps->discsignr, EPIPE, ps->disccontext,
+ ps->disc_pid, ps->cred);
}
}
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 38a0f0785323..c68ca81db0a1 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -329,7 +329,7 @@ extern void force_sigsegv(int sig, struct task_struct *p);
extern int force_sig_info(int, struct kernel_siginfo *, struct task_struct *);
extern int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp);
extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid);
-extern int kill_pid_info_as_cred(int, struct kernel_siginfo *, struct pid *,
+extern int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr, struct pid *,
const struct cred *);
extern int kill_pgrp(struct pid *pid, int sig, int priv);
extern int kill_pid(struct pid *pid, int sig, int priv);
diff --git a/kernel/signal.c b/kernel/signal.c
index a1eb44dc9ff5..18040d6bd63a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1439,13 +1439,44 @@ static inline bool kill_as_cred_perm(const struct cred *cred,
uid_eq(cred->uid, pcred->uid);
}
-/* like kill_pid_info(), but doesn't use uid/euid of "current" */
-int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
- const struct cred *cred)
+/*
+ * The usb asyncio usage of siginfo is wrong. The glibc support
+ * for asyncio which uses SI_ASYNCIO assumes the layout is SIL_RT.
+ * AKA after the generic fields:
+ * kernel_pid_t si_pid;
+ * kernel_uid32_t si_uid;
+ * sigval_t si_value;
+ *
+ * Unfortunately when usb generates SI_ASYNCIO it assumes the layout
+ * after the generic fields is:
+ * void __user *si_addr;
+ *
+ * This is a practical problem when there is a 64bit big endian kernel
+ * and a 32bit userspace. As the 32bit address will encoded in the low
+ * 32bits of the pointer. Those low 32bits will be stored at higher
+ * address than appear in a 32 bit pointer. So userspace will not
+ * see the address it was expecting for it's completions.
+ *
+ * There is nothing in the encoding that can allow
+ * copy_siginfo_to_user32 to detect this confusion of formats, so
+ * handle this by requiring the caller of kill_pid_usb_asyncio to
+ * notice when this situration takes place and to store the 32bit
+ * pointer in sival_int, instead of sival_addr of the sigval_t addr
+ * parameter.
+ */
+int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,
+ struct pid *pid, const struct cred *cred)
{
- int ret = -EINVAL;
+ struct kernel_siginfo info;
struct task_struct *p;
unsigned long flags;
+ int ret = -EINVAL;
+
+ clear_siginfo(&info);
+ info.si_signo = sig;
+ info.si_errno = errno;
+ info.si_code = SI_ASYNCIO;
+ *((sigval_t *)&info.si_pid) = addr;
if (!valid_signal(sig))
return ret;
@@ -1456,17 +1487,17 @@ int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
ret = -ESRCH;
goto out_unlock;
}
- if (si_fromuser(info) && !kill_as_cred_perm(cred, p)) {
+ if (!kill_as_cred_perm(cred, p)) {
ret = -EPERM;
goto out_unlock;
}
- ret = security_task_kill(p, info, sig, cred);
+ ret = security_task_kill(p, &info, sig, cred);
if (ret)
goto out_unlock;
if (sig) {
if (lock_task_sighand(p, &flags)) {
- ret = __send_signal(sig, info, p, PIDTYPE_TGID, 0);
+ ret = __send_signal(sig, &info, p, PIDTYPE_TGID, 0);
unlock_task_sighand(p, &flags);
} else
ret = -ESRCH;
@@ -1475,7 +1506,7 @@ int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
rcu_read_unlock();
return ret;
}
-EXPORT_SYMBOL_GPL(kill_pid_info_as_cred);
+EXPORT_SYMBOL_GPL(kill_pid_usb_asyncio);
/*
* kill_something_info() interprets pid in interesting ways just like kill(2).
@@ -4474,6 +4505,28 @@ static inline void siginfo_buildtime_checks(void)
CHECK_OFFSET(si_syscall);
CHECK_OFFSET(si_arch);
#undef CHECK_OFFSET
+
+ /* usb asyncio */
+ BUILD_BUG_ON(offsetof(struct siginfo, si_pid) !=
+ offsetof(struct siginfo, si_addr));
+ if (sizeof(int) == sizeof(void __user *)) {
+ BUILD_BUG_ON(sizeof_field(struct siginfo, si_pid) !=
+ sizeof(void __user *));
+ } else {
+ BUILD_BUG_ON((sizeof_field(struct siginfo, si_pid) +
+ sizeof_field(struct siginfo, si_uid)) !=
+ sizeof(void __user *));
+ BUILD_BUG_ON(offsetofend(struct siginfo, si_pid) !=
+ offsetof(struct siginfo, si_uid));
+ }
+#ifdef CONFIG_COMPAT
+ BUILD_BUG_ON(offsetof(struct compat_siginfo, si_pid) !=
+ offsetof(struct compat_siginfo, si_addr));
+ BUILD_BUG_ON(sizeof_field(struct compat_siginfo, si_pid) !=
+ sizeof(compat_uptr_t));
+ BUILD_BUG_ON(sizeof_field(struct compat_siginfo, si_pid) !=
+ sizeof_field(struct siginfo, si_pid));
+#endif
}
void __init signals_init(void)
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 70f1b0d34bdf03065fe869e93cc17cad1ea20c4a Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Thu, 7 Feb 2019 19:44:12 -0600
Subject: [PATCH] signal/usb: Replace kill_pid_info_as_cred with
kill_pid_usb_asyncio
The usb support for asyncio encoded one of its values in the wrong
field. It should have used si_value but instead used si_addr, which is
not present in the _rt union member of struct siginfo.
The practical result of this is that on a 64bit big endian kernel
when delivering a signal to a 32bit process the si_addr field
is set to NULL, instead of the expected pointer value.
This issue cannot be fixed in copy_siginfo_to_user32 as the usb
usage of the _sigfault (aka si_addr) member of the siginfo
union when SI_ASYNCIO is set is incompatible with the POSIX and
glibc usage of the _rt member of the siginfo union.
Therefore replace kill_pid_info_as_cred with kill_pid_usb_asyncio, a
dedicated function for this one specific case. There are no other
users of kill_pid_info_as_cred so this specialization should have no
impact on the amount of code in the kernel. Instead of a siginfo_t,
which is difficult and error prone, have kill_pid_usb_asyncio take
three arguments: a signal number, an errno value, and an address
encoded as a sigval_t. The encoding of the address as a sigval_t allows the
code that reads the userspace request for a signal to handle this
compat issue along with all of the other compat issues.
Add BUILD_BUG_ONs in kernel/signal.c to ensure that we can now place
the pointer value in si_pid (instead of si_addr). That is, the
code now verifies that si_pid and si_addr always occur at the same
location. Further, the code verifies that for native structures a value
placed in si_pid and spilling into si_uid will appear in userspace in
si_addr (on a byte by byte copy of siginfo or a field by field copy of
siginfo). The code also verifies that for a 64bit kernel and a 32bit
userspace the 32bit pointer will fit in si_pid.
I have used the usbsig.c program below written by Alan Stern and
slightly tweaked by me to run on a big endian machine to verify the
issue exists (on sparc64) and to confirm the patch below fixes the issue.
/* usbsig.c -- test USB async signal delivery */
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <endian.h>
#include <linux/usb/ch9.h>
#include <linux/usbdevice_fs.h>
static struct usbdevfs_urb urb;
static struct usbdevfs_disconnectsignal ds;
static volatile sig_atomic_t done = 0;
void urb_handler(int sig, siginfo_t *info , void *ucontext)
{
printf("Got signal %d, signo %d errno %d code %d addr: %p urb: %p\n",
sig, info->si_signo, info->si_errno, info->si_code,
info->si_addr, &urb);
printf("%s\n", (info->si_addr == &urb) ? "Good" : "Bad");
}
void ds_handler(int sig, siginfo_t *info , void *ucontext)
{
printf("Got signal %d, signo %d errno %d code %d addr: %p ds: %p\n",
sig, info->si_signo, info->si_errno, info->si_code,
info->si_addr, &ds);
printf("%s\n", (info->si_addr == &ds) ? "Good" : "Bad");
done = 1;
}
int main(int argc, char **argv)
{
char *devfilename;
int fd;
int rc;
struct sigaction act;
struct usb_ctrlrequest *req;
void *ptr;
char buf[80];
if (argc != 2) {
fprintf(stderr, "Usage: usbsig device-file-name\n");
return 1;
}
devfilename = argv[1];
fd = open(devfilename, O_RDWR);
if (fd == -1) {
perror("Error opening device file");
return 1;
}
act.sa_sigaction = urb_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
rc = sigaction(SIGUSR1, &act, NULL);
if (rc == -1) {
perror("Error in sigaction");
return 1;
}
act.sa_sigaction = ds_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
rc = sigaction(SIGUSR2, &act, NULL);
if (rc == -1) {
perror("Error in sigaction");
return 1;
}
memset(&urb, 0, sizeof(urb));
urb.type = USBDEVFS_URB_TYPE_CONTROL;
urb.endpoint = USB_DIR_IN | 0;
urb.buffer = buf;
urb.buffer_length = sizeof(buf);
urb.signr = SIGUSR1;
req = (struct usb_ctrlrequest *) buf;
req->bRequestType = USB_DIR_IN | USB_TYPE_STANDARD | USB_RECIP_DEVICE;
req->bRequest = USB_REQ_GET_DESCRIPTOR;
req->wValue = htole16(USB_DT_DEVICE << 8);
req->wIndex = htole16(0);
req->wLength = htole16(sizeof(buf) - sizeof(*req));
rc = ioctl(fd, USBDEVFS_SUBMITURB, &urb);
if (rc == -1) {
perror("Error in SUBMITURB ioctl");
return 1;
}
rc = ioctl(fd, USBDEVFS_REAPURB, &ptr);
if (rc == -1) {
perror("Error in REAPURB ioctl");
return 1;
}
memset(&ds, 0, sizeof(ds));
ds.signr = SIGUSR2;
ds.context = &ds;
rc = ioctl(fd, USBDEVFS_DISCSIGNAL, &ds);
if (rc == -1) {
perror("Error in DISCSIGNAL ioctl");
return 1;
}
printf("Waiting for usb disconnect\n");
while (!done) {
sleep(1);
}
close(fd);
return 0;
}
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: linux-usb(a)vger.kernel.org
Cc: Alan Stern <stern(a)rowland.harvard.edu>
Cc: Oliver Neukum <oneukum(a)suse.com>
Fixes: v2.3.39
Cc: stable(a)vger.kernel.org
Acked-by: Alan Stern <stern(a)rowland.harvard.edu>
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index fa783531ee88..a02448105527 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -63,7 +63,7 @@ struct usb_dev_state {
unsigned int discsignr;
struct pid *disc_pid;
const struct cred *cred;
- void __user *disccontext;
+ sigval_t disccontext;
unsigned long ifclaimed;
u32 disabled_bulk_eps;
bool privileges_dropped;
@@ -90,6 +90,7 @@ struct async {
unsigned int ifnum;
void __user *userbuffer;
void __user *userurb;
+ sigval_t userurb_sigval;
struct urb *urb;
struct usb_memory *usbm;
unsigned int mem_usage;
@@ -582,22 +583,19 @@ static void async_completed(struct urb *urb)
{
struct async *as = urb->context;
struct usb_dev_state *ps = as->ps;
- struct kernel_siginfo sinfo;
struct pid *pid = NULL;
const struct cred *cred = NULL;
unsigned long flags;
- int signr;
+ sigval_t addr;
+ int signr, errno;
spin_lock_irqsave(&ps->lock, flags);
list_move_tail(&as->asynclist, &ps->async_completed);
as->status = urb->status;
signr = as->signr;
if (signr) {
- clear_siginfo(&sinfo);
- sinfo.si_signo = as->signr;
- sinfo.si_errno = as->status;
- sinfo.si_code = SI_ASYNCIO;
- sinfo.si_addr = as->userurb;
+ errno = as->status;
+ addr = as->userurb_sigval;
pid = get_pid(as->pid);
cred = get_cred(as->cred);
}
@@ -615,7 +613,7 @@ static void async_completed(struct urb *urb)
spin_unlock_irqrestore(&ps->lock, flags);
if (signr) {
- kill_pid_info_as_cred(sinfo.si_signo, &sinfo, pid, cred);
+ kill_pid_usb_asyncio(signr, errno, addr, pid, cred);
put_pid(pid);
put_cred(cred);
}
@@ -1427,7 +1425,7 @@ find_memory_area(struct usb_dev_state *ps, const struct usbdevfs_urb *uurb)
static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb,
struct usbdevfs_iso_packet_desc __user *iso_frame_desc,
- void __user *arg)
+ void __user *arg, sigval_t userurb_sigval)
{
struct usbdevfs_iso_packet_desc *isopkt = NULL;
struct usb_host_endpoint *ep;
@@ -1727,6 +1725,7 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
isopkt = NULL;
as->ps = ps;
as->userurb = arg;
+ as->userurb_sigval = userurb_sigval;
if (as->usbm) {
unsigned long uurb_start = (unsigned long)uurb->buffer;
@@ -1801,13 +1800,17 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
static int proc_submiturb(struct usb_dev_state *ps, void __user *arg)
{
struct usbdevfs_urb uurb;
+ sigval_t userurb_sigval;
if (copy_from_user(&uurb, arg, sizeof(uurb)))
return -EFAULT;
+ memset(&userurb_sigval, 0, sizeof(userurb_sigval));
+ userurb_sigval.sival_ptr = arg;
+
return proc_do_submiturb(ps, &uurb,
(((struct usbdevfs_urb __user *)arg)->iso_frame_desc),
- arg);
+ arg, userurb_sigval);
}
static int proc_unlinkurb(struct usb_dev_state *ps, void __user *arg)
@@ -1977,7 +1980,7 @@ static int proc_disconnectsignal_compat(struct usb_dev_state *ps, void __user *a
if (copy_from_user(&ds, arg, sizeof(ds)))
return -EFAULT;
ps->discsignr = ds.signr;
- ps->disccontext = compat_ptr(ds.context);
+ ps->disccontext.sival_int = ds.context;
return 0;
}
@@ -2005,13 +2008,17 @@ static int get_urb32(struct usbdevfs_urb *kurb,
static int proc_submiturb_compat(struct usb_dev_state *ps, void __user *arg)
{
struct usbdevfs_urb uurb;
+ sigval_t userurb_sigval;
if (get_urb32(&uurb, (struct usbdevfs_urb32 __user *)arg))
return -EFAULT;
+ memset(&userurb_sigval, 0, sizeof(userurb_sigval));
+ userurb_sigval.sival_int = ptr_to_compat(arg);
+
return proc_do_submiturb(ps, &uurb,
((struct usbdevfs_urb32 __user *)arg)->iso_frame_desc,
- arg);
+ arg, userurb_sigval);
}
static int processcompl_compat(struct async *as, void __user * __user *arg)
@@ -2092,7 +2099,7 @@ static int proc_disconnectsignal(struct usb_dev_state *ps, void __user *arg)
if (copy_from_user(&ds, arg, sizeof(ds)))
return -EFAULT;
ps->discsignr = ds.signr;
- ps->disccontext = ds.context;
+ ps->disccontext.sival_ptr = ds.context;
return 0;
}
@@ -2614,22 +2621,15 @@ const struct file_operations usbdev_file_operations = {
static void usbdev_remove(struct usb_device *udev)
{
struct usb_dev_state *ps;
- struct kernel_siginfo sinfo;
while (!list_empty(&udev->filelist)) {
ps = list_entry(udev->filelist.next, struct usb_dev_state, list);
destroy_all_async(ps);
wake_up_all(&ps->wait);
list_del_init(&ps->list);
- if (ps->discsignr) {
- clear_siginfo(&sinfo);
- sinfo.si_signo = ps->discsignr;
- sinfo.si_errno = EPIPE;
- sinfo.si_code = SI_ASYNCIO;
- sinfo.si_addr = ps->disccontext;
- kill_pid_info_as_cred(ps->discsignr, &sinfo,
- ps->disc_pid, ps->cred);
- }
+ if (ps->discsignr)
+ kill_pid_usb_asyncio(ps->discsignr, EPIPE, ps->disccontext,
+ ps->disc_pid, ps->cred);
}
}
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 38a0f0785323..c68ca81db0a1 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -329,7 +329,7 @@ extern void force_sigsegv(int sig, struct task_struct *p);
extern int force_sig_info(int, struct kernel_siginfo *, struct task_struct *);
extern int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp);
extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid);
-extern int kill_pid_info_as_cred(int, struct kernel_siginfo *, struct pid *,
+extern int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr, struct pid *,
const struct cred *);
extern int kill_pgrp(struct pid *pid, int sig, int priv);
extern int kill_pid(struct pid *pid, int sig, int priv);
diff --git a/kernel/signal.c b/kernel/signal.c
index a1eb44dc9ff5..18040d6bd63a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1439,13 +1439,44 @@ static inline bool kill_as_cred_perm(const struct cred *cred,
uid_eq(cred->uid, pcred->uid);
}
-/* like kill_pid_info(), but doesn't use uid/euid of "current" */
-int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
- const struct cred *cred)
+/*
+ * The usb asyncio usage of siginfo is wrong. The glibc support
+ * for asyncio which uses SI_ASYNCIO assumes the layout is SIL_RT.
+ * AKA after the generic fields:
+ * kernel_pid_t si_pid;
+ * kernel_uid32_t si_uid;
+ * sigval_t si_value;
+ *
+ * Unfortunately when usb generates SI_ASYNCIO it assumes the layout
+ * after the generic fields is:
+ * void __user *si_addr;
+ *
+ * This is a practical problem when there is a 64bit big endian kernel
+ * and a 32bit userspace. As the 32bit address will encoded in the low
+ * 32bits of the pointer. Those low 32bits will be stored at higher
+ * address than appear in a 32 bit pointer. So userspace will not
+ * see the address it was expecting for it's completions.
+ *
+ * There is nothing in the encoding that can allow
+ * copy_siginfo_to_user32 to detect this confusion of formats, so
+ * handle this by requiring the caller of kill_pid_usb_asyncio to
+ * notice when this situration takes place and to store the 32bit
+ * pointer in sival_int, instead of sival_addr of the sigval_t addr
+ * parameter.
+ */
+int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,
+ struct pid *pid, const struct cred *cred)
{
- int ret = -EINVAL;
+ struct kernel_siginfo info;
struct task_struct *p;
unsigned long flags;
+ int ret = -EINVAL;
+
+ clear_siginfo(&info);
+ info.si_signo = sig;
+ info.si_errno = errno;
+ info.si_code = SI_ASYNCIO;
+ *((sigval_t *)&info.si_pid) = addr;
if (!valid_signal(sig))
return ret;
@@ -1456,17 +1487,17 @@ int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
ret = -ESRCH;
goto out_unlock;
}
- if (si_fromuser(info) && !kill_as_cred_perm(cred, p)) {
+ if (!kill_as_cred_perm(cred, p)) {
ret = -EPERM;
goto out_unlock;
}
- ret = security_task_kill(p, info, sig, cred);
+ ret = security_task_kill(p, &info, sig, cred);
if (ret)
goto out_unlock;
if (sig) {
if (lock_task_sighand(p, &flags)) {
- ret = __send_signal(sig, info, p, PIDTYPE_TGID, 0);
+ ret = __send_signal(sig, &info, p, PIDTYPE_TGID, 0);
unlock_task_sighand(p, &flags);
} else
ret = -ESRCH;
@@ -1475,7 +1506,7 @@ int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
rcu_read_unlock();
return ret;
}
-EXPORT_SYMBOL_GPL(kill_pid_info_as_cred);
+EXPORT_SYMBOL_GPL(kill_pid_usb_asyncio);
/*
* kill_something_info() interprets pid in interesting ways just like kill(2).
@@ -4474,6 +4505,28 @@ static inline void siginfo_buildtime_checks(void)
CHECK_OFFSET(si_syscall);
CHECK_OFFSET(si_arch);
#undef CHECK_OFFSET
+
+ /* usb asyncio */
+ BUILD_BUG_ON(offsetof(struct siginfo, si_pid) !=
+ offsetof(struct siginfo, si_addr));
+ if (sizeof(int) == sizeof(void __user *)) {
+ BUILD_BUG_ON(sizeof_field(struct siginfo, si_pid) !=
+ sizeof(void __user *));
+ } else {
+ BUILD_BUG_ON((sizeof_field(struct siginfo, si_pid) +
+ sizeof_field(struct siginfo, si_uid)) !=
+ sizeof(void __user *));
+ BUILD_BUG_ON(offsetofend(struct siginfo, si_pid) !=
+ offsetof(struct siginfo, si_uid));
+ }
+#ifdef CONFIG_COMPAT
+ BUILD_BUG_ON(offsetof(struct compat_siginfo, si_pid) !=
+ offsetof(struct compat_siginfo, si_addr));
+ BUILD_BUG_ON(sizeof_field(struct compat_siginfo, si_pid) !=
+ sizeof(compat_uptr_t));
+ BUILD_BUG_ON(sizeof_field(struct compat_siginfo, si_pid) !=
+ sizeof_field(struct siginfo, si_pid));
+#endif
}
void __init signals_init(void)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 70f1b0d34bdf03065fe869e93cc17cad1ea20c4a Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Thu, 7 Feb 2019 19:44:12 -0600
Subject: [PATCH] signal/usb: Replace kill_pid_info_as_cred with
kill_pid_usb_asyncio
The usb support for asyncio encoded one of its values in the wrong
field. It should have used si_value but instead used si_addr, which is
not present in the _rt union member of struct siginfo.
The practical result of this is that on a 64bit big endian kernel,
when delivering a signal to a 32bit process, the si_addr field
is set to NULL instead of the expected pointer value.
This issue can not be fixed in copy_siginfo_to_user32 as the usb
usage of the _sigfault (aka si_addr) member of the siginfo
union when SI_ASYNCIO is set is incompatible with the POSIX and
glibc usage of the _rt member of the siginfo union.
Therefore replace kill_pid_info_as_cred with kill_pid_usb_asyncio, a
dedicated function for this one specific case. There are no other
users of kill_pid_info_as_cred, so this specialization should have no
impact on the amount of code in the kernel. Instead of a siginfo_t,
which is difficult and error prone, have kill_pid_usb_asyncio take
three arguments: a signal number, an errno value, and an address
encoded as a sigval_t. The encoding of the address as a sigval_t
allows the code that reads the userspace request for a signal to
handle this compat issue along with all of the other compat issues.
Add BUILD_BUG_ONs in kernel/signal.c to ensure that we can now place
the pointer value in si_pid (instead of si_addr). That is, the
code now verifies that si_pid and si_addr always occur at the same
location. Further, the code verifies that for native structures a value
placed in si_pid and spilling into si_uid will appear in userspace in
si_addr (on a byte by byte copy of siginfo or a field by field copy of
siginfo). The code also verifies that for a 64bit kernel and a 32bit
userspace the 32bit pointer will fit in si_pid.
I have used the usbsig.c program below written by Alan Stern and
slightly tweaked by me to run on a big endian machine to verify the
issue exists (on sparc64) and to confirm the patch below fixes the issue.
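For illustration only (this sketch is not part of the original commit), the
encoding convention the sigval_t parameter relies on can be written as a
small helper. The name encode_userurb_sigval and the is_compat flag are
made up here; the real callers are proc_submiturb() and
proc_submiturb_compat() in the diff below.
/*
 * Minimal sketch, assuming CONFIG_COMPAT is enabled: a native submitter
 * stores the full pointer in sival_ptr, while a 32bit compat submitter
 * stores the truncated pointer in sival_int so that, even on a 64bit
 * big endian kernel, the value ends up in the leading bytes that a
 * 32bit siginfo exposes as si_addr.
 */
#include <linux/compat.h>
#include <linux/signal.h>
#include <linux/string.h>
static sigval_t encode_userurb_sigval(void __user *userurb, bool is_compat)
{
	sigval_t v;
	memset(&v, 0, sizeof(v));
	if (is_compat)
		v.sival_int = ptr_to_compat(userurb);
	else
		v.sival_ptr = userurb;
	return v;
}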
/* usbsig.c -- test USB async signal delivery */
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <endian.h>
#include <linux/usb/ch9.h>
#include <linux/usbdevice_fs.h>
static struct usbdevfs_urb urb;
static struct usbdevfs_disconnectsignal ds;
static volatile sig_atomic_t done = 0;
void urb_handler(int sig, siginfo_t *info , void *ucontext)
{
printf("Got signal %d, signo %d errno %d code %d addr: %p urb: %p\n",
sig, info->si_signo, info->si_errno, info->si_code,
info->si_addr, &urb);
printf("%s\n", (info->si_addr == &urb) ? "Good" : "Bad");
}
void ds_handler(int sig, siginfo_t *info , void *ucontext)
{
printf("Got signal %d, signo %d errno %d code %d addr: %p ds: %p\n",
sig, info->si_signo, info->si_errno, info->si_code,
info->si_addr, &ds);
printf("%s\n", (info->si_addr == &ds) ? "Good" : "Bad");
done = 1;
}
int main(int argc, char **argv)
{
char *devfilename;
int fd;
int rc;
struct sigaction act;
struct usb_ctrlrequest *req;
void *ptr;
char buf[80];
if (argc != 2) {
fprintf(stderr, "Usage: usbsig device-file-name\n");
return 1;
}
devfilename = argv[1];
fd = open(devfilename, O_RDWR);
if (fd == -1) {
perror("Error opening device file");
return 1;
}
act.sa_sigaction = urb_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
rc = sigaction(SIGUSR1, &act, NULL);
if (rc == -1) {
perror("Error in sigaction");
return 1;
}
act.sa_sigaction = ds_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
rc = sigaction(SIGUSR2, &act, NULL);
if (rc == -1) {
perror("Error in sigaction");
return 1;
}
memset(&urb, 0, sizeof(urb));
urb.type = USBDEVFS_URB_TYPE_CONTROL;
urb.endpoint = USB_DIR_IN | 0;
urb.buffer = buf;
urb.buffer_length = sizeof(buf);
urb.signr = SIGUSR1;
req = (struct usb_ctrlrequest *) buf;
req->bRequestType = USB_DIR_IN | USB_TYPE_STANDARD | USB_RECIP_DEVICE;
req->bRequest = USB_REQ_GET_DESCRIPTOR;
req->wValue = htole16(USB_DT_DEVICE << 8);
req->wIndex = htole16(0);
req->wLength = htole16(sizeof(buf) - sizeof(*req));
rc = ioctl(fd, USBDEVFS_SUBMITURB, &urb);
if (rc == -1) {
perror("Error in SUBMITURB ioctl");
return 1;
}
rc = ioctl(fd, USBDEVFS_REAPURB, &ptr);
if (rc == -1) {
perror("Error in REAPURB ioctl");
return 1;
}
memset(&ds, 0, sizeof(ds));
ds.signr = SIGUSR2;
ds.context = &ds;
rc = ioctl(fd, USBDEVFS_DISCSIGNAL, &ds);
if (rc == -1) {
perror("Error in DISCSIGNAL ioctl");
return 1;
}
printf("Waiting for usb disconnect\n");
while (!done) {
sleep(1);
}
close(fd);
return 0;
}
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: linux-usb(a)vger.kernel.org
Cc: Alan Stern <stern(a)rowland.harvard.edu>
Cc: Oliver Neukum <oneukum(a)suse.com>
Fixes: v2.3.39
Cc: stable(a)vger.kernel.org
Acked-by: Alan Stern <stern(a)rowland.harvard.edu>
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index fa783531ee88..a02448105527 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -63,7 +63,7 @@ struct usb_dev_state {
unsigned int discsignr;
struct pid *disc_pid;
const struct cred *cred;
- void __user *disccontext;
+ sigval_t disccontext;
unsigned long ifclaimed;
u32 disabled_bulk_eps;
bool privileges_dropped;
@@ -90,6 +90,7 @@ struct async {
unsigned int ifnum;
void __user *userbuffer;
void __user *userurb;
+ sigval_t userurb_sigval;
struct urb *urb;
struct usb_memory *usbm;
unsigned int mem_usage;
@@ -582,22 +583,19 @@ static void async_completed(struct urb *urb)
{
struct async *as = urb->context;
struct usb_dev_state *ps = as->ps;
- struct kernel_siginfo sinfo;
struct pid *pid = NULL;
const struct cred *cred = NULL;
unsigned long flags;
- int signr;
+ sigval_t addr;
+ int signr, errno;
spin_lock_irqsave(&ps->lock, flags);
list_move_tail(&as->asynclist, &ps->async_completed);
as->status = urb->status;
signr = as->signr;
if (signr) {
- clear_siginfo(&sinfo);
- sinfo.si_signo = as->signr;
- sinfo.si_errno = as->status;
- sinfo.si_code = SI_ASYNCIO;
- sinfo.si_addr = as->userurb;
+ errno = as->status;
+ addr = as->userurb_sigval;
pid = get_pid(as->pid);
cred = get_cred(as->cred);
}
@@ -615,7 +613,7 @@ static void async_completed(struct urb *urb)
spin_unlock_irqrestore(&ps->lock, flags);
if (signr) {
- kill_pid_info_as_cred(sinfo.si_signo, &sinfo, pid, cred);
+ kill_pid_usb_asyncio(signr, errno, addr, pid, cred);
put_pid(pid);
put_cred(cred);
}
@@ -1427,7 +1425,7 @@ find_memory_area(struct usb_dev_state *ps, const struct usbdevfs_urb *uurb)
static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb,
struct usbdevfs_iso_packet_desc __user *iso_frame_desc,
- void __user *arg)
+ void __user *arg, sigval_t userurb_sigval)
{
struct usbdevfs_iso_packet_desc *isopkt = NULL;
struct usb_host_endpoint *ep;
@@ -1727,6 +1725,7 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
isopkt = NULL;
as->ps = ps;
as->userurb = arg;
+ as->userurb_sigval = userurb_sigval;
if (as->usbm) {
unsigned long uurb_start = (unsigned long)uurb->buffer;
@@ -1801,13 +1800,17 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
static int proc_submiturb(struct usb_dev_state *ps, void __user *arg)
{
struct usbdevfs_urb uurb;
+ sigval_t userurb_sigval;
if (copy_from_user(&uurb, arg, sizeof(uurb)))
return -EFAULT;
+ memset(&userurb_sigval, 0, sizeof(userurb_sigval));
+ userurb_sigval.sival_ptr = arg;
+
return proc_do_submiturb(ps, &uurb,
(((struct usbdevfs_urb __user *)arg)->iso_frame_desc),
- arg);
+ arg, userurb_sigval);
}
static int proc_unlinkurb(struct usb_dev_state *ps, void __user *arg)
@@ -1977,7 +1980,7 @@ static int proc_disconnectsignal_compat(struct usb_dev_state *ps, void __user *a
if (copy_from_user(&ds, arg, sizeof(ds)))
return -EFAULT;
ps->discsignr = ds.signr;
- ps->disccontext = compat_ptr(ds.context);
+ ps->disccontext.sival_int = ds.context;
return 0;
}
@@ -2005,13 +2008,17 @@ static int get_urb32(struct usbdevfs_urb *kurb,
static int proc_submiturb_compat(struct usb_dev_state *ps, void __user *arg)
{
struct usbdevfs_urb uurb;
+ sigval_t userurb_sigval;
if (get_urb32(&uurb, (struct usbdevfs_urb32 __user *)arg))
return -EFAULT;
+ memset(&userurb_sigval, 0, sizeof(userurb_sigval));
+ userurb_sigval.sival_int = ptr_to_compat(arg);
+
return proc_do_submiturb(ps, &uurb,
((struct usbdevfs_urb32 __user *)arg)->iso_frame_desc,
- arg);
+ arg, userurb_sigval);
}
static int processcompl_compat(struct async *as, void __user * __user *arg)
@@ -2092,7 +2099,7 @@ static int proc_disconnectsignal(struct usb_dev_state *ps, void __user *arg)
if (copy_from_user(&ds, arg, sizeof(ds)))
return -EFAULT;
ps->discsignr = ds.signr;
- ps->disccontext = ds.context;
+ ps->disccontext.sival_ptr = ds.context;
return 0;
}
@@ -2614,22 +2621,15 @@ const struct file_operations usbdev_file_operations = {
static void usbdev_remove(struct usb_device *udev)
{
struct usb_dev_state *ps;
- struct kernel_siginfo sinfo;
while (!list_empty(&udev->filelist)) {
ps = list_entry(udev->filelist.next, struct usb_dev_state, list);
destroy_all_async(ps);
wake_up_all(&ps->wait);
list_del_init(&ps->list);
- if (ps->discsignr) {
- clear_siginfo(&sinfo);
- sinfo.si_signo = ps->discsignr;
- sinfo.si_errno = EPIPE;
- sinfo.si_code = SI_ASYNCIO;
- sinfo.si_addr = ps->disccontext;
- kill_pid_info_as_cred(ps->discsignr, &sinfo,
- ps->disc_pid, ps->cred);
- }
+ if (ps->discsignr)
+ kill_pid_usb_asyncio(ps->discsignr, EPIPE, ps->disccontext,
+ ps->disc_pid, ps->cred);
}
}
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 38a0f0785323..c68ca81db0a1 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -329,7 +329,7 @@ extern void force_sigsegv(int sig, struct task_struct *p);
extern int force_sig_info(int, struct kernel_siginfo *, struct task_struct *);
extern int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp);
extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid);
-extern int kill_pid_info_as_cred(int, struct kernel_siginfo *, struct pid *,
+extern int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr, struct pid *,
const struct cred *);
extern int kill_pgrp(struct pid *pid, int sig, int priv);
extern int kill_pid(struct pid *pid, int sig, int priv);
diff --git a/kernel/signal.c b/kernel/signal.c
index a1eb44dc9ff5..18040d6bd63a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1439,13 +1439,44 @@ static inline bool kill_as_cred_perm(const struct cred *cred,
uid_eq(cred->uid, pcred->uid);
}
-/* like kill_pid_info(), but doesn't use uid/euid of "current" */
-int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
- const struct cred *cred)
+/*
+ * The usb asyncio usage of siginfo is wrong. The glibc support
+ * for asyncio which uses SI_ASYNCIO assumes the layout is SIL_RT.
+ * AKA after the generic fields:
+ * kernel_pid_t si_pid;
+ * kernel_uid32_t si_uid;
+ * sigval_t si_value;
+ *
+ * Unfortunately when usb generates SI_ASYNCIO it assumes the layout
+ * after the generic fields is:
+ * void __user *si_addr;
+ *
+ * This is a practical problem when there is a 64bit big endian kernel
+ * and a 32bit userspace. As the 32bit address will encoded in the low
+ * 32bits of the pointer. Those low 32bits will be stored at higher
+ * address than appear in a 32 bit pointer. So userspace will not
+ * see the address it was expecting for it's completions.
+ *
+ * There is nothing in the encoding that can allow
+ * copy_siginfo_to_user32 to detect this confusion of formats, so
+ * handle this by requiring the caller of kill_pid_usb_asyncio to
+ * notice when this situration takes place and to store the 32bit
+ * pointer in sival_int, instead of sival_addr of the sigval_t addr
+ * parameter.
+ */
+int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,
+ struct pid *pid, const struct cred *cred)
{
- int ret = -EINVAL;
+ struct kernel_siginfo info;
struct task_struct *p;
unsigned long flags;
+ int ret = -EINVAL;
+
+ clear_siginfo(&info);
+ info.si_signo = sig;
+ info.si_errno = errno;
+ info.si_code = SI_ASYNCIO;
+ *((sigval_t *)&info.si_pid) = addr;
if (!valid_signal(sig))
return ret;
@@ -1456,17 +1487,17 @@ int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
ret = -ESRCH;
goto out_unlock;
}
- if (si_fromuser(info) && !kill_as_cred_perm(cred, p)) {
+ if (!kill_as_cred_perm(cred, p)) {
ret = -EPERM;
goto out_unlock;
}
- ret = security_task_kill(p, info, sig, cred);
+ ret = security_task_kill(p, &info, sig, cred);
if (ret)
goto out_unlock;
if (sig) {
if (lock_task_sighand(p, &flags)) {
- ret = __send_signal(sig, info, p, PIDTYPE_TGID, 0);
+ ret = __send_signal(sig, &info, p, PIDTYPE_TGID, 0);
unlock_task_sighand(p, &flags);
} else
ret = -ESRCH;
@@ -1475,7 +1506,7 @@ int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
rcu_read_unlock();
return ret;
}
-EXPORT_SYMBOL_GPL(kill_pid_info_as_cred);
+EXPORT_SYMBOL_GPL(kill_pid_usb_asyncio);
/*
* kill_something_info() interprets pid in interesting ways just like kill(2).
@@ -4474,6 +4505,28 @@ static inline void siginfo_buildtime_checks(void)
CHECK_OFFSET(si_syscall);
CHECK_OFFSET(si_arch);
#undef CHECK_OFFSET
+
+ /* usb asyncio */
+ BUILD_BUG_ON(offsetof(struct siginfo, si_pid) !=
+ offsetof(struct siginfo, si_addr));
+ if (sizeof(int) == sizeof(void __user *)) {
+ BUILD_BUG_ON(sizeof_field(struct siginfo, si_pid) !=
+ sizeof(void __user *));
+ } else {
+ BUILD_BUG_ON((sizeof_field(struct siginfo, si_pid) +
+ sizeof_field(struct siginfo, si_uid)) !=
+ sizeof(void __user *));
+ BUILD_BUG_ON(offsetofend(struct siginfo, si_pid) !=
+ offsetof(struct siginfo, si_uid));
+ }
+#ifdef CONFIG_COMPAT
+ BUILD_BUG_ON(offsetof(struct compat_siginfo, si_pid) !=
+ offsetof(struct compat_siginfo, si_addr));
+ BUILD_BUG_ON(sizeof_field(struct compat_siginfo, si_pid) !=
+ sizeof(compat_uptr_t));
+ BUILD_BUG_ON(sizeof_field(struct compat_siginfo, si_pid) !=
+ sizeof_field(struct siginfo, si_pid));
+#endif
}
void __init signals_init(void)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 70f1b0d34bdf03065fe869e93cc17cad1ea20c4a Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Thu, 7 Feb 2019 19:44:12 -0600
Subject: [PATCH] signal/usb: Replace kill_pid_info_as_cred with
kill_pid_usb_asyncio
The usb support for asyncio encoded one of its values in the wrong
field. It should have used si_value but instead used si_addr, which is
not present in the _rt union member of struct siginfo.
The practical result of this is that on a 64bit big endian kernel,
when delivering a signal to a 32bit process, the si_addr field
is set to NULL instead of the expected pointer value.
This issue can not be fixed in copy_siginfo_to_user32 as the usb
usage of the _sigfault (aka si_addr) member of the siginfo
union when SI_ASYNCIO is set is incompatible with the POSIX and
glibc usage of the _rt member of the siginfo union.
Therefore replace kill_pid_info_as_cred with kill_pid_usb_asyncio, a
dedicated function for this one specific case. There are no other
users of kill_pid_info_as_cred, so this specialization should have no
impact on the amount of code in the kernel. Instead of a siginfo_t,
which is difficult and error prone, have kill_pid_usb_asyncio take
three arguments: a signal number, an errno value, and an address
encoded as a sigval_t. The encoding of the address as a sigval_t
allows the code that reads the userspace request for a signal to
handle this compat issue along with all of the other compat issues.
Add BUILD_BUG_ONs in kernel/signal.c to ensure that we can now place
the pointer value in si_pid (instead of si_addr). That is, the
code now verifies that si_pid and si_addr always occur at the same
location. Further, the code verifies that for native structures a value
placed in si_pid and spilling into si_uid will appear in userspace in
si_addr (on a byte by byte copy of siginfo or a field by field copy of
siginfo). The code also verifies that for a 64bit kernel and a 32bit
userspace the 32bit pointer will fit in si_pid.
I have used the usbsig.c program below written by Alan Stern and
slightly tweaked by me to run on a big endian machine to verify the
issue exists (on sparc64) and to confirm the patch below fixes the issue.
/* usbsig.c -- test USB async signal delivery */
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <endian.h>
#include <linux/usb/ch9.h>
#include <linux/usbdevice_fs.h>
static struct usbdevfs_urb urb;
static struct usbdevfs_disconnectsignal ds;
static volatile sig_atomic_t done = 0;
void urb_handler(int sig, siginfo_t *info , void *ucontext)
{
printf("Got signal %d, signo %d errno %d code %d addr: %p urb: %p\n",
sig, info->si_signo, info->si_errno, info->si_code,
info->si_addr, &urb);
printf("%s\n", (info->si_addr == &urb) ? "Good" : "Bad");
}
void ds_handler(int sig, siginfo_t *info , void *ucontext)
{
printf("Got signal %d, signo %d errno %d code %d addr: %p ds: %p\n",
sig, info->si_signo, info->si_errno, info->si_code,
info->si_addr, &ds);
printf("%s\n", (info->si_addr == &ds) ? "Good" : "Bad");
done = 1;
}
int main(int argc, char **argv)
{
char *devfilename;
int fd;
int rc;
struct sigaction act;
struct usb_ctrlrequest *req;
void *ptr;
char buf[80];
if (argc != 2) {
fprintf(stderr, "Usage: usbsig device-file-name\n");
return 1;
}
devfilename = argv[1];
fd = open(devfilename, O_RDWR);
if (fd == -1) {
perror("Error opening device file");
return 1;
}
act.sa_sigaction = urb_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
rc = sigaction(SIGUSR1, &act, NULL);
if (rc == -1) {
perror("Error in sigaction");
return 1;
}
act.sa_sigaction = ds_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
rc = sigaction(SIGUSR2, &act, NULL);
if (rc == -1) {
perror("Error in sigaction");
return 1;
}
memset(&urb, 0, sizeof(urb));
urb.type = USBDEVFS_URB_TYPE_CONTROL;
urb.endpoint = USB_DIR_IN | 0;
urb.buffer = buf;
urb.buffer_length = sizeof(buf);
urb.signr = SIGUSR1;
req = (struct usb_ctrlrequest *) buf;
req->bRequestType = USB_DIR_IN | USB_TYPE_STANDARD | USB_RECIP_DEVICE;
req->bRequest = USB_REQ_GET_DESCRIPTOR;
req->wValue = htole16(USB_DT_DEVICE << 8);
req->wIndex = htole16(0);
req->wLength = htole16(sizeof(buf) - sizeof(*req));
rc = ioctl(fd, USBDEVFS_SUBMITURB, &urb);
if (rc == -1) {
perror("Error in SUBMITURB ioctl");
return 1;
}
rc = ioctl(fd, USBDEVFS_REAPURB, &ptr);
if (rc == -1) {
perror("Error in REAPURB ioctl");
return 1;
}
memset(&ds, 0, sizeof(ds));
ds.signr = SIGUSR2;
ds.context = &ds;
rc = ioctl(fd, USBDEVFS_DISCSIGNAL, &ds);
if (rc == -1) {
perror("Error in DISCSIGNAL ioctl");
return 1;
}
printf("Waiting for usb disconnect\n");
while (!done) {
sleep(1);
}
close(fd);
return 0;
}
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: linux-usb(a)vger.kernel.org
Cc: Alan Stern <stern(a)rowland.harvard.edu>
Cc: Oliver Neukum <oneukum(a)suse.com>
Fixes: v2.3.39
Cc: stable(a)vger.kernel.org
Acked-by: Alan Stern <stern(a)rowland.harvard.edu>
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index fa783531ee88..a02448105527 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -63,7 +63,7 @@ struct usb_dev_state {
unsigned int discsignr;
struct pid *disc_pid;
const struct cred *cred;
- void __user *disccontext;
+ sigval_t disccontext;
unsigned long ifclaimed;
u32 disabled_bulk_eps;
bool privileges_dropped;
@@ -90,6 +90,7 @@ struct async {
unsigned int ifnum;
void __user *userbuffer;
void __user *userurb;
+ sigval_t userurb_sigval;
struct urb *urb;
struct usb_memory *usbm;
unsigned int mem_usage;
@@ -582,22 +583,19 @@ static void async_completed(struct urb *urb)
{
struct async *as = urb->context;
struct usb_dev_state *ps = as->ps;
- struct kernel_siginfo sinfo;
struct pid *pid = NULL;
const struct cred *cred = NULL;
unsigned long flags;
- int signr;
+ sigval_t addr;
+ int signr, errno;
spin_lock_irqsave(&ps->lock, flags);
list_move_tail(&as->asynclist, &ps->async_completed);
as->status = urb->status;
signr = as->signr;
if (signr) {
- clear_siginfo(&sinfo);
- sinfo.si_signo = as->signr;
- sinfo.si_errno = as->status;
- sinfo.si_code = SI_ASYNCIO;
- sinfo.si_addr = as->userurb;
+ errno = as->status;
+ addr = as->userurb_sigval;
pid = get_pid(as->pid);
cred = get_cred(as->cred);
}
@@ -615,7 +613,7 @@ static void async_completed(struct urb *urb)
spin_unlock_irqrestore(&ps->lock, flags);
if (signr) {
- kill_pid_info_as_cred(sinfo.si_signo, &sinfo, pid, cred);
+ kill_pid_usb_asyncio(signr, errno, addr, pid, cred);
put_pid(pid);
put_cred(cred);
}
@@ -1427,7 +1425,7 @@ find_memory_area(struct usb_dev_state *ps, const struct usbdevfs_urb *uurb)
static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb,
struct usbdevfs_iso_packet_desc __user *iso_frame_desc,
- void __user *arg)
+ void __user *arg, sigval_t userurb_sigval)
{
struct usbdevfs_iso_packet_desc *isopkt = NULL;
struct usb_host_endpoint *ep;
@@ -1727,6 +1725,7 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
isopkt = NULL;
as->ps = ps;
as->userurb = arg;
+ as->userurb_sigval = userurb_sigval;
if (as->usbm) {
unsigned long uurb_start = (unsigned long)uurb->buffer;
@@ -1801,13 +1800,17 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
static int proc_submiturb(struct usb_dev_state *ps, void __user *arg)
{
struct usbdevfs_urb uurb;
+ sigval_t userurb_sigval;
if (copy_from_user(&uurb, arg, sizeof(uurb)))
return -EFAULT;
+ memset(&userurb_sigval, 0, sizeof(userurb_sigval));
+ userurb_sigval.sival_ptr = arg;
+
return proc_do_submiturb(ps, &uurb,
(((struct usbdevfs_urb __user *)arg)->iso_frame_desc),
- arg);
+ arg, userurb_sigval);
}
static int proc_unlinkurb(struct usb_dev_state *ps, void __user *arg)
@@ -1977,7 +1980,7 @@ static int proc_disconnectsignal_compat(struct usb_dev_state *ps, void __user *a
if (copy_from_user(&ds, arg, sizeof(ds)))
return -EFAULT;
ps->discsignr = ds.signr;
- ps->disccontext = compat_ptr(ds.context);
+ ps->disccontext.sival_int = ds.context;
return 0;
}
@@ -2005,13 +2008,17 @@ static int get_urb32(struct usbdevfs_urb *kurb,
static int proc_submiturb_compat(struct usb_dev_state *ps, void __user *arg)
{
struct usbdevfs_urb uurb;
+ sigval_t userurb_sigval;
if (get_urb32(&uurb, (struct usbdevfs_urb32 __user *)arg))
return -EFAULT;
+ memset(&userurb_sigval, 0, sizeof(userurb_sigval));
+ userurb_sigval.sival_int = ptr_to_compat(arg);
+
return proc_do_submiturb(ps, &uurb,
((struct usbdevfs_urb32 __user *)arg)->iso_frame_desc,
- arg);
+ arg, userurb_sigval);
}
static int processcompl_compat(struct async *as, void __user * __user *arg)
@@ -2092,7 +2099,7 @@ static int proc_disconnectsignal(struct usb_dev_state *ps, void __user *arg)
if (copy_from_user(&ds, arg, sizeof(ds)))
return -EFAULT;
ps->discsignr = ds.signr;
- ps->disccontext = ds.context;
+ ps->disccontext.sival_ptr = ds.context;
return 0;
}
@@ -2614,22 +2621,15 @@ const struct file_operations usbdev_file_operations = {
static void usbdev_remove(struct usb_device *udev)
{
struct usb_dev_state *ps;
- struct kernel_siginfo sinfo;
while (!list_empty(&udev->filelist)) {
ps = list_entry(udev->filelist.next, struct usb_dev_state, list);
destroy_all_async(ps);
wake_up_all(&ps->wait);
list_del_init(&ps->list);
- if (ps->discsignr) {
- clear_siginfo(&sinfo);
- sinfo.si_signo = ps->discsignr;
- sinfo.si_errno = EPIPE;
- sinfo.si_code = SI_ASYNCIO;
- sinfo.si_addr = ps->disccontext;
- kill_pid_info_as_cred(ps->discsignr, &sinfo,
- ps->disc_pid, ps->cred);
- }
+ if (ps->discsignr)
+ kill_pid_usb_asyncio(ps->discsignr, EPIPE, ps->disccontext,
+ ps->disc_pid, ps->cred);
}
}
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 38a0f0785323..c68ca81db0a1 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -329,7 +329,7 @@ extern void force_sigsegv(int sig, struct task_struct *p);
extern int force_sig_info(int, struct kernel_siginfo *, struct task_struct *);
extern int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp);
extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid);
-extern int kill_pid_info_as_cred(int, struct kernel_siginfo *, struct pid *,
+extern int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr, struct pid *,
const struct cred *);
extern int kill_pgrp(struct pid *pid, int sig, int priv);
extern int kill_pid(struct pid *pid, int sig, int priv);
diff --git a/kernel/signal.c b/kernel/signal.c
index a1eb44dc9ff5..18040d6bd63a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1439,13 +1439,44 @@ static inline bool kill_as_cred_perm(const struct cred *cred,
uid_eq(cred->uid, pcred->uid);
}
-/* like kill_pid_info(), but doesn't use uid/euid of "current" */
-int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
- const struct cred *cred)
+/*
+ * The usb asyncio usage of siginfo is wrong. The glibc support
+ * for asyncio which uses SI_ASYNCIO assumes the layout is SIL_RT.
+ * AKA after the generic fields:
+ * kernel_pid_t si_pid;
+ * kernel_uid32_t si_uid;
+ * sigval_t si_value;
+ *
+ * Unfortunately when usb generates SI_ASYNCIO it assumes the layout
+ * after the generic fields is:
+ * void __user *si_addr;
+ *
+ * This is a practical problem when there is a 64bit big endian kernel
+ * and a 32bit userspace. As the 32bit address will encoded in the low
+ * 32bits of the pointer. Those low 32bits will be stored at higher
+ * address than appear in a 32 bit pointer. So userspace will not
+ * see the address it was expecting for it's completions.
+ *
+ * There is nothing in the encoding that can allow
+ * copy_siginfo_to_user32 to detect this confusion of formats, so
+ * handle this by requiring the caller of kill_pid_usb_asyncio to
+ * notice when this situration takes place and to store the 32bit
+ * pointer in sival_int, instead of sival_addr of the sigval_t addr
+ * parameter.
+ */
+int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,
+ struct pid *pid, const struct cred *cred)
{
- int ret = -EINVAL;
+ struct kernel_siginfo info;
struct task_struct *p;
unsigned long flags;
+ int ret = -EINVAL;
+
+ clear_siginfo(&info);
+ info.si_signo = sig;
+ info.si_errno = errno;
+ info.si_code = SI_ASYNCIO;
+ *((sigval_t *)&info.si_pid) = addr;
if (!valid_signal(sig))
return ret;
@@ -1456,17 +1487,17 @@ int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
ret = -ESRCH;
goto out_unlock;
}
- if (si_fromuser(info) && !kill_as_cred_perm(cred, p)) {
+ if (!kill_as_cred_perm(cred, p)) {
ret = -EPERM;
goto out_unlock;
}
- ret = security_task_kill(p, info, sig, cred);
+ ret = security_task_kill(p, &info, sig, cred);
if (ret)
goto out_unlock;
if (sig) {
if (lock_task_sighand(p, &flags)) {
- ret = __send_signal(sig, info, p, PIDTYPE_TGID, 0);
+ ret = __send_signal(sig, &info, p, PIDTYPE_TGID, 0);
unlock_task_sighand(p, &flags);
} else
ret = -ESRCH;
@@ -1475,7 +1506,7 @@ int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
rcu_read_unlock();
return ret;
}
-EXPORT_SYMBOL_GPL(kill_pid_info_as_cred);
+EXPORT_SYMBOL_GPL(kill_pid_usb_asyncio);
/*
* kill_something_info() interprets pid in interesting ways just like kill(2).
@@ -4474,6 +4505,28 @@ static inline void siginfo_buildtime_checks(void)
CHECK_OFFSET(si_syscall);
CHECK_OFFSET(si_arch);
#undef CHECK_OFFSET
+
+ /* usb asyncio */
+ BUILD_BUG_ON(offsetof(struct siginfo, si_pid) !=
+ offsetof(struct siginfo, si_addr));
+ if (sizeof(int) == sizeof(void __user *)) {
+ BUILD_BUG_ON(sizeof_field(struct siginfo, si_pid) !=
+ sizeof(void __user *));
+ } else {
+ BUILD_BUG_ON((sizeof_field(struct siginfo, si_pid) +
+ sizeof_field(struct siginfo, si_uid)) !=
+ sizeof(void __user *));
+ BUILD_BUG_ON(offsetofend(struct siginfo, si_pid) !=
+ offsetof(struct siginfo, si_uid));
+ }
+#ifdef CONFIG_COMPAT
+ BUILD_BUG_ON(offsetof(struct compat_siginfo, si_pid) !=
+ offsetof(struct compat_siginfo, si_addr));
+ BUILD_BUG_ON(sizeof_field(struct compat_siginfo, si_pid) !=
+ sizeof(compat_uptr_t));
+ BUILD_BUG_ON(sizeof_field(struct compat_siginfo, si_pid) !=
+ sizeof_field(struct siginfo, si_pid));
+#endif
}
void __init signals_init(void)
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bd82d4bd21880b7c4d5f5756be435095d6ae07b5 Mon Sep 17 00:00:00 2001
From: Julien Thierry <julien.thierry(a)arm.com>
Date: Tue, 11 Jun 2019 10:38:10 +0100
Subject: [PATCH] arm64: Fix incorrect irqflag restore for priority masking
When using IRQ priority masking to disable interrupts, in order to deal
with the PSR.I state, local_irq_save() would convert the I bit into a
PMR value (GIC_PRIO_IRQOFF). This resulted in local_irq_restore()
potentially modifying the value of PMR in undesired locations due to the
state of PSR.I upon flag saving [1].
In an attempt to solve this issue in a less hackish manner, introduce
a bit (GIC_PRIO_PSR_I_SET) for the PMR values that can represent
whether PSR.I is being used to disable interrupts, in which case it
takes precedence over the status of interrupt masking via PMR.
GIC_PRIO_PSR_I_SET is chosen such that (<pmr_value> |
GIC_PRIO_PSR_I_SET) does not mask more interrupts than <pmr_value>, as
some sections (e.g. arch_cpu_idle(), interrupt acknowledge path)
require PMR not to mask interrupts that could be signaled to the
CPU when using only PSR.I.
[1] https://www.spinics.net/lists/arm-kernel/msg716956.html
Fixes: 4a503217ce37 ("arm64: irqflags: Use ICC_PMR_EL1 for interrupt masking")
Cc: <stable(a)vger.kernel.org> # 5.1.x-
Reported-by: Zenghui Yu <yuzenghui(a)huawei.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Wei Li <liwei391(a)huawei.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Christoffer Dall <christoffer.dall(a)arm.com>
Cc: James Morse <james.morse(a)arm.com>
Cc: Suzuki K Pouloze <suzuki.poulose(a)arm.com>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Reviewed-by: Marc Zyngier <marc.zyngier(a)arm.com>
Signed-off-by: Julien Thierry <julien.thierry(a)arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com>
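For orientation while reading the hunks below, here is a sketch of the
resulting PMR encoding; the values come straight from the ptrace.h hunk,
while the helper name is made up and only mirrors the new
arch_irqs_disabled_flags() logic.
/* Values as defined in arch/arm64/include/asm/ptrace.h after this patch. */
#define GIC_PRIO_IRQON		0xc0
#define GIC_PRIO_IRQOFF		(GIC_PRIO_IRQON & ~0x80)	/* 0x40 */
#define GIC_PRIO_PSR_I_SET	(1 << 4)			/* 0x10 */
/*
 * Illustrative helper only: with priority masking, any saved PMR value
 * other than plain GIC_PRIO_IRQON means interrupts are masked, either
 * via PMR itself or because PSR.I is in use (GIC_PRIO_PSR_I_SET).
 */
static inline int pmr_irqs_disabled(unsigned long pmr)
{
	return pmr != GIC_PRIO_IRQON;
}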
diff --git a/arch/arm64/include/asm/arch_gicv3.h b/arch/arm64/include/asm/arch_gicv3.h
index 14b41ddc68ba..9e991b628706 100644
--- a/arch/arm64/include/asm/arch_gicv3.h
+++ b/arch/arm64/include/asm/arch_gicv3.h
@@ -163,7 +163,9 @@ static inline bool gic_prio_masking_enabled(void)
static inline void gic_pmr_mask_irqs(void)
{
- BUILD_BUG_ON(GICD_INT_DEF_PRI <= GIC_PRIO_IRQOFF);
+ BUILD_BUG_ON(GICD_INT_DEF_PRI < (GIC_PRIO_IRQOFF |
+ GIC_PRIO_PSR_I_SET));
+ BUILD_BUG_ON(GICD_INT_DEF_PRI >= GIC_PRIO_IRQON);
gic_write_pmr(GIC_PRIO_IRQOFF);
}
diff --git a/arch/arm64/include/asm/daifflags.h b/arch/arm64/include/asm/daifflags.h
index db452aa9e651..f93204f319da 100644
--- a/arch/arm64/include/asm/daifflags.h
+++ b/arch/arm64/include/asm/daifflags.h
@@ -18,6 +18,7 @@
#include <linux/irqflags.h>
+#include <asm/arch_gicv3.h>
#include <asm/cpufeature.h>
#define DAIF_PROCCTX 0
@@ -32,6 +33,11 @@ static inline void local_daif_mask(void)
:
:
: "memory");
+
+ /* Don't really care for a dsb here, we don't intend to enable IRQs */
+ if (system_uses_irq_prio_masking())
+ gic_write_pmr(GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET);
+
trace_hardirqs_off();
}
@@ -43,7 +49,7 @@ static inline unsigned long local_daif_save(void)
if (system_uses_irq_prio_masking()) {
/* If IRQs are masked with PMR, reflect it in the flags */
- if (read_sysreg_s(SYS_ICC_PMR_EL1) <= GIC_PRIO_IRQOFF)
+ if (read_sysreg_s(SYS_ICC_PMR_EL1) != GIC_PRIO_IRQON)
flags |= PSR_I_BIT;
}
@@ -59,36 +65,44 @@ static inline void local_daif_restore(unsigned long flags)
if (!irq_disabled) {
trace_hardirqs_on();
- if (system_uses_irq_prio_masking())
- arch_local_irq_enable();
- } else if (!(flags & PSR_A_BIT)) {
- /*
- * If interrupts are disabled but we can take
- * asynchronous errors, we can take NMIs
- */
if (system_uses_irq_prio_masking()) {
- flags &= ~PSR_I_BIT;
+ gic_write_pmr(GIC_PRIO_IRQON);
+ dsb(sy);
+ }
+ } else if (system_uses_irq_prio_masking()) {
+ u64 pmr;
+
+ if (!(flags & PSR_A_BIT)) {
/*
- * There has been concern that the write to daif
- * might be reordered before this write to PMR.
- * From the ARM ARM DDI 0487D.a, section D1.7.1
- * "Accessing PSTATE fields":
- * Writes to the PSTATE fields have side-effects on
- * various aspects of the PE operation. All of these
- * side-effects are guaranteed:
- * - Not to be visible to earlier instructions in
- * the execution stream.
- * - To be visible to later instructions in the
- * execution stream
- *
- * Also, writes to PMR are self-synchronizing, so no
- * interrupts with a lower priority than PMR is signaled
- * to the PE after the write.
- *
- * So we don't need additional synchronization here.
+ * If interrupts are disabled but we can take
+ * asynchronous errors, we can take NMIs
*/
- arch_local_irq_disable();
+ flags &= ~PSR_I_BIT;
+ pmr = GIC_PRIO_IRQOFF;
+ } else {
+ pmr = GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET;
}
+
+ /*
+ * There has been concern that the write to daif
+ * might be reordered before this write to PMR.
+ * From the ARM ARM DDI 0487D.a, section D1.7.1
+ * "Accessing PSTATE fields":
+ * Writes to the PSTATE fields have side-effects on
+ * various aspects of the PE operation. All of these
+ * side-effects are guaranteed:
+ * - Not to be visible to earlier instructions in
+ * the execution stream.
+ * - To be visible to later instructions in the
+ * execution stream
+ *
+ * Also, writes to PMR are self-synchronizing, so no
+ * interrupts with a lower priority than PMR is signaled
+ * to the PE after the write.
+ *
+ * So we don't need additional synchronization here.
+ */
+ gic_write_pmr(pmr);
}
write_sysreg(flags, daif);
diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
index fbe1aba6ffb3..a1372722f12e 100644
--- a/arch/arm64/include/asm/irqflags.h
+++ b/arch/arm64/include/asm/irqflags.h
@@ -67,43 +67,46 @@ static inline void arch_local_irq_disable(void)
*/
static inline unsigned long arch_local_save_flags(void)
{
- unsigned long daif_bits;
unsigned long flags;
- daif_bits = read_sysreg(daif);
-
- /*
- * The asm is logically equivalent to:
- *
- * if (system_uses_irq_prio_masking())
- * flags = (daif_bits & PSR_I_BIT) ?
- * GIC_PRIO_IRQOFF :
- * read_sysreg_s(SYS_ICC_PMR_EL1);
- * else
- * flags = daif_bits;
- */
asm volatile(ALTERNATIVE(
- "mov %0, %1\n"
- "nop\n"
- "nop",
- __mrs_s("%0", SYS_ICC_PMR_EL1)
- "ands %1, %1, " __stringify(PSR_I_BIT) "\n"
- "csel %0, %0, %2, eq",
- ARM64_HAS_IRQ_PRIO_MASKING)
- : "=&r" (flags), "+r" (daif_bits)
- : "r" ((unsigned long) GIC_PRIO_IRQOFF)
- : "cc", "memory");
+ "mrs %0, daif",
+ __mrs_s("%0", SYS_ICC_PMR_EL1),
+ ARM64_HAS_IRQ_PRIO_MASKING)
+ : "=&r" (flags)
+ :
+ : "memory");
return flags;
}
+static inline int arch_irqs_disabled_flags(unsigned long flags)
+{
+ int res;
+
+ asm volatile(ALTERNATIVE(
+ "and %w0, %w1, #" __stringify(PSR_I_BIT),
+ "eor %w0, %w1, #" __stringify(GIC_PRIO_IRQON),
+ ARM64_HAS_IRQ_PRIO_MASKING)
+ : "=&r" (res)
+ : "r" ((int) flags)
+ : "memory");
+
+ return res;
+}
+
static inline unsigned long arch_local_irq_save(void)
{
unsigned long flags;
flags = arch_local_save_flags();
- arch_local_irq_disable();
+ /*
+ * There are too many states with IRQs disabled, just keep the current
+ * state if interrupts are already disabled/masked.
+ */
+ if (!arch_irqs_disabled_flags(flags))
+ arch_local_irq_disable();
return flags;
}
@@ -124,21 +127,5 @@ static inline void arch_local_irq_restore(unsigned long flags)
: "memory");
}
-static inline int arch_irqs_disabled_flags(unsigned long flags)
-{
- int res;
-
- asm volatile(ALTERNATIVE(
- "and %w0, %w1, #" __stringify(PSR_I_BIT) "\n"
- "nop",
- "cmp %w1, #" __stringify(GIC_PRIO_IRQOFF) "\n"
- "cset %w0, ls",
- ARM64_HAS_IRQ_PRIO_MASKING)
- : "=&r" (res)
- : "r" ((int) flags)
- : "cc", "memory");
-
- return res;
-}
#endif
#endif
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 4bcd9c1291d5..33410635b015 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -608,11 +608,12 @@ static inline void kvm_arm_vhe_guest_enter(void)
* will not signal the CPU of interrupts of lower priority, and the
* only way to get out will be via guest exceptions.
* Naturally, we want to avoid this.
+ *
+ * local_daif_mask() already sets GIC_PRIO_PSR_I_SET, we just need a
+ * dsb to ensure the redistributor is forwards EL2 IRQs to the CPU.
*/
- if (system_uses_irq_prio_masking()) {
- gic_write_pmr(GIC_PRIO_IRQON);
+ if (system_uses_irq_prio_masking())
dsb(sy);
- }
}
static inline void kvm_arm_vhe_guest_exit(void)
diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index b2de32939ada..da2242248466 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -35,9 +35,15 @@
* means masking more IRQs (or at least that the same IRQs remain masked).
*
* To mask interrupts, we clear the most significant bit of PMR.
+ *
+ * Some code sections either automatically switch back to PSR.I or explicitly
+ * require to not use priority masking. If bit GIC_PRIO_PSR_I_SET is included
+ * in the the priority mask, it indicates that PSR.I should be set and
+ * interrupt disabling temporarily does not rely on IRQ priorities.
*/
-#define GIC_PRIO_IRQON 0xf0
-#define GIC_PRIO_IRQOFF (GIC_PRIO_IRQON & ~0x80)
+#define GIC_PRIO_IRQON 0xc0
+#define GIC_PRIO_IRQOFF (GIC_PRIO_IRQON & ~0x80)
+#define GIC_PRIO_PSR_I_SET (1 << 4)
/* Additional SPSR bits not exposed in the UABI */
#define PSR_IL_BIT (1 << 20)
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 6d5966346710..165da78815c5 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -258,6 +258,7 @@ alternative_else_nop_endif
/*
* Registers that may be useful after this macro is invoked:
*
+ * x20 - ICC_PMR_EL1
* x21 - aborted SP
* x22 - aborted PC
* x23 - aborted PSTATE
@@ -449,6 +450,24 @@ alternative_endif
.endm
#endif
+ .macro gic_prio_kentry_setup, tmp:req
+#ifdef CONFIG_ARM64_PSEUDO_NMI
+ alternative_if ARM64_HAS_IRQ_PRIO_MASKING
+ mov \tmp, #(GIC_PRIO_PSR_I_SET | GIC_PRIO_IRQON)
+ msr_s SYS_ICC_PMR_EL1, \tmp
+ alternative_else_nop_endif
+#endif
+ .endm
+
+ .macro gic_prio_irq_setup, pmr:req, tmp:req
+#ifdef CONFIG_ARM64_PSEUDO_NMI
+ alternative_if ARM64_HAS_IRQ_PRIO_MASKING
+ orr \tmp, \pmr, #GIC_PRIO_PSR_I_SET
+ msr_s SYS_ICC_PMR_EL1, \tmp
+ alternative_else_nop_endif
+#endif
+ .endm
+
.text
/*
@@ -627,6 +646,7 @@ el1_dbg:
cmp x24, #ESR_ELx_EC_BRK64 // if BRK64
cinc x24, x24, eq // set bit '0'
tbz x24, #0, el1_inv // EL1 only
+ gic_prio_kentry_setup tmp=x3
mrs x0, far_el1
mov x2, sp // struct pt_regs
bl do_debug_exception
@@ -644,12 +664,10 @@ ENDPROC(el1_sync)
.align 6
el1_irq:
kernel_entry 1
+ gic_prio_irq_setup pmr=x20, tmp=x1
enable_da_f
#ifdef CONFIG_ARM64_PSEUDO_NMI
-alternative_if ARM64_HAS_IRQ_PRIO_MASKING
- ldr x20, [sp, #S_PMR_SAVE]
-alternative_else_nop_endif
test_irqs_unmasked res=x0, pmr=x20
cbz x0, 1f
bl asm_nmi_enter
@@ -679,8 +697,9 @@ alternative_else_nop_endif
#ifdef CONFIG_ARM64_PSEUDO_NMI
/*
- * if IRQs were disabled when we received the interrupt, we have an NMI
- * and we are not re-enabling interrupt upon eret. Skip tracing.
+ * When using IRQ priority masking, we can get spurious interrupts while
+ * PMR is set to GIC_PRIO_IRQOFF. An NMI might also have occurred in a
+ * section with interrupts disabled. Skip tracing in those cases.
*/
test_irqs_unmasked res=x0, pmr=x20
cbz x0, 1f
@@ -809,6 +828,7 @@ el0_ia:
* Instruction abort handling
*/
mrs x26, far_el1
+ gic_prio_kentry_setup tmp=x0
enable_da_f
#ifdef CONFIG_TRACE_IRQFLAGS
bl trace_hardirqs_off
@@ -854,6 +874,7 @@ el0_sp_pc:
* Stack or PC alignment exception handling
*/
mrs x26, far_el1
+ gic_prio_kentry_setup tmp=x0
enable_da_f
#ifdef CONFIG_TRACE_IRQFLAGS
bl trace_hardirqs_off
@@ -888,6 +909,7 @@ el0_dbg:
* Debug exception handling
*/
tbnz x24, #0, el0_inv // EL0 only
+ gic_prio_kentry_setup tmp=x3
mrs x0, far_el1
mov x1, x25
mov x2, sp
@@ -909,7 +931,9 @@ ENDPROC(el0_sync)
el0_irq:
kernel_entry 0
el0_irq_naked:
+ gic_prio_irq_setup pmr=x20, tmp=x0
enable_da_f
+
#ifdef CONFIG_TRACE_IRQFLAGS
bl trace_hardirqs_off
#endif
@@ -931,6 +955,7 @@ ENDPROC(el0_irq)
el1_error:
kernel_entry 1
mrs x1, esr_el1
+ gic_prio_kentry_setup tmp=x2
enable_dbg
mov x0, sp
bl do_serror
@@ -941,6 +966,7 @@ el0_error:
kernel_entry 0
el0_error_naked:
mrs x1, esr_el1
+ gic_prio_kentry_setup tmp=x2
enable_dbg
mov x0, sp
bl do_serror
@@ -965,6 +991,7 @@ work_pending:
*/
ret_to_user:
disable_daif
+ gic_prio_kentry_setup tmp=x3
ldr x1, [tsk, #TSK_TI_FLAGS]
and x2, x1, #_TIF_WORK_MASK
cbnz x2, work_pending
@@ -981,6 +1008,7 @@ ENDPROC(ret_to_user)
*/
.align 6
el0_svc:
+ gic_prio_kentry_setup tmp=x1
mov x0, sp
bl el0_svc_handler
b ret_to_user
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 3767fb21a5b8..58efc3727778 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -94,7 +94,7 @@ static void __cpu_do_idle_irqprio(void)
* be raised.
*/
pmr = gic_read_pmr();
- gic_write_pmr(GIC_PRIO_IRQON);
+ gic_write_pmr(GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET);
__cpu_do_idle();
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index bb4b3f07761a..4deaee3c2a33 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -192,11 +192,13 @@ static void init_gic_priority_masking(void)
WARN_ON(!(cpuflags & PSR_I_BIT));
- gic_write_pmr(GIC_PRIO_IRQOFF);
-
/* We can only unmask PSR.I if we can take aborts */
- if (!(cpuflags & PSR_A_BIT))
+ if (!(cpuflags & PSR_A_BIT)) {
+ gic_write_pmr(GIC_PRIO_IRQOFF);
write_sysreg(cpuflags & ~PSR_I_BIT, daif);
+ } else {
+ gic_write_pmr(GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET);
+ }
}
/*
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 8799e0c267d4..b89fcf0173b7 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -615,7 +615,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
* Naturally, we want to avoid this.
*/
if (system_uses_irq_prio_masking()) {
- gic_write_pmr(GIC_PRIO_IRQON);
+ gic_write_pmr(GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET);
dsb(sy);
}
The patch below does not apply to the 5.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 88dddc11a8d6b09201b4db9d255b3394d9bc9e57 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Fri, 19 Jul 2019 18:41:10 +0200
Subject: [PATCH] KVM: nVMX: do not use dangling shadow VMCS after guest reset
If a KVM guest is reset while running a nested guest, free_nested will
disable the shadow VMCS execution control in the vmcs01. However,
on the next KVM_RUN vmx_vcpu_run would nevertheless try to sync
the VMCS12 to the shadow VMCS which has since been freed.
This causes a vmptrld of a NULL pointer on my machine, but Jan reports
that the host hangs altogether. Let's see how much this trivial patch fixes.
Reported-by: Jan Kiszka <jan.kiszka(a)siemens.com>
Cc: Liran Alon <liran.alon(a)oracle.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 4f23e34f628b..0f1378789bd0 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -194,6 +194,7 @@ static void vmx_disable_shadow_vmcs(struct vcpu_vmx *vmx)
{
secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_SHADOW_VMCS);
vmcs_write64(VMCS_LINK_POINTER, -1ull);
+ vmx->nested.need_vmcs12_to_shadow_sync = false;
}
static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
@@ -1341,6 +1342,9 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
unsigned long val;
int i;
+ if (WARN_ON(!shadow_vmcs))
+ return;
+
preempt_disable();
vmcs_load(shadow_vmcs);
@@ -1373,6 +1377,9 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
unsigned long val;
int i, q;
+ if (WARN_ON(!shadow_vmcs))
+ return;
+
vmcs_load(shadow_vmcs);
for (q = 0; q < ARRAY_SIZE(fields); q++) {
@@ -4436,7 +4443,6 @@ static inline void nested_release_vmcs12(struct kvm_vcpu *vcpu)
/* copy to memory all shadowed fields in case
they were modified */
copy_shadow_to_vmcs12(vmx);
- vmx->nested.need_vmcs12_to_shadow_sync = false;
vmx_disable_shadow_vmcs(vmx);
}
vmx->nested.posted_intr_nv = -1;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4d763b168e9c5c366b05812c7bba7662e5ea3669 Mon Sep 17 00:00:00 2001
From: Wanpeng Li <wanpengli(a)tencent.com>
Date: Thu, 20 Jun 2019 17:00:02 +0800
Subject: [PATCH] KVM: VMX: check CPUID before allowing read/write of IA32_XSS
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Raise #GP when the guest reads or writes IA32_XSS but the CPUID bits
say that it shouldn't exist.
Fixes: 203000993de5 (kvm: vmx: add MSR logic for XSAVES)
Reported-by: Xiaoyao Li <xiaoyao.li(a)linux.intel.com>
Reported-by: Tao Xu <tao3.xu(a)intel.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Radim Krčmář <rkrcmar(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
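The new condition reads more easily as a predicate; the helper below is
hypothetical and only restates the check added to vmx_get_msr() and
vmx_set_msr() in the diff.
/*
 * Hypothetical helper: access to IA32_XSS is allowed only when the host
 * supports XSAVES and either the access is host initiated (e.g. state
 * save/restore for migration) or the guest's CPUID exposes both XSAVE
 * and XSAVES.
 */
#include <linux/types.h>
static bool xss_access_allowed(bool host_xsaves_supported, bool host_initiated,
			       bool guest_has_xsave, bool guest_has_xsaves)
{
	if (!host_xsaves_supported)
		return false;
	if (host_initiated)
		return true;
	return guest_has_xsave && guest_has_xsaves;
}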
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b939a688ae83..a35459ce7e29 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1732,7 +1732,10 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
&msr_info->data);
case MSR_IA32_XSS:
- if (!vmx_xsaves_supported())
+ if (!vmx_xsaves_supported() ||
+ (!msr_info->host_initiated &&
+ !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
return 1;
msr_info->data = vcpu->arch.ia32_xss;
break;
@@ -1962,7 +1965,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
return vmx_set_vmx_msr(vcpu, msr_index, data);
case MSR_IA32_XSS:
- if (!vmx_xsaves_supported())
+ if (!vmx_xsaves_supported() ||
+ (!msr_info->host_initiated &&
+ !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
return 1;
/*
* The only supported bit as of Skylake is bit 8, but
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4d763b168e9c5c366b05812c7bba7662e5ea3669 Mon Sep 17 00:00:00 2001
From: Wanpeng Li <wanpengli(a)tencent.com>
Date: Thu, 20 Jun 2019 17:00:02 +0800
Subject: [PATCH] KVM: VMX: check CPUID before allowing read/write of IA32_XSS
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Raise #GP when the guest reads or writes IA32_XSS but the CPUID bits
say that it shouldn't exist.
Fixes: 203000993de5 (kvm: vmx: add MSR logic for XSAVES)
Reported-by: Xiaoyao Li <xiaoyao.li(a)linux.intel.com>
Reported-by: Tao Xu <tao3.xu(a)intel.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Radim Krčmář <rkrcmar(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b939a688ae83..a35459ce7e29 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1732,7 +1732,10 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
&msr_info->data);
case MSR_IA32_XSS:
- if (!vmx_xsaves_supported())
+ if (!vmx_xsaves_supported() ||
+ (!msr_info->host_initiated &&
+ !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
return 1;
msr_info->data = vcpu->arch.ia32_xss;
break;
@@ -1962,7 +1965,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
return vmx_set_vmx_msr(vcpu, msr_index, data);
case MSR_IA32_XSS:
- if (!vmx_xsaves_supported())
+ if (!vmx_xsaves_supported() ||
+ (!msr_info->host_initiated &&
+ !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
return 1;
/*
* The only supported bit as of Skylake is bit 8, but
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4d763b168e9c5c366b05812c7bba7662e5ea3669 Mon Sep 17 00:00:00 2001
From: Wanpeng Li <wanpengli(a)tencent.com>
Date: Thu, 20 Jun 2019 17:00:02 +0800
Subject: [PATCH] KVM: VMX: check CPUID before allowing read/write of IA32_XSS
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Raise #GP when the guest reads or writes IA32_XSS but the CPUID bits
say that it shouldn't exist.
Fixes: 203000993de5 (kvm: vmx: add MSR logic for XSAVES)
Reported-by: Xiaoyao Li <xiaoyao.li(a)linux.intel.com>
Reported-by: Tao Xu <tao3.xu(a)intel.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Radim Krčmář <rkrcmar(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b939a688ae83..a35459ce7e29 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1732,7 +1732,10 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
&msr_info->data);
case MSR_IA32_XSS:
- if (!vmx_xsaves_supported())
+ if (!vmx_xsaves_supported() ||
+ (!msr_info->host_initiated &&
+ !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
return 1;
msr_info->data = vcpu->arch.ia32_xss;
break;
@@ -1962,7 +1965,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
return vmx_set_vmx_msr(vcpu, msr_index, data);
case MSR_IA32_XSS:
- if (!vmx_xsaves_supported())
+ if (!vmx_xsaves_supported() ||
+ (!msr_info->host_initiated &&
+ !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
return 1;
/*
* The only supported bit as of Skylake is bit 8, but
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4d763b168e9c5c366b05812c7bba7662e5ea3669 Mon Sep 17 00:00:00 2001
From: Wanpeng Li <wanpengli(a)tencent.com>
Date: Thu, 20 Jun 2019 17:00:02 +0800
Subject: [PATCH] KVM: VMX: check CPUID before allowing read/write of IA32_XSS
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Raise #GP when the guest reads or writes IA32_XSS but the CPUID bits
say that it shouldn't exist.
Fixes: 203000993de5 (kvm: vmx: add MSR logic for XSAVES)
Reported-by: Xiaoyao Li <xiaoyao.li(a)linux.intel.com>
Reported-by: Tao Xu <tao3.xu(a)intel.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Radim Krčmář <rkrcmar(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b939a688ae83..a35459ce7e29 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1732,7 +1732,10 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
&msr_info->data);
case MSR_IA32_XSS:
- if (!vmx_xsaves_supported())
+ if (!vmx_xsaves_supported() ||
+ (!msr_info->host_initiated &&
+ !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
return 1;
msr_info->data = vcpu->arch.ia32_xss;
break;
@@ -1962,7 +1965,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
return vmx_set_vmx_msr(vcpu, msr_index, data);
case MSR_IA32_XSS:
- if (!vmx_xsaves_supported())
+ if (!vmx_xsaves_supported() ||
+ (!msr_info->host_initiated &&
+ !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
return 1;
/*
* The only supported bit as of Skylake is bit 8, but
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From beb8d93b3e423043e079ef3dda19dad7b28467a8 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Fri, 19 Apr 2019 22:50:55 -0700
Subject: [PATCH] KVM: VMX: Fix handling of #MC that occurs during VM-Entry
A previous fix to prevent KVM from consuming stale VMCS state after a
failed VM-Entry inadvertently blocked KVM's handling of machine checks
that occur during VM-Entry.
Per Intel's SDM, a #MC during VM-Entry is handled in one of three ways,
depending on when the #MC is recognized. As it pertains to this bug
fix, the third case explicitly states EXIT_REASON_MCE_DURING_VMENTRY
is handled like any other VM-Exit during VM-Entry, i.e. sets bit 31 to
indicate the VM-Entry failed.
If a machine-check event occurs during a VM entry, one of the following occurs:
- The machine-check event is handled as if it occurred before the VM entry:
...
- The machine-check event is handled after VM entry completes:
...
- A VM-entry failure occurs as described in Section 26.7. The basic
exit reason is 41, for "VM-entry failure due to machine-check event".
Explicitly handle EXIT_REASON_MCE_DURING_VMENTRY as a one-off case in
vmx_vcpu_run() instead of binning it into vmx_complete_atomic_exit().
Doing so allows vmx_vcpu_run() to handle VMX_EXIT_REASONS_FAILED_VMENTRY
in a sane fashion and also simplifies vmx_complete_atomic_exit() since
VMCS.VM_EXIT_INTR_INFO is guaranteed to be fresh.
Fixes: b060ca3b2e9e7 ("kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Reviewed-by: Jim Mattson <jmattson(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5d903f8909d1..1b3ca0582a0c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6107,28 +6107,21 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
{
- u32 exit_intr_info = 0;
- u16 basic_exit_reason = (u16)vmx->exit_reason;
-
- if (!(basic_exit_reason == EXIT_REASON_MCE_DURING_VMENTRY
- || basic_exit_reason == EXIT_REASON_EXCEPTION_NMI))
+ if (vmx->exit_reason != EXIT_REASON_EXCEPTION_NMI)
return;
- if (!(vmx->exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
- exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
- vmx->exit_intr_info = exit_intr_info;
+ vmx->exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
/* if exit due to PF check for async PF */
- if (is_page_fault(exit_intr_info))
+ if (is_page_fault(vmx->exit_intr_info))
vmx->vcpu.arch.apf.host_apf_reason = kvm_read_and_reset_pf_reason();
/* Handle machine checks before interrupts are enabled */
- if (basic_exit_reason == EXIT_REASON_MCE_DURING_VMENTRY ||
- is_machine_check(exit_intr_info))
+ if (is_machine_check(vmx->exit_intr_info))
kvm_machine_check();
/* We need to handle NMIs before interrupts are enabled */
- if (is_nmi(exit_intr_info)) {
+ if (is_nmi(vmx->exit_intr_info)) {
kvm_before_interrupt(&vmx->vcpu);
asm("int $2");
kvm_after_interrupt(&vmx->vcpu);
@@ -6535,6 +6528,9 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
vmx->idt_vectoring_info = 0;
vmx->exit_reason = vmx->fail ? 0xdead : vmcs_read32(VM_EXIT_REASON);
+ if ((u16)vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
+ kvm_machine_check();
+
if (vmx->fail || (vmx->exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
return;
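To see why the one-off check in vmx_vcpu_run() works, the small sketch below
models the exit-reason encoding the commit message describes: basic exit
reason 41 in the low 16 bits and the failed-VM-Entry flag in bit 31. The
constants mirror the VMX definitions, but the program is only an illustration.

#include <stdint.h>
#include <stdio.h>

#define EXIT_REASON_MCE_DURING_VMENTRY	41
#define VMX_EXIT_REASONS_FAILED_VMENTRY	0x80000000u

int main(void)
{
	/* A #MC during VM-Entry is reported as a failed entry (bit 31)
	 * with basic exit reason 41 in the low 16 bits. */
	uint32_t exit_reason = VMX_EXIT_REASONS_FAILED_VMENTRY |
			       EXIT_REASON_MCE_DURING_VMENTRY;

	if ((uint16_t)exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
		printf("handle the machine check before the early return\n");

	if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY)
		printf("failed VM-Entry: skip normal exit processing\n");

	return 0;
}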
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From beb8d93b3e423043e079ef3dda19dad7b28467a8 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Fri, 19 Apr 2019 22:50:55 -0700
Subject: [PATCH] KVM: VMX: Fix handling of #MC that occurs during VM-Entry
A previous fix to prevent KVM from consuming stale VMCS state after a
failed VM-Entry inadvertently blocked KVM's handling of machine checks
that occur during VM-Entry.
Per Intel's SDM, a #MC during VM-Entry is handled in one of three ways,
depending on when the #MC is recognized. As it pertains to this bug
fix, the third case explicitly states EXIT_REASON_MCE_DURING_VMENTRY
is handled like any other VM-Exit during VM-Entry, i.e. sets bit 31 to
indicate the VM-Entry failed.
If a machine-check event occurs during a VM entry, one of the following occurs:
- The machine-check event is handled as if it occurred before the VM entry:
...
- The machine-check event is handled after VM entry completes:
...
- A VM-entry failure occurs as described in Section 26.7. The basic
exit reason is 41, for "VM-entry failure due to machine-check event".
Explicitly handle EXIT_REASON_MCE_DURING_VMENTRY as a one-off case in
vmx_vcpu_run() instead of binning it into vmx_complete_atomic_exit().
Doing so allows vmx_vcpu_run() to handle VMX_EXIT_REASONS_FAILED_VMENTRY
in a sane fashion and also simplifies vmx_complete_atomic_exit() since
VMCS.VM_EXIT_INTR_INFO is guaranteed to be fresh.
Fixes: b060ca3b2e9e7 ("kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Reviewed-by: Jim Mattson <jmattson(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5d903f8909d1..1b3ca0582a0c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6107,28 +6107,21 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
{
- u32 exit_intr_info = 0;
- u16 basic_exit_reason = (u16)vmx->exit_reason;
-
- if (!(basic_exit_reason == EXIT_REASON_MCE_DURING_VMENTRY
- || basic_exit_reason == EXIT_REASON_EXCEPTION_NMI))
+ if (vmx->exit_reason != EXIT_REASON_EXCEPTION_NMI)
return;
- if (!(vmx->exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
- exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
- vmx->exit_intr_info = exit_intr_info;
+ vmx->exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
/* if exit due to PF check for async PF */
- if (is_page_fault(exit_intr_info))
+ if (is_page_fault(vmx->exit_intr_info))
vmx->vcpu.arch.apf.host_apf_reason = kvm_read_and_reset_pf_reason();
/* Handle machine checks before interrupts are enabled */
- if (basic_exit_reason == EXIT_REASON_MCE_DURING_VMENTRY ||
- is_machine_check(exit_intr_info))
+ if (is_machine_check(vmx->exit_intr_info))
kvm_machine_check();
/* We need to handle NMIs before interrupts are enabled */
- if (is_nmi(exit_intr_info)) {
+ if (is_nmi(vmx->exit_intr_info)) {
kvm_before_interrupt(&vmx->vcpu);
asm("int $2");
kvm_after_interrupt(&vmx->vcpu);
@@ -6535,6 +6528,9 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
vmx->idt_vectoring_info = 0;
vmx->exit_reason = vmx->fail ? 0xdead : vmcs_read32(VM_EXIT_REASON);
+ if ((u16)vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
+ kvm_machine_check();
+
if (vmx->fail || (vmx->exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
return;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3b013a2972d5bc344d6eaa8f24fdfe268211e45f Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Tue, 7 May 2019 09:06:28 -0700
Subject: [PATCH] KVM: nVMX: Always sync GUEST_BNDCFGS when it comes from
vmcs01
If L1 does not set VM_ENTRY_LOAD_BNDCFGS, then L1's BNDCFGS value must
be propagated to vmcs02 since KVM always runs with VM_ENTRY_LOAD_BNDCFGS
when MPX is supported. Because the value effectively comes from vmcs01,
vmcs02 must be updated even if vmcs12 is clean.
Fixes: 62cf9bd8118c4 ("KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS")
Cc: stable(a)vger.kernel.org
Cc: Liran Alon <liran.alon(a)oracle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 6e82bbca2fe1..c4c0a45245b2 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2228,13 +2228,9 @@ static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
set_cr4_guest_host_mask(vmx);
- if (kvm_mpx_supported()) {
- if (vmx->nested.nested_run_pending &&
- (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
- vmcs_write64(GUEST_BNDCFGS, vmcs12->guest_bndcfgs);
- else
- vmcs_write64(GUEST_BNDCFGS, vmx->nested.vmcs01_guest_bndcfgs);
- }
+ if (kvm_mpx_supported() && vmx->nested.nested_run_pending &&
+ (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
+ vmcs_write64(GUEST_BNDCFGS, vmcs12->guest_bndcfgs);
}
/*
@@ -2266,6 +2262,9 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.vmcs01_debugctl);
}
+ if (kvm_mpx_supported() && (!vmx->nested.nested_run_pending ||
+ !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)))
+ vmcs_write64(GUEST_BNDCFGS, vmx->nested.vmcs01_guest_bndcfgs);
vmx_set_rflags(vcpu, vmcs12->guest_rflags);
/* EXCEPTION_BITMAP and CR0_GUEST_HOST_MASK should basically be the
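A compact way to read the new logic split across prepare_vmcs02_rare() and
prepare_vmcs02() is as a single selection of the BNDCFGS source. The sketch
below models that decision; vmcs02_bndcfgs is an illustrative name, not a real
KVM helper.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* When MPX is supported, KVM always runs L2 with VM_ENTRY_LOAD_BNDCFGS,
 * so vmcs02 must get *some* BNDCFGS value on every entry: either L1's
 * value from vmcs12 (if L1 asked to load it on this nested run) or the
 * value saved from vmcs01 -- even when vmcs12 is "clean". */
static uint64_t vmcs02_bndcfgs(bool nested_run_pending, bool l1_loads_bndcfgs,
			       uint64_t vmcs12_bndcfgs, uint64_t vmcs01_bndcfgs)
{
	if (nested_run_pending && l1_loads_bndcfgs)
		return vmcs12_bndcfgs;	/* L1-provided value */
	return vmcs01_bndcfgs;		/* must still be synced */
}

int main(void)
{
	printf("0x%llx\n",
	       (unsigned long long)vmcs02_bndcfgs(false, true, 0x1, 0x2));
	return 0;
}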
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d28f4290b53a157191ed9991ad05dffe9e8c0c89 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Tue, 7 May 2019 09:06:27 -0700
Subject: [PATCH] KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with
bad value
The behavior of WRMSR is in no way dependent on whether or not KVM
consumes the value.
Fixes: 4566654bb9be9 ("KVM: vmx: Inject #GP on invalid PAT CR")
Cc: stable(a)vger.kernel.org
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1a87a91e98dc..091610684d28 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1894,9 +1894,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
MSR_TYPE_W);
break;
case MSR_IA32_CR_PAT:
+ if (!kvm_pat_valid(data))
+ return 1;
+
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
- if (!kvm_pat_valid(data))
- return 1;
vmcs_write64(GUEST_IA32_PAT, data);
vcpu->arch.pat = data;
break;
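For reference, here is a self-contained sketch of what a PAT validity check of
this kind looks like: each of the eight PAT entries must be one of the
architecturally defined memory types, and the point of the patch is that the
check (and the resulting #GP) applies regardless of whether KVM consumes the
value. This is an illustrative re-implementation, not the kernel's
kvm_pat_valid().

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Valid PAT memory types: UC=0, WC=1, WT=4, WP=5, WB=6, UC-=7;
 * 2 and 3 are reserved, as is anything above 7. */
static bool pat_valid(uint64_t pat)
{
	int i;

	for (i = 0; i < 8; i++) {
		uint8_t type = (pat >> (i * 8)) & 0xff;

		if (type == 2 || type == 3 || type > 7)
			return false;
	}
	return true;
}

int main(void)
{
	printf("default PAT valid: %d\n", pat_valid(0x0007040600070406ull));
	printf("reserved type:     %d\n", pat_valid(0x0000000000000002ull));
	return 0;
}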
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d28f4290b53a157191ed9991ad05dffe9e8c0c89 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Tue, 7 May 2019 09:06:27 -0700
Subject: [PATCH] KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with
bad value
The behavior of WRMSR is in no way dependent on whether or not KVM
consumes the value.
Fixes: 4566654bb9be9 ("KVM: vmx: Inject #GP on invalid PAT CR")
Cc: stable(a)vger.kernel.org
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1a87a91e98dc..091610684d28 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1894,9 +1894,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
MSR_TYPE_W);
break;
case MSR_IA32_CR_PAT:
+ if (!kvm_pat_valid(data))
+ return 1;
+
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
- if (!kvm_pat_valid(data))
- return 1;
vmcs_write64(GUEST_IA32_PAT, data);
vcpu->arch.pat = data;
break;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d28f4290b53a157191ed9991ad05dffe9e8c0c89 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Tue, 7 May 2019 09:06:27 -0700
Subject: [PATCH] KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with
bad value
The behavior of WRMSR is in no way dependent on whether or not KVM
consumes the value.
Fixes: 4566654bb9be9 ("KVM: vmx: Inject #GP on invalid PAT CR")
Cc: stable(a)vger.kernel.org
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1a87a91e98dc..091610684d28 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1894,9 +1894,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
MSR_TYPE_W);
break;
case MSR_IA32_CR_PAT:
+ if (!kvm_pat_valid(data))
+ return 1;
+
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
- if (!kvm_pat_valid(data))
- return 1;
vmcs_write64(GUEST_IA32_PAT, data);
vcpu->arch.pat = data;
break;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d28f4290b53a157191ed9991ad05dffe9e8c0c89 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Tue, 7 May 2019 09:06:27 -0700
Subject: [PATCH] KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with
bad value
The behavior of WRMSR is in no way dependent on whether or not KVM
consumes the value.
Fixes: 4566654bb9be9 ("KVM: vmx: Inject #GP on invalid PAT CR")
Cc: stable(a)vger.kernel.org
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1a87a91e98dc..091610684d28 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1894,9 +1894,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
MSR_TYPE_W);
break;
case MSR_IA32_CR_PAT:
+ if (!kvm_pat_valid(data))
+ return 1;
+
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
- if (!kvm_pat_valid(data))
- return 1;
vmcs_write64(GUEST_IA32_PAT, data);
vcpu->arch.pat = data;
break;
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d28f4290b53a157191ed9991ad05dffe9e8c0c89 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Tue, 7 May 2019 09:06:27 -0700
Subject: [PATCH] KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with
bad value
The behavior of WRMSR is in no way dependent on whether or not KVM
consumes the value.
Fixes: 4566654bb9be9 ("KVM: vmx: Inject #GP on invalid PAT CR")
Cc: stable(a)vger.kernel.org
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1a87a91e98dc..091610684d28 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1894,9 +1894,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
MSR_TYPE_W);
break;
case MSR_IA32_CR_PAT:
+ if (!kvm_pat_valid(data))
+ return 1;
+
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
- if (!kvm_pat_valid(data))
- return 1;
vmcs_write64(GUEST_IA32_PAT, data);
vcpu->arch.pat = data;
break;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fbc571290d9f7bfe089c50f4ac4028dd98ebfe98 Mon Sep 17 00:00:00 2001
From: Kailang Yang <kailang(a)realtek.com>
Date: Mon, 15 Jul 2019 10:41:50 +0800
Subject: [PATCH] ALSA: hda/realtek - Fixed Headphone Mic can't record on Dell
platform
It was assigned to the wrong model, so the headphone mic can't work.
Fixes: 3f640970a414 ("ALSA: hda - Fix headset mic detection problem for several Dell laptops")
Signed-off-by: Kailang Yang <kailang(a)realtek.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index f24a757f8239..1c84c12b39b3 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -7657,9 +7657,12 @@ static const struct snd_hda_pin_quirk alc269_pin_fixup_tbl[] = {
{0x12, 0x90a60130},
{0x17, 0x90170110},
{0x21, 0x03211020}),
- SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE,
+ SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE,
{0x14, 0x90170110},
{0x21, 0x04211020}),
+ SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE,
+ {0x14, 0x90170110},
+ {0x21, 0x04211030}),
SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE,
ALC295_STANDARD_PINS,
{0x17, 0x21014020},
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fbc571290d9f7bfe089c50f4ac4028dd98ebfe98 Mon Sep 17 00:00:00 2001
From: Kailang Yang <kailang(a)realtek.com>
Date: Mon, 15 Jul 2019 10:41:50 +0800
Subject: [PATCH] ALSA: hda/realtek - Fixed Headphone Mic can't record on Dell
platform
It was assigned to the wrong model, so the headphone mic can't work.
Fixes: 3f640970a414 ("ALSA: hda - Fix headset mic detection problem for several Dell laptops")
Signed-off-by: Kailang Yang <kailang(a)realtek.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index f24a757f8239..1c84c12b39b3 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -7657,9 +7657,12 @@ static const struct snd_hda_pin_quirk alc269_pin_fixup_tbl[] = {
{0x12, 0x90a60130},
{0x17, 0x90170110},
{0x21, 0x03211020}),
- SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE,
+ SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE,
{0x14, 0x90170110},
{0x21, 0x04211020}),
+ SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE,
+ {0x14, 0x90170110},
+ {0x21, 0x04211030}),
SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE,
ALC295_STANDARD_PINS,
{0x17, 0x21014020},
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fbc571290d9f7bfe089c50f4ac4028dd98ebfe98 Mon Sep 17 00:00:00 2001
From: Kailang Yang <kailang(a)realtek.com>
Date: Mon, 15 Jul 2019 10:41:50 +0800
Subject: [PATCH] ALSA: hda/realtek - Fixed Headphone Mic can't record on Dell
platform
It was assigned to the wrong model, so the headphone mic can't work.
Fixes: 3f640970a414 ("ALSA: hda - Fix headset mic detection problem for several Dell laptops")
Signed-off-by: Kailang Yang <kailang(a)realtek.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index f24a757f8239..1c84c12b39b3 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -7657,9 +7657,12 @@ static const struct snd_hda_pin_quirk alc269_pin_fixup_tbl[] = {
{0x12, 0x90a60130},
{0x17, 0x90170110},
{0x21, 0x03211020}),
- SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE,
+ SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE,
{0x14, 0x90170110},
{0x21, 0x04211020}),
+ SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE,
+ {0x14, 0x90170110},
+ {0x21, 0x04211030}),
SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE,
ALC295_STANDARD_PINS,
{0x17, 0x21014020},
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8e2442a5f86e1f77b86401fce274a7f622740bc4 Mon Sep 17 00:00:00 2001
From: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Date: Fri, 12 Jul 2019 15:07:09 +0900
Subject: [PATCH] kconfig: fix missing choice values in auto.conf
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Since commit 00c864f8903d ("kconfig: allow all config targets to write
auto.conf if missing"), Kconfig creates include/config/auto.conf in the
defconfig stage when it is missing.
Joonas Kylmälä reported incorrect auto.conf generation under some
circumstances.
To reproduce it, apply the following diff:
| --- a/arch/arm/configs/imx_v6_v7_defconfig
| +++ b/arch/arm/configs/imx_v6_v7_defconfig
| @@ -345,14 +345,7 @@ CONFIG_USB_CONFIGFS_F_MIDI=y
| CONFIG_USB_CONFIGFS_F_HID=y
| CONFIG_USB_CONFIGFS_F_UVC=y
| CONFIG_USB_CONFIGFS_F_PRINTER=y
| -CONFIG_USB_ZERO=m
| -CONFIG_USB_AUDIO=m
| -CONFIG_USB_ETH=m
| -CONFIG_USB_G_NCM=m
| -CONFIG_USB_GADGETFS=m
| -CONFIG_USB_FUNCTIONFS=m
| -CONFIG_USB_MASS_STORAGE=m
| -CONFIG_USB_G_SERIAL=m
| +CONFIG_USB_FUNCTIONFS=y
| CONFIG_MMC=y
| CONFIG_MMC_SDHCI=y
| CONFIG_MMC_SDHCI_PLTFM=y
And then, run:
$ make ARCH=arm mrproper imx_v6_v7_defconfig
You will see CONFIG_USB_FUNCTIONFS=y is correctly contained in the
.config, but not in the auto.conf.
Please note drivers/usb/gadget/legacy/Kconfig is included from a choice
block in drivers/usb/gadget/Kconfig. So USB_FUNCTIONFS is a choice value.
This is probably a situation similar to the one described in commit
beaaddb62540 ("kconfig: tests: test defconfig when two choices interact").
When sym_calc_choice() is called, the choice symbol forgets the
SYMBOL_DEF_USER unless all of its choice values are explicitly set by
the user.
The choice symbol is given just one chance to recall it because
set_all_choice_values() is called if SYMBOL_NEED_SET_CHOICE_VALUES
is set.
When sym_calc_choice() is called again, the choice symbol forgets it
forever, since SYMBOL_NEED_SET_CHOICE_VALUES is a one-time aid.
Hence, we cannot call sym_clear_all_valid() again and again.
It is crazy to repeatedly set and unset internal flags. However, we
cannot simply get rid of "sym->flags &= flags | ~SYMBOL_DEF_USER;"
Doing so would re-introduce the problem solved by commit 5d09598d488f
("kconfig: fix new choices being skipped upon config update").
To work around the issue, conf_write_autoconf() stopped calling
sym_clear_all_valid().
conf_write() must be changed accordingly. Currently, it clears
SYMBOL_WRITE after the symbol is written into the .config file. This
is needed to prevent it from writing the same symbol multiple times in
case the symbol is declared in two or more locations. I added the new
flag SYMBOL_WRITTEN, to track the symbols that have been written.
Anyway, this is a cheesy workaround in order to suppress the issue
as far as defconfig is concerned.
Handling of choices is totally broken. sym_clear_all_valid() is called
every time a user touches a symbol from the GUI interface. To reproduce
it, just add a new symbol in drivers/usb/gadget/legacy/Kconfig, then touch
unrelated symbols from menuconfig. USB_FUNCTIONFS will disappear
from the .config file.
I added the Fixes tag since it is more fatal than before, but this
has been broken for a long time before that, and it still is.
We should take a closer look and fix this correctly somehow.
Fixes: 00c864f8903d ("kconfig: allow all config targets to write auto.conf if missing")
Cc: linux-stable <stable(a)vger.kernel.org> # 4.19+
Reported-by: Joonas Kylmälä <joonas.kylmala(a)iki.fi>
Signed-off-by: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Tested-by: Joonas Kylmälä <joonas.kylmala(a)iki.fi>
diff --git a/scripts/kconfig/confdata.c b/scripts/kconfig/confdata.c
index 501fdcc5e999..1134892599da 100644
--- a/scripts/kconfig/confdata.c
+++ b/scripts/kconfig/confdata.c
@@ -895,7 +895,8 @@ int conf_write(const char *name)
"# %s\n"
"#\n", str);
need_newline = false;
- } else if (!(sym->flags & SYMBOL_CHOICE)) {
+ } else if (!(sym->flags & SYMBOL_CHOICE) &&
+ !(sym->flags & SYMBOL_WRITTEN)) {
sym_calc_value(sym);
if (!(sym->flags & SYMBOL_WRITE))
goto next;
@@ -903,7 +904,7 @@ int conf_write(const char *name)
fprintf(out, "\n");
need_newline = false;
}
- sym->flags &= ~SYMBOL_WRITE;
+ sym->flags |= SYMBOL_WRITTEN;
conf_write_symbol(out, sym, &kconfig_printer_cb, NULL);
}
@@ -1063,8 +1064,6 @@ int conf_write_autoconf(int overwrite)
if (!overwrite && is_present(autoconf_name))
return 0;
- sym_clear_all_valid();
-
conf_write_dep("include/config/auto.conf.cmd");
if (conf_touch_deps())
diff --git a/scripts/kconfig/expr.h b/scripts/kconfig/expr.h
index 8dde65bc3165..017843c9a4f4 100644
--- a/scripts/kconfig/expr.h
+++ b/scripts/kconfig/expr.h
@@ -141,6 +141,7 @@ struct symbol {
#define SYMBOL_OPTIONAL 0x0100 /* choice is optional - values can be 'n' */
#define SYMBOL_WRITE 0x0200 /* write symbol to file (KCONFIG_CONFIG) */
#define SYMBOL_CHANGED 0x0400 /* ? */
+#define SYMBOL_WRITTEN 0x0800 /* track info to avoid double-write to .config */
#define SYMBOL_NO_WRITE 0x1000 /* Symbol for internal use only; it will not be written */
#define SYMBOL_CHECKED 0x2000 /* used during dependency checking */
#define SYMBOL_WARNED 0x8000 /* warning has been issued */
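The core of the workaround is easier to see in isolation: keep the "should be
written" information intact and use a separate "already written" marker only
to suppress duplicate emission within one pass. Below is a minimal user-space
model; the struct and field names are illustrative, not kconfig's.

#include <stdbool.h>
#include <stdio.h>

/* Illustrative model of the SYMBOL_WRITTEN idea: instead of clearing the
 * "write me" flag after the first emission (which destroys information a
 * later writer such as auto.conf generation still needs), set a separate
 * "already written" flag that only deduplicates within one pass. */
struct symbol {
	const char *name;
	const char *value;
	bool write;	/* symbol should appear in the output at all */
	bool written;	/* symbol already emitted during this pass   */
};

static void write_symbol(struct symbol *sym)
{
	if (!sym->write || sym->written)
		return;
	printf("CONFIG_%s=%s\n", sym->name, sym->value);
	sym->written = true;	/* note: sym->write stays intact */
}

int main(void)
{
	struct symbol usb_functionfs = { "USB_FUNCTIONFS", "y", true, false };

	/* The same symbol reached via two menu locations: printed once. */
	write_symbol(&usb_functionfs);
	write_symbol(&usb_functionfs);
	return 0;
}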
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8e2442a5f86e1f77b86401fce274a7f622740bc4 Mon Sep 17 00:00:00 2001
From: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Date: Fri, 12 Jul 2019 15:07:09 +0900
Subject: [PATCH] kconfig: fix missing choice values in auto.conf
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Since commit 00c864f8903d ("kconfig: allow all config targets to write
auto.conf if missing"), Kconfig creates include/config/auto.conf in the
defconfig stage when it is missing.
Joonas Kylmälä reported incorrect auto.conf generation under some
circumstances.
To reproduce it, apply the following diff:
| --- a/arch/arm/configs/imx_v6_v7_defconfig
| +++ b/arch/arm/configs/imx_v6_v7_defconfig
| @@ -345,14 +345,7 @@ CONFIG_USB_CONFIGFS_F_MIDI=y
| CONFIG_USB_CONFIGFS_F_HID=y
| CONFIG_USB_CONFIGFS_F_UVC=y
| CONFIG_USB_CONFIGFS_F_PRINTER=y
| -CONFIG_USB_ZERO=m
| -CONFIG_USB_AUDIO=m
| -CONFIG_USB_ETH=m
| -CONFIG_USB_G_NCM=m
| -CONFIG_USB_GADGETFS=m
| -CONFIG_USB_FUNCTIONFS=m
| -CONFIG_USB_MASS_STORAGE=m
| -CONFIG_USB_G_SERIAL=m
| +CONFIG_USB_FUNCTIONFS=y
| CONFIG_MMC=y
| CONFIG_MMC_SDHCI=y
| CONFIG_MMC_SDHCI_PLTFM=y
And then, run:
$ make ARCH=arm mrproper imx_v6_v7_defconfig
You will see CONFIG_USB_FUNCTIONFS=y is correctly contained in the
.config, but not in the auto.conf.
Please note drivers/usb/gadget/legacy/Kconfig is included from a choice
block in drivers/usb/gadget/Kconfig. So USB_FUNCTIONFS is a choice value.
This is probably a situation similar to the one described in commit
beaaddb62540 ("kconfig: tests: test defconfig when two choices interact").
When sym_calc_choice() is called, the choice symbol forgets the
SYMBOL_DEF_USER unless all of its choice values are explicitly set by
the user.
The choice symbol is given just one chance to recall it because
set_all_choice_values() is called if SYMBOL_NEED_SET_CHOICE_VALUES
is set.
When sym_calc_choice() is called again, the choice symbol forgets it
forever, since SYMBOL_NEED_SET_CHOICE_VALUES is a one-time aid.
Hence, we cannot call sym_clear_all_valid() again and again.
It is crazy to repeatedly set and unset internal flags. However, we
cannot simply get rid of "sym->flags &= flags | ~SYMBOL_DEF_USER;"
Doing so would re-introduce the problem solved by commit 5d09598d488f
("kconfig: fix new choices being skipped upon config update").
To work around the issue, conf_write_autoconf() stopped calling
sym_clear_all_valid().
conf_write() must be changed accordingly. Currently, it clears
SYMBOL_WRITE after the symbol is written into the .config file. This
is needed to prevent it from writing the same symbol multiple times in
case the symbol is declared in two or more locations. I added the new
flag SYMBOL_WRITTEN, to track the symbols that have been written.
Anyway, this is a cheesy workaround in order to suppress the issue
as far as defconfig is concerned.
Handling of choices is totally broken. sym_clear_all_valid() is called
every time a user touches a symbol from the GUI interface. To reproduce
it, just add a new symbol in drivers/usb/gadget/legacy/Kconfig, then touch
unrelated symbols from menuconfig. USB_FUNCTIONFS will disappear
from the .config file.
I added the Fixes tag since it is more fatal than before, but this
has been broken for a long time before that, and it still is.
We should take a closer look and fix this correctly somehow.
Fixes: 00c864f8903d ("kconfig: allow all config targets to write auto.conf if missing")
Cc: linux-stable <stable(a)vger.kernel.org> # 4.19+
Reported-by: Joonas Kylmälä <joonas.kylmala(a)iki.fi>
Signed-off-by: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Tested-by: Joonas Kylmälä <joonas.kylmala(a)iki.fi>
diff --git a/scripts/kconfig/confdata.c b/scripts/kconfig/confdata.c
index 501fdcc5e999..1134892599da 100644
--- a/scripts/kconfig/confdata.c
+++ b/scripts/kconfig/confdata.c
@@ -895,7 +895,8 @@ int conf_write(const char *name)
"# %s\n"
"#\n", str);
need_newline = false;
- } else if (!(sym->flags & SYMBOL_CHOICE)) {
+ } else if (!(sym->flags & SYMBOL_CHOICE) &&
+ !(sym->flags & SYMBOL_WRITTEN)) {
sym_calc_value(sym);
if (!(sym->flags & SYMBOL_WRITE))
goto next;
@@ -903,7 +904,7 @@ int conf_write(const char *name)
fprintf(out, "\n");
need_newline = false;
}
- sym->flags &= ~SYMBOL_WRITE;
+ sym->flags |= SYMBOL_WRITTEN;
conf_write_symbol(out, sym, &kconfig_printer_cb, NULL);
}
@@ -1063,8 +1064,6 @@ int conf_write_autoconf(int overwrite)
if (!overwrite && is_present(autoconf_name))
return 0;
- sym_clear_all_valid();
-
conf_write_dep("include/config/auto.conf.cmd");
if (conf_touch_deps())
diff --git a/scripts/kconfig/expr.h b/scripts/kconfig/expr.h
index 8dde65bc3165..017843c9a4f4 100644
--- a/scripts/kconfig/expr.h
+++ b/scripts/kconfig/expr.h
@@ -141,6 +141,7 @@ struct symbol {
#define SYMBOL_OPTIONAL 0x0100 /* choice is optional - values can be 'n' */
#define SYMBOL_WRITE 0x0200 /* write symbol to file (KCONFIG_CONFIG) */
#define SYMBOL_CHANGED 0x0400 /* ? */
+#define SYMBOL_WRITTEN 0x0800 /* track info to avoid double-write to .config */
#define SYMBOL_NO_WRITE 0x1000 /* Symbol for internal use only; it will not be written */
#define SYMBOL_CHECKED 0x2000 /* used during dependency checking */
#define SYMBOL_WARNED 0x8000 /* warning has been issued */
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d9771f5ec46c282d518b453c793635dbdc3a2a94 Mon Sep 17 00:00:00 2001
From: Xiao Ni <xni(a)redhat.com>
Date: Fri, 14 Jun 2019 15:41:05 -0700
Subject: [PATCH] raid5-cache: Need to do start() part job after adding journal
device
commit d5d885fd514f ("md: introduce new personality funciton start()")
splits the init job into two parts. The first part, run(), does the jobs that
do not require the md threads. The second part, start(), does the jobs that
require the md threads.
Right now, adding a new journal device only does run(); it needs to do the
second part, start(), too.
Fixes: d5d885fd514f ("md: introduce new personality funciton start()")
Cc: stable(a)vger.kernel.org #v4.9+
Reported-by: Michal Soltys <soltys(a)ziu.info>
Signed-off-by: Xiao Ni <xni(a)redhat.com>
Signed-off-by: Song Liu <songliubraving(a)fb.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index b83bce2beb66..da94cbaa1a9e 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7672,7 +7672,7 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
{
struct r5conf *conf = mddev->private;
- int err = -EEXIST;
+ int ret, err = -EEXIST;
int disk;
struct disk_info *p;
int first = 0;
@@ -7687,7 +7687,14 @@ static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
* The array is in readonly mode if journal is missing, so no
* write requests running. We should be safe
*/
- log_init(conf, rdev, false);
+ ret = log_init(conf, rdev, false);
+ if (ret)
+ return ret;
+
+ ret = r5l_start(conf->log);
+ if (ret)
+ return ret;
+
return 0;
}
if (mddev->recovery_disabled == conf->recovery_disabled)
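The shape of the fix is plain two-phase initialization with error propagation.
A tiny illustrative sketch follows; journal_run/journal_start are stand-ins
for log_init()/r5l_start(), not real md functions.

#include <stdio.h>

/* run() covers the parts that need no md threads, start() the parts that
 * do. Hot-adding a journal device must perform both phases and propagate
 * a failure from either one. */
static int journal_run(void)   { return 0; }	/* e.g. log_init()  */
static int journal_start(void) { return 0; }	/* e.g. r5l_start() */

static int add_journal_device(void)
{
	int ret;

	ret = journal_run();
	if (ret)
		return ret;

	return journal_start();	/* the previously missing step */
}

int main(void)
{
	printf("add_journal_device: %d\n", add_journal_device());
	return 0;
}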
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c2c928c93173f220955030e8440517b87ec7df92 Mon Sep 17 00:00:00 2001
From: Mark Brown <broonie(a)kernel.org>
Date: Fri, 21 Jun 2019 12:33:56 +0100
Subject: [PATCH] ASoC: core: Adapt for debugfs API change
Back in ff9fb72bc07705c (debugfs: return error values, not NULL) the
debugfs APIs were changed to return error pointers rather than NULL
pointers on error, breaking the error checking in ASoC. Update the
code to use IS_ERR() and log the codes that are returned as part of
the error messages.
Fixes: ff9fb72bc07705c (debugfs: return error values, not NULL)
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
index 9138fcb15cd3..6aeba0d66ec5 100644
--- a/sound/soc/soc-core.c
+++ b/sound/soc/soc-core.c
@@ -158,9 +158,10 @@ static void soc_init_component_debugfs(struct snd_soc_component *component)
component->card->debugfs_card_root);
}
- if (!component->debugfs_root) {
+ if (IS_ERR(component->debugfs_root)) {
dev_warn(component->dev,
- "ASoC: Failed to create component debugfs directory\n");
+ "ASoC: Failed to create component debugfs directory: %ld\n",
+ PTR_ERR(component->debugfs_root));
return;
}
@@ -212,18 +213,21 @@ static void soc_init_card_debugfs(struct snd_soc_card *card)
card->debugfs_card_root = debugfs_create_dir(card->name,
snd_soc_debugfs_root);
- if (!card->debugfs_card_root) {
+ if (IS_ERR(card->debugfs_card_root)) {
dev_warn(card->dev,
- "ASoC: Failed to create card debugfs directory\n");
+ "ASoC: Failed to create card debugfs directory: %ld\n",
+ PTR_ERR(card->debugfs_card_root));
+ card->debugfs_card_root = NULL;
return;
}
card->debugfs_pop_time = debugfs_create_u32("dapm_pop_time", 0644,
card->debugfs_card_root,
&card->pop_time);
- if (!card->debugfs_pop_time)
+ if (IS_ERR(card->debugfs_pop_time))
dev_warn(card->dev,
- "ASoC: Failed to create pop time debugfs file\n");
+ "ASoC: Failed to create pop time debugfs file: %ld\n",
+ PTR_ERR(card->debugfs_pop_time));
}
static void soc_cleanup_card_debugfs(struct snd_soc_card *card)
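The failure mode here is the classic NULL-check-versus-error-pointer mismatch.
The sketch below models it in user space with simplified ERR_PTR/IS_ERR/
PTR_ERR macros; they are re-implementations for illustration only, not the
kernel headers.

#include <stdio.h>

/* After the debugfs change, failures come back as ERR_PTR(-errno) rather
 * than NULL, so "if (!ptr)" never fires and IS_ERR()/PTR_ERR() must be
 * used instead. */
#define MAX_ERRNO	4095
#define ERR_PTR(err)	((void *)(long)(err))
#define IS_ERR(ptr)	((unsigned long)(ptr) >= (unsigned long)-MAX_ERRNO)
#define PTR_ERR(ptr)	((long)(ptr))

static void *fake_debugfs_create_dir(void)
{
	return ERR_PTR(-12);	/* pretend the directory creation failed */
}

int main(void)
{
	void *dir = fake_debugfs_create_dir();

	if (!dir)
		printf("old NULL check: never reached\n");
	if (IS_ERR(dir))
		printf("IS_ERR check: failed with %ld\n", PTR_ERR(dir));
	return 0;
}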
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c2c928c93173f220955030e8440517b87ec7df92 Mon Sep 17 00:00:00 2001
From: Mark Brown <broonie(a)kernel.org>
Date: Fri, 21 Jun 2019 12:33:56 +0100
Subject: [PATCH] ASoC: core: Adapt for debugfs API change
Back in ff9fb72bc07705c (debugfs: return error values, not NULL) the
debugfs APIs were changed to return error pointers rather than NULL
pointers on error, breaking the error checking in ASoC. Update the
code to use IS_ERR() and log the codes that are returned as part of
the error messages.
Fixes: ff9fb72bc07705c (debugfs: return error values, not NULL)
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
index 9138fcb15cd3..6aeba0d66ec5 100644
--- a/sound/soc/soc-core.c
+++ b/sound/soc/soc-core.c
@@ -158,9 +158,10 @@ static void soc_init_component_debugfs(struct snd_soc_component *component)
component->card->debugfs_card_root);
}
- if (!component->debugfs_root) {
+ if (IS_ERR(component->debugfs_root)) {
dev_warn(component->dev,
- "ASoC: Failed to create component debugfs directory\n");
+ "ASoC: Failed to create component debugfs directory: %ld\n",
+ PTR_ERR(component->debugfs_root));
return;
}
@@ -212,18 +213,21 @@ static void soc_init_card_debugfs(struct snd_soc_card *card)
card->debugfs_card_root = debugfs_create_dir(card->name,
snd_soc_debugfs_root);
- if (!card->debugfs_card_root) {
+ if (IS_ERR(card->debugfs_card_root)) {
dev_warn(card->dev,
- "ASoC: Failed to create card debugfs directory\n");
+ "ASoC: Failed to create card debugfs directory: %ld\n",
+ PTR_ERR(card->debugfs_card_root));
+ card->debugfs_card_root = NULL;
return;
}
card->debugfs_pop_time = debugfs_create_u32("dapm_pop_time", 0644,
card->debugfs_card_root,
&card->pop_time);
- if (!card->debugfs_pop_time)
+ if (IS_ERR(card->debugfs_pop_time))
dev_warn(card->dev,
- "ASoC: Failed to create pop time debugfs file\n");
+ "ASoC: Failed to create pop time debugfs file: %ld\n",
+ PTR_ERR(card->debugfs_pop_time));
}
static void soc_cleanup_card_debugfs(struct snd_soc_card *card)
Hi,
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag,
fixing commit: d03360aaf5cc pNFS: Ensure we return the error if someone kills a waiting layoutget.
The bot has tested the following trees: v5.2.1, v5.1.18, v4.19.59.
v5.2.1: Build OK!
v5.1.18: Build OK!
v4.19.59: Failed to apply! Possible dependencies:
400417b05f3e ("pNFS: Fix a typo in pnfs_update_layout")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
--
Thanks,
Sasha
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 50a260e859964002dab162513a10f91ae9d3bcd3 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:58 +0800
Subject: [PATCH] bcache: fix race in btree_flush_write()
There is a race between mca_reap(), btree_node_free() and the journal code
in btree_flush_write(), which results in very rare and strange deadlocks or
panics that are very hard to reproduce.
Let me explain how the race happens. In btree_flush_write() one btree
node with the oldest journal pin is selected, then it is flushed to the cache
device; the select-and-flush is a two-step operation. Between these two
steps, several things may happen inside the race window:
- The selected btree node was reaped by mca_reap() and allocated to
other requesters for another btree node.
- The selected btree node was selected, flushed and released by the mca
shrink callback bch_mca_scan().
When btree_flush_write() tries to flush the selected btree node, it first
takes b->write_lock with mutex_lock(). If the race happens and the
memory of the selected btree node has been allocated to another btree node
whose write_lock is already held, a deadlock very probably
happens here. A worse case is that the memory of the selected btree node is
released; then any reference to this btree node (e.g. b->write_lock)
will trigger a NULL pointer dereference panic.
This race was introduced in commit cafe56359144 ("bcache: A block layer
cache"), and enlarged by commit c4dc2497d50d ("bcache: fix high CPU
occupancy during journal"), which selected 128 btree nodes and flushed
them one-by-one over quite a long time period.
This race was not easy to reproduce before. On a Lenovo SR650 server with
48 Xeon cores, with 1 NVMe SSD configured as the cache device and an MD raid0
device assembled from 3 NVMe SSDs as the backing device, this race can be
observed around once every 10,000 calls to btree_flush_write(). Both
the deadlock and the kernel panic happened as aftermath of the race.
The idea of the fix is to add a btree flag, BTREE_NODE_journal_flush. It
is set when selecting btree nodes, and cleared after the btree nodes are
flushed. When mca_reap() then sees a btree node with this bit set,
the node is skipped. Since mca_reap() only reaps btree nodes
without the BTREE_NODE_journal_flush flag, the race is avoided.
One corner case should be noticed, namely btree_node_free(). It might
be called in some error handling code paths. For example, take the following
code piece from btree_split():
2149 err_free2:
2150 bkey_put(b->c, &n2->key);
2151 btree_node_free(n2);
2152 rw_unlock(true, n2);
2153 err_free1:
2154 bkey_put(b->c, &n1->key);
2155 btree_node_free(n1);
2156 rw_unlock(true, n1);
At lines 2151 and 2155, the btree nodes n2 and n1 are released without
mca_reap(), so BTREE_NODE_journal_flush also needs to be checked here.
If btree_node_free() is called directly in such an error handling path,
and the selected btree node has the BTREE_NODE_journal_flush bit set, just
delay for 1 us and retry. In this case this btree node won't
be skipped; just retry until the BTREE_NODE_journal_flush bit is cleared,
then free the btree node memory.
Fixes: cafe56359144 ("bcache: A block layer cache")
Signed-off-by: Coly Li <colyli(a)suse.de>
Reported-and-tested-by: kbuild test robot <lkp(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 846306c3a887..ba434d9ac720 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -35,7 +35,7 @@
#include <linux/rcupdate.h>
#include <linux/sched/clock.h>
#include <linux/rculist.h>
-
+#include <linux/delay.h>
#include <trace/events/bcache.h>
/*
@@ -659,12 +659,25 @@ static int mca_reap(struct btree *b, unsigned int min_order, bool flush)
up(&b->io_mutex);
}
+retry:
/*
* BTREE_NODE_dirty might be cleared in btree_flush_btree() by
* __bch_btree_node_write(). To avoid an extra flush, acquire
* b->write_lock before checking BTREE_NODE_dirty bit.
*/
mutex_lock(&b->write_lock);
+ /*
+ * If this btree node is selected in btree_flush_write() by journal
+ * code, delay and retry until the node is flushed by journal code
+ * and BTREE_NODE_journal_flush bit cleared by btree_flush_write().
+ */
+ if (btree_node_journal_flush(b)) {
+ pr_debug("bnode %p is flushing by journal, retry", b);
+ mutex_unlock(&b->write_lock);
+ udelay(1);
+ goto retry;
+ }
+
if (btree_node_dirty(b))
__bch_btree_node_write(b, &cl);
mutex_unlock(&b->write_lock);
@@ -1081,7 +1094,20 @@ static void btree_node_free(struct btree *b)
BUG_ON(b == b->c->root);
+retry:
mutex_lock(&b->write_lock);
+ /*
+ * If the btree node is selected and flushing in btree_flush_write(),
+ * delay and retry until the BTREE_NODE_journal_flush bit cleared,
+ * then it is safe to free the btree node here. Otherwise this btree
+ * node will be in race condition.
+ */
+ if (btree_node_journal_flush(b)) {
+ mutex_unlock(&b->write_lock);
+ pr_debug("bnode %p journal_flush set, retry", b);
+ udelay(1);
+ goto retry;
+ }
if (btree_node_dirty(b)) {
btree_complete_write(b, btree_current_write(b));
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index d1c72ef64edf..76cfd121a486 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -158,11 +158,13 @@ enum btree_flags {
BTREE_NODE_io_error,
BTREE_NODE_dirty,
BTREE_NODE_write_idx,
+ BTREE_NODE_journal_flush,
};
BTREE_FLAG(io_error);
BTREE_FLAG(dirty);
BTREE_FLAG(write_idx);
+BTREE_FLAG(journal_flush);
static inline struct btree_write *btree_current_write(struct btree *b)
{
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 1218e3cada3c..a1e3e1fcea6e 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -430,6 +430,7 @@ static void btree_flush_write(struct cache_set *c)
retry:
best = NULL;
+ mutex_lock(&c->bucket_lock);
for_each_cached_btree(b, c, i)
if (btree_current_write(b)->journal) {
if (!best)
@@ -442,15 +443,21 @@ static void btree_flush_write(struct cache_set *c)
}
b = best;
+ if (b)
+ set_btree_node_journal_flush(b);
+ mutex_unlock(&c->bucket_lock);
+
if (b) {
mutex_lock(&b->write_lock);
if (!btree_current_write(b)->journal) {
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
/* We raced */
goto retry;
}
__bch_btree_node_write(b, NULL);
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
}
}
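The btree_node_free()/mca_reap() side of the fix boils down to a flag-guarded
retry loop: if the journal marked the node as being flushed, back off briefly
and try again. Below is a single-threaded user-space model of that pattern;
the flag clearing merely simulates the journal finishing, and none of the
names are bcache's.

#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static bool journal_flush_flag = true;
static int polls_left = 3;

/* Stand-in for btree_node_journal_flush(b): reports "busy" for a few
 * polls, then clears the flag as if the journal flush completed. */
static bool node_busy_with_journal(void)
{
	if (polls_left-- > 0)
		return journal_flush_flag;
	journal_flush_flag = false;	/* "flusher" done */
	return false;
}

static void free_node(void)
{
retry:
	/* in the kernel: mutex_lock(&b->write_lock) */
	if (node_busy_with_journal()) {
		/* in the kernel: mutex_unlock(&b->write_lock) */
		usleep(1);		/* udelay(1) in the kernel */
		goto retry;
	}
	printf("node freed safely after journal flush completed\n");
	/* in the kernel: mutex_unlock(&b->write_lock) */
}

int main(void)
{
	free_node();
	return 0;
}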
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 50a260e859964002dab162513a10f91ae9d3bcd3 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:58 +0800
Subject: [PATCH] bcache: fix race in btree_flush_write()
There is a race between mca_reap(), btree_node_free() and the journal code
in btree_flush_write(), which results in very rare and strange deadlocks or
panics that are very hard to reproduce.
Let me explain how the race happens. In btree_flush_write() one btree
node with the oldest journal pin is selected, then it is flushed to the cache
device; the select-and-flush is a two-step operation. Between these two
steps, several things may happen inside the race window:
- The selected btree node was reaped by mca_reap() and allocated to
other requesters for another btree node.
- The selected btree node was selected, flushed and released by the mca
shrink callback bch_mca_scan().
When btree_flush_write() tries to flush the selected btree node, it first
takes b->write_lock with mutex_lock(). If the race happens and the
memory of the selected btree node has been allocated to another btree node
whose write_lock is already held, a deadlock very probably
happens here. A worse case is that the memory of the selected btree node is
released; then any reference to this btree node (e.g. b->write_lock)
will trigger a NULL pointer dereference panic.
This race was introduced in commit cafe56359144 ("bcache: A block layer
cache"), and enlarged by commit c4dc2497d50d ("bcache: fix high CPU
occupancy during journal"), which selected 128 btree nodes and flushed
them one-by-one over quite a long time period.
This race was not easy to reproduce before. On a Lenovo SR650 server with
48 Xeon cores, with 1 NVMe SSD configured as the cache device and an MD raid0
device assembled from 3 NVMe SSDs as the backing device, this race can be
observed around once every 10,000 calls to btree_flush_write(). Both
the deadlock and the kernel panic happened as aftermath of the race.
The idea of the fix is to add a btree flag, BTREE_NODE_journal_flush. It
is set when selecting btree nodes, and cleared after the btree nodes are
flushed. When mca_reap() then sees a btree node with this bit set,
the node is skipped. Since mca_reap() only reaps btree nodes
without the BTREE_NODE_journal_flush flag, the race is avoided.
One corner case should be noticed, namely btree_node_free(). It might
be called in some error handling code paths. For example, take the following
code piece from btree_split():
2149 err_free2:
2150 bkey_put(b->c, &n2->key);
2151 btree_node_free(n2);
2152 rw_unlock(true, n2);
2153 err_free1:
2154 bkey_put(b->c, &n1->key);
2155 btree_node_free(n1);
2156 rw_unlock(true, n1);
At lines 2151 and 2155, the btree nodes n2 and n1 are released without
mca_reap(), so BTREE_NODE_journal_flush also needs to be checked here.
If btree_node_free() is called directly in such an error handling path,
and the selected btree node has the BTREE_NODE_journal_flush bit set, just
delay for 1 us and retry. In this case this btree node won't
be skipped; just retry until the BTREE_NODE_journal_flush bit is cleared,
then free the btree node memory.
Fixes: cafe56359144 ("bcache: A block layer cache")
Signed-off-by: Coly Li <colyli(a)suse.de>
Reported-and-tested-by: kbuild test robot <lkp(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 846306c3a887..ba434d9ac720 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -35,7 +35,7 @@
#include <linux/rcupdate.h>
#include <linux/sched/clock.h>
#include <linux/rculist.h>
-
+#include <linux/delay.h>
#include <trace/events/bcache.h>
/*
@@ -659,12 +659,25 @@ static int mca_reap(struct btree *b, unsigned int min_order, bool flush)
up(&b->io_mutex);
}
+retry:
/*
* BTREE_NODE_dirty might be cleared in btree_flush_btree() by
* __bch_btree_node_write(). To avoid an extra flush, acquire
* b->write_lock before checking BTREE_NODE_dirty bit.
*/
mutex_lock(&b->write_lock);
+ /*
+ * If this btree node is selected in btree_flush_write() by journal
+ * code, delay and retry until the node is flushed by journal code
+ * and BTREE_NODE_journal_flush bit cleared by btree_flush_write().
+ */
+ if (btree_node_journal_flush(b)) {
+ pr_debug("bnode %p is flushing by journal, retry", b);
+ mutex_unlock(&b->write_lock);
+ udelay(1);
+ goto retry;
+ }
+
if (btree_node_dirty(b))
__bch_btree_node_write(b, &cl);
mutex_unlock(&b->write_lock);
@@ -1081,7 +1094,20 @@ static void btree_node_free(struct btree *b)
BUG_ON(b == b->c->root);
+retry:
mutex_lock(&b->write_lock);
+ /*
+ * If the btree node is selected and flushing in btree_flush_write(),
+ * delay and retry until the BTREE_NODE_journal_flush bit cleared,
+ * then it is safe to free the btree node here. Otherwise this btree
+ * node will be in race condition.
+ */
+ if (btree_node_journal_flush(b)) {
+ mutex_unlock(&b->write_lock);
+ pr_debug("bnode %p journal_flush set, retry", b);
+ udelay(1);
+ goto retry;
+ }
if (btree_node_dirty(b)) {
btree_complete_write(b, btree_current_write(b));
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index d1c72ef64edf..76cfd121a486 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -158,11 +158,13 @@ enum btree_flags {
BTREE_NODE_io_error,
BTREE_NODE_dirty,
BTREE_NODE_write_idx,
+ BTREE_NODE_journal_flush,
};
BTREE_FLAG(io_error);
BTREE_FLAG(dirty);
BTREE_FLAG(write_idx);
+BTREE_FLAG(journal_flush);
static inline struct btree_write *btree_current_write(struct btree *b)
{
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 1218e3cada3c..a1e3e1fcea6e 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -430,6 +430,7 @@ static void btree_flush_write(struct cache_set *c)
retry:
best = NULL;
+ mutex_lock(&c->bucket_lock);
for_each_cached_btree(b, c, i)
if (btree_current_write(b)->journal) {
if (!best)
@@ -442,15 +443,21 @@ static void btree_flush_write(struct cache_set *c)
}
b = best;
+ if (b)
+ set_btree_node_journal_flush(b);
+ mutex_unlock(&c->bucket_lock);
+
if (b) {
mutex_lock(&b->write_lock);
if (!btree_current_write(b)->journal) {
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
/* We raced */
goto retry;
}
__bch_btree_node_write(b, NULL);
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
}
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 50a260e859964002dab162513a10f91ae9d3bcd3 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:58 +0800
Subject: [PATCH] bcache: fix race in btree_flush_write()
There is a race between mca_reap(), btree_node_free() and the journal code
in btree_flush_write(), which results in very rare and strange deadlocks or
panics that are very hard to reproduce.
Let me explain how the race happens. In btree_flush_write() one btree
node with the oldest journal pin is selected, then it is flushed to the cache
device; the select-and-flush is a two-step operation. Between these two
steps, several things may happen inside the race window:
- The selected btree node was reaped by mca_reap() and allocated to
other requesters for another btree node.
- The selected btree node was selected, flushed and released by the mca
shrink callback bch_mca_scan().
When btree_flush_write() tries to flush the selected btree node, it first
takes b->write_lock with mutex_lock(). If the race happens and the
memory of the selected btree node has been allocated to another btree node
whose write_lock is already held, a deadlock very probably
happens here. A worse case is that the memory of the selected btree node is
released; then any reference to this btree node (e.g. b->write_lock)
will trigger a NULL pointer dereference panic.
This race was introduced in commit cafe56359144 ("bcache: A block layer
cache"), and enlarged by commit c4dc2497d50d ("bcache: fix high CPU
occupancy during journal"), which selected 128 btree nodes and flushed
them one-by-one over quite a long time period.
This race was not easy to reproduce before. On a Lenovo SR650 server with
48 Xeon cores, with 1 NVMe SSD configured as the cache device and an MD raid0
device assembled from 3 NVMe SSDs as the backing device, this race can be
observed around once every 10,000 calls to btree_flush_write(). Both
the deadlock and the kernel panic happened as aftermath of the race.
The idea of the fix is to add a btree flag, BTREE_NODE_journal_flush. It
is set when selecting btree nodes, and cleared after the btree nodes are
flushed. When mca_reap() then sees a btree node with this bit set,
the node is skipped. Since mca_reap() only reaps btree nodes
without the BTREE_NODE_journal_flush flag, the race is avoided.
One corner case should be noticed, namely btree_node_free(). It might
be called in some error handling code paths. For example, take the following
code piece from btree_split():
2149 err_free2:
2150 bkey_put(b->c, &n2->key);
2151 btree_node_free(n2);
2152 rw_unlock(true, n2);
2153 err_free1:
2154 bkey_put(b->c, &n1->key);
2155 btree_node_free(n1);
2156 rw_unlock(true, n1);
At lines 2151 and 2155, the btree nodes n2 and n1 are released without
going through mca_reap(), so BTREE_NODE_journal_flush also needs to be
checked here. If btree_node_free() is called directly in such an error
handling path and the btree node has the BTREE_NODE_journal_flush bit
set, just delay for 1 us and retry. In this case the btree node is not
skipped; the retry continues until the BTREE_NODE_journal_flush bit is
cleared, and only then is the btree node memory freed.
Fixes: cafe56359144 ("bcache: A block layer cache")
Signed-off-by: Coly Li <colyli(a)suse.de>
Reported-and-tested-by: kbuild test robot <lkp(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
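To illustrate the idea (this is a simplified sketch, not part of the
patch; the accessors come from the BTREE_FLAG() macros added below in
btree.h), the handshake between the journal side and the reap/free
side looks roughly like this:
	/* journal side, btree_flush_write(): mark the node under
	 * c->bucket_lock, then write it out under b->write_lock and
	 * clear the bit again */
	set_btree_node_journal_flush(b);
	...
	mutex_lock(&b->write_lock);
	__bch_btree_node_write(b, NULL);
	clear_bit(BTREE_NODE_journal_flush, &b->flags);
	mutex_unlock(&b->write_lock);
	/* reap/free side, mca_reap()/btree_node_free(): back off until
	 * the journal flush is done before touching the node */
retry:
	mutex_lock(&b->write_lock);
	if (btree_node_journal_flush(b)) {
		mutex_unlock(&b->write_lock);
		udelay(1);	/* the flush is short, spin briefly */
		goto retry;
	}
The actual hunks follow.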
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 846306c3a887..ba434d9ac720 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -35,7 +35,7 @@
#include <linux/rcupdate.h>
#include <linux/sched/clock.h>
#include <linux/rculist.h>
-
+#include <linux/delay.h>
#include <trace/events/bcache.h>
/*
@@ -659,12 +659,25 @@ static int mca_reap(struct btree *b, unsigned int min_order, bool flush)
up(&b->io_mutex);
}
+retry:
/*
* BTREE_NODE_dirty might be cleared in btree_flush_btree() by
* __bch_btree_node_write(). To avoid an extra flush, acquire
* b->write_lock before checking BTREE_NODE_dirty bit.
*/
mutex_lock(&b->write_lock);
+ /*
+ * If this btree node is selected in btree_flush_write() by journal
+ * code, delay and retry until the node is flushed by journal code
+ * and BTREE_NODE_journal_flush bit cleared by btree_flush_write().
+ */
+ if (btree_node_journal_flush(b)) {
+ pr_debug("bnode %p is flushing by journal, retry", b);
+ mutex_unlock(&b->write_lock);
+ udelay(1);
+ goto retry;
+ }
+
if (btree_node_dirty(b))
__bch_btree_node_write(b, &cl);
mutex_unlock(&b->write_lock);
@@ -1081,7 +1094,20 @@ static void btree_node_free(struct btree *b)
BUG_ON(b == b->c->root);
+retry:
mutex_lock(&b->write_lock);
+ /*
+ * If the btree node is selected and flushing in btree_flush_write(),
+ * delay and retry until the BTREE_NODE_journal_flush bit cleared,
+ * then it is safe to free the btree node here. Otherwise this btree
+ * node will be in race condition.
+ */
+ if (btree_node_journal_flush(b)) {
+ mutex_unlock(&b->write_lock);
+ pr_debug("bnode %p journal_flush set, retry", b);
+ udelay(1);
+ goto retry;
+ }
if (btree_node_dirty(b)) {
btree_complete_write(b, btree_current_write(b));
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index d1c72ef64edf..76cfd121a486 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -158,11 +158,13 @@ enum btree_flags {
BTREE_NODE_io_error,
BTREE_NODE_dirty,
BTREE_NODE_write_idx,
+ BTREE_NODE_journal_flush,
};
BTREE_FLAG(io_error);
BTREE_FLAG(dirty);
BTREE_FLAG(write_idx);
+BTREE_FLAG(journal_flush);
static inline struct btree_write *btree_current_write(struct btree *b)
{
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 1218e3cada3c..a1e3e1fcea6e 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -430,6 +430,7 @@ static void btree_flush_write(struct cache_set *c)
retry:
best = NULL;
+ mutex_lock(&c->bucket_lock);
for_each_cached_btree(b, c, i)
if (btree_current_write(b)->journal) {
if (!best)
@@ -442,15 +443,21 @@ static void btree_flush_write(struct cache_set *c)
}
b = best;
+ if (b)
+ set_btree_node_journal_flush(b);
+ mutex_unlock(&c->bucket_lock);
+
if (b) {
mutex_lock(&b->write_lock);
if (!btree_current_write(b)->journal) {
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
/* We raced */
goto retry;
}
__bch_btree_node_write(b, NULL);
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
}
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 50a260e859964002dab162513a10f91ae9d3bcd3 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:58 +0800
Subject: [PATCH] bcache: fix race in btree_flush_write()
There is a race between mca_reap(), btree_node_free() and the journal
code in btree_flush_write(), which results in a very rare and strange
deadlock or panic that is very hard to reproduce.
Let me explain how the race happens. In btree_flush_write() the btree
node with the oldest journal pin is selected and then flushed to the
cache device; this select-and-flush is a two-step operation. Between
these two steps, the following may happen inside the race window:
- The selected btree node is reaped by mca_reap() and its memory is
handed out to another requester for another btree node.
- The selected btree node is flushed and released by the mca shrink
callback bch_mca_scan().
When btree_flush_write() tries to flush the selected btree node, it
first takes b->write_lock via mutex_lock(). If the race happens and the
memory of the selected btree node has been allocated to another btree
node whose write_lock is already held, a deadlock is very likely. A
worse case is that the memory of the selected btree node has been
released; then any reference to this btree node (e.g. b->write_lock)
triggers a NULL pointer dereference panic.
This race was introduced in commit cafe56359144 ("bcache: A block layer
cache"), and enlarged by commit c4dc2497d50d ("bcache: fix high CPU
occupancy during journal"), which selects 128 btree nodes and flushes
them one by one over quite a long time period.
Such a race was not easy to reproduce before. On a Lenovo SR650 server
with 48 Xeon cores, with 1 NVMe SSD configured as the cache device and
an MD raid0 device assembled from 3 NVMe SSDs as the backing device,
this race can be observed roughly once per 10,000 calls to
btree_flush_write(). Both the deadlock and the kernel panic happened as
aftermath of the race.
The idea of the fix is to add a btree flag, BTREE_NODE_journal_flush.
It is set when a btree node is selected and cleared after the node has
been flushed. When mca_reap() sees a btree node with this bit set, the
node is skipped. Since mca_reap() only reaps btree nodes without the
BTREE_NODE_journal_flush flag, the race is avoided.
One corner case should be noticed: btree_node_free(). It might be
called in some error handling code paths, for example the following
code piece from btree_split():
2149 err_free2:
2150 bkey_put(b->c, &n2->key);
2151 btree_node_free(n2);
2152 rw_unlock(true, n2);
2153 err_free1:
2154 bkey_put(b->c, &n1->key);
2155 btree_node_free(n1);
2156 rw_unlock(true, n1);
At lines 2151 and 2155, the btree nodes n2 and n1 are released without
going through mca_reap(), so BTREE_NODE_journal_flush also needs to be
checked here. If btree_node_free() is called directly in such an error
handling path and the btree node has the BTREE_NODE_journal_flush bit
set, just delay for 1 us and retry. In this case the btree node is not
skipped; the retry continues until the BTREE_NODE_journal_flush bit is
cleared, and only then is the btree node memory freed.
Fixes: cafe56359144 ("bcache: A block layer cache")
Signed-off-by: Coly Li <colyli(a)suse.de>
Reported-and-tested-by: kbuild test robot <lkp(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 846306c3a887..ba434d9ac720 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -35,7 +35,7 @@
#include <linux/rcupdate.h>
#include <linux/sched/clock.h>
#include <linux/rculist.h>
-
+#include <linux/delay.h>
#include <trace/events/bcache.h>
/*
@@ -659,12 +659,25 @@ static int mca_reap(struct btree *b, unsigned int min_order, bool flush)
up(&b->io_mutex);
}
+retry:
/*
* BTREE_NODE_dirty might be cleared in btree_flush_btree() by
* __bch_btree_node_write(). To avoid an extra flush, acquire
* b->write_lock before checking BTREE_NODE_dirty bit.
*/
mutex_lock(&b->write_lock);
+ /*
+ * If this btree node is selected in btree_flush_write() by journal
+ * code, delay and retry until the node is flushed by journal code
+ * and BTREE_NODE_journal_flush bit cleared by btree_flush_write().
+ */
+ if (btree_node_journal_flush(b)) {
+ pr_debug("bnode %p is flushing by journal, retry", b);
+ mutex_unlock(&b->write_lock);
+ udelay(1);
+ goto retry;
+ }
+
if (btree_node_dirty(b))
__bch_btree_node_write(b, &cl);
mutex_unlock(&b->write_lock);
@@ -1081,7 +1094,20 @@ static void btree_node_free(struct btree *b)
BUG_ON(b == b->c->root);
+retry:
mutex_lock(&b->write_lock);
+ /*
+ * If the btree node is selected and flushing in btree_flush_write(),
+ * delay and retry until the BTREE_NODE_journal_flush bit cleared,
+ * then it is safe to free the btree node here. Otherwise this btree
+ * node will be in race condition.
+ */
+ if (btree_node_journal_flush(b)) {
+ mutex_unlock(&b->write_lock);
+ pr_debug("bnode %p journal_flush set, retry", b);
+ udelay(1);
+ goto retry;
+ }
if (btree_node_dirty(b)) {
btree_complete_write(b, btree_current_write(b));
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index d1c72ef64edf..76cfd121a486 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -158,11 +158,13 @@ enum btree_flags {
BTREE_NODE_io_error,
BTREE_NODE_dirty,
BTREE_NODE_write_idx,
+ BTREE_NODE_journal_flush,
};
BTREE_FLAG(io_error);
BTREE_FLAG(dirty);
BTREE_FLAG(write_idx);
+BTREE_FLAG(journal_flush);
static inline struct btree_write *btree_current_write(struct btree *b)
{
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 1218e3cada3c..a1e3e1fcea6e 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -430,6 +430,7 @@ static void btree_flush_write(struct cache_set *c)
retry:
best = NULL;
+ mutex_lock(&c->bucket_lock);
for_each_cached_btree(b, c, i)
if (btree_current_write(b)->journal) {
if (!best)
@@ -442,15 +443,21 @@ static void btree_flush_write(struct cache_set *c)
}
b = best;
+ if (b)
+ set_btree_node_journal_flush(b);
+ mutex_unlock(&c->bucket_lock);
+
if (b) {
mutex_lock(&b->write_lock);
if (!btree_current_write(b)->journal) {
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
/* We raced */
goto retry;
}
__bch_btree_node_write(b, NULL);
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
}
}
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 50a260e859964002dab162513a10f91ae9d3bcd3 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:58 +0800
Subject: [PATCH] bcache: fix race in btree_flush_write()
There is a race between mca_reap(), btree_node_free() and the journal
code in btree_flush_write(), which results in a very rare and strange
deadlock or panic that is very hard to reproduce.
Let me explain how the race happens. In btree_flush_write() the btree
node with the oldest journal pin is selected and then flushed to the
cache device; this select-and-flush is a two-step operation. Between
these two steps, the following may happen inside the race window:
- The selected btree node is reaped by mca_reap() and its memory is
handed out to another requester for another btree node.
- The selected btree node is flushed and released by the mca shrink
callback bch_mca_scan().
When btree_flush_write() tries to flush the selected btree node, it
first takes b->write_lock via mutex_lock(). If the race happens and the
memory of the selected btree node has been allocated to another btree
node whose write_lock is already held, a deadlock is very likely. A
worse case is that the memory of the selected btree node has been
released; then any reference to this btree node (e.g. b->write_lock)
triggers a NULL pointer dereference panic.
This race was introduced in commit cafe56359144 ("bcache: A block layer
cache"), and enlarged by commit c4dc2497d50d ("bcache: fix high CPU
occupancy during journal"), which selects 128 btree nodes and flushes
them one by one over quite a long time period.
Such a race was not easy to reproduce before. On a Lenovo SR650 server
with 48 Xeon cores, with 1 NVMe SSD configured as the cache device and
an MD raid0 device assembled from 3 NVMe SSDs as the backing device,
this race can be observed roughly once per 10,000 calls to
btree_flush_write(). Both the deadlock and the kernel panic happened as
aftermath of the race.
The idea of the fix is to add a btree flag, BTREE_NODE_journal_flush.
It is set when a btree node is selected and cleared after the node has
been flushed. When mca_reap() sees a btree node with this bit set, the
node is skipped. Since mca_reap() only reaps btree nodes without the
BTREE_NODE_journal_flush flag, the race is avoided.
One corner case should be noticed: btree_node_free(). It might be
called in some error handling code paths, for example the following
code piece from btree_split():
2149 err_free2:
2150 bkey_put(b->c, &n2->key);
2151 btree_node_free(n2);
2152 rw_unlock(true, n2);
2153 err_free1:
2154 bkey_put(b->c, &n1->key);
2155 btree_node_free(n1);
2156 rw_unlock(true, n1);
At lines 2151 and 2155, the btree nodes n2 and n1 are released without
going through mca_reap(), so BTREE_NODE_journal_flush also needs to be
checked here. If btree_node_free() is called directly in such an error
handling path and the btree node has the BTREE_NODE_journal_flush bit
set, just delay for 1 us and retry. In this case the btree node is not
skipped; the retry continues until the BTREE_NODE_journal_flush bit is
cleared, and only then is the btree node memory freed.
Fixes: cafe56359144 ("bcache: A block layer cache")
Signed-off-by: Coly Li <colyli(a)suse.de>
Reported-and-tested-by: kbuild test robot <lkp(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 846306c3a887..ba434d9ac720 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -35,7 +35,7 @@
#include <linux/rcupdate.h>
#include <linux/sched/clock.h>
#include <linux/rculist.h>
-
+#include <linux/delay.h>
#include <trace/events/bcache.h>
/*
@@ -659,12 +659,25 @@ static int mca_reap(struct btree *b, unsigned int min_order, bool flush)
up(&b->io_mutex);
}
+retry:
/*
* BTREE_NODE_dirty might be cleared in btree_flush_btree() by
* __bch_btree_node_write(). To avoid an extra flush, acquire
* b->write_lock before checking BTREE_NODE_dirty bit.
*/
mutex_lock(&b->write_lock);
+ /*
+ * If this btree node is selected in btree_flush_write() by journal
+ * code, delay and retry until the node is flushed by journal code
+ * and BTREE_NODE_journal_flush bit cleared by btree_flush_write().
+ */
+ if (btree_node_journal_flush(b)) {
+ pr_debug("bnode %p is flushing by journal, retry", b);
+ mutex_unlock(&b->write_lock);
+ udelay(1);
+ goto retry;
+ }
+
if (btree_node_dirty(b))
__bch_btree_node_write(b, &cl);
mutex_unlock(&b->write_lock);
@@ -1081,7 +1094,20 @@ static void btree_node_free(struct btree *b)
BUG_ON(b == b->c->root);
+retry:
mutex_lock(&b->write_lock);
+ /*
+ * If the btree node is selected and flushing in btree_flush_write(),
+ * delay and retry until the BTREE_NODE_journal_flush bit cleared,
+ * then it is safe to free the btree node here. Otherwise this btree
+ * node will be in race condition.
+ */
+ if (btree_node_journal_flush(b)) {
+ mutex_unlock(&b->write_lock);
+ pr_debug("bnode %p journal_flush set, retry", b);
+ udelay(1);
+ goto retry;
+ }
if (btree_node_dirty(b)) {
btree_complete_write(b, btree_current_write(b));
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index d1c72ef64edf..76cfd121a486 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -158,11 +158,13 @@ enum btree_flags {
BTREE_NODE_io_error,
BTREE_NODE_dirty,
BTREE_NODE_write_idx,
+ BTREE_NODE_journal_flush,
};
BTREE_FLAG(io_error);
BTREE_FLAG(dirty);
BTREE_FLAG(write_idx);
+BTREE_FLAG(journal_flush);
static inline struct btree_write *btree_current_write(struct btree *b)
{
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 1218e3cada3c..a1e3e1fcea6e 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -430,6 +430,7 @@ static void btree_flush_write(struct cache_set *c)
retry:
best = NULL;
+ mutex_lock(&c->bucket_lock);
for_each_cached_btree(b, c, i)
if (btree_current_write(b)->journal) {
if (!best)
@@ -442,15 +443,21 @@ static void btree_flush_write(struct cache_set *c)
}
b = best;
+ if (b)
+ set_btree_node_journal_flush(b);
+ mutex_unlock(&c->bucket_lock);
+
if (b) {
mutex_lock(&b->write_lock);
if (!btree_current_write(b)->journal) {
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
/* We raced */
goto retry;
}
__bch_btree_node_write(b, NULL);
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
}
}
The patch below does not apply to the 5.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 50a260e859964002dab162513a10f91ae9d3bcd3 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:58 +0800
Subject: [PATCH] bcache: fix race in btree_flush_write()
There is a race between mca_reap(), btree_node_free() and the journal
code in btree_flush_write(), which results in a very rare and strange
deadlock or panic that is very hard to reproduce.
Let me explain how the race happens. In btree_flush_write() the btree
node with the oldest journal pin is selected and then flushed to the
cache device; this select-and-flush is a two-step operation. Between
these two steps, the following may happen inside the race window:
- The selected btree node is reaped by mca_reap() and its memory is
handed out to another requester for another btree node.
- The selected btree node is flushed and released by the mca shrink
callback bch_mca_scan().
When btree_flush_write() tries to flush the selected btree node, it
first takes b->write_lock via mutex_lock(). If the race happens and the
memory of the selected btree node has been allocated to another btree
node whose write_lock is already held, a deadlock is very likely. A
worse case is that the memory of the selected btree node has been
released; then any reference to this btree node (e.g. b->write_lock)
triggers a NULL pointer dereference panic.
This race was introduced in commit cafe56359144 ("bcache: A block layer
cache"), and enlarged by commit c4dc2497d50d ("bcache: fix high CPU
occupancy during journal"), which selects 128 btree nodes and flushes
them one by one over quite a long time period.
Such a race was not easy to reproduce before. On a Lenovo SR650 server
with 48 Xeon cores, with 1 NVMe SSD configured as the cache device and
an MD raid0 device assembled from 3 NVMe SSDs as the backing device,
this race can be observed roughly once per 10,000 calls to
btree_flush_write(). Both the deadlock and the kernel panic happened as
aftermath of the race.
The idea of the fix is to add a btree flag, BTREE_NODE_journal_flush.
It is set when a btree node is selected and cleared after the node has
been flushed. When mca_reap() sees a btree node with this bit set, the
node is skipped. Since mca_reap() only reaps btree nodes without the
BTREE_NODE_journal_flush flag, the race is avoided.
One corner case should be noticed: btree_node_free(). It might be
called in some error handling code paths, for example the following
code piece from btree_split():
2149 err_free2:
2150 bkey_put(b->c, &n2->key);
2151 btree_node_free(n2);
2152 rw_unlock(true, n2);
2153 err_free1:
2154 bkey_put(b->c, &n1->key);
2155 btree_node_free(n1);
2156 rw_unlock(true, n1);
At lines 2151 and 2155, the btree nodes n2 and n1 are released without
going through mca_reap(), so BTREE_NODE_journal_flush also needs to be
checked here. If btree_node_free() is called directly in such an error
handling path and the btree node has the BTREE_NODE_journal_flush bit
set, just delay for 1 us and retry. In this case the btree node is not
skipped; the retry continues until the BTREE_NODE_journal_flush bit is
cleared, and only then is the btree node memory freed.
Fixes: cafe56359144 ("bcache: A block layer cache")
Signed-off-by: Coly Li <colyli(a)suse.de>
Reported-and-tested-by: kbuild test robot <lkp(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 846306c3a887..ba434d9ac720 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -35,7 +35,7 @@
#include <linux/rcupdate.h>
#include <linux/sched/clock.h>
#include <linux/rculist.h>
-
+#include <linux/delay.h>
#include <trace/events/bcache.h>
/*
@@ -659,12 +659,25 @@ static int mca_reap(struct btree *b, unsigned int min_order, bool flush)
up(&b->io_mutex);
}
+retry:
/*
* BTREE_NODE_dirty might be cleared in btree_flush_btree() by
* __bch_btree_node_write(). To avoid an extra flush, acquire
* b->write_lock before checking BTREE_NODE_dirty bit.
*/
mutex_lock(&b->write_lock);
+ /*
+ * If this btree node is selected in btree_flush_write() by journal
+ * code, delay and retry until the node is flushed by journal code
+ * and BTREE_NODE_journal_flush bit cleared by btree_flush_write().
+ */
+ if (btree_node_journal_flush(b)) {
+ pr_debug("bnode %p is flushing by journal, retry", b);
+ mutex_unlock(&b->write_lock);
+ udelay(1);
+ goto retry;
+ }
+
if (btree_node_dirty(b))
__bch_btree_node_write(b, &cl);
mutex_unlock(&b->write_lock);
@@ -1081,7 +1094,20 @@ static void btree_node_free(struct btree *b)
BUG_ON(b == b->c->root);
+retry:
mutex_lock(&b->write_lock);
+ /*
+ * If the btree node is selected and flushing in btree_flush_write(),
+ * delay and retry until the BTREE_NODE_journal_flush bit cleared,
+ * then it is safe to free the btree node here. Otherwise this btree
+ * node will be in race condition.
+ */
+ if (btree_node_journal_flush(b)) {
+ mutex_unlock(&b->write_lock);
+ pr_debug("bnode %p journal_flush set, retry", b);
+ udelay(1);
+ goto retry;
+ }
if (btree_node_dirty(b)) {
btree_complete_write(b, btree_current_write(b));
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index d1c72ef64edf..76cfd121a486 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -158,11 +158,13 @@ enum btree_flags {
BTREE_NODE_io_error,
BTREE_NODE_dirty,
BTREE_NODE_write_idx,
+ BTREE_NODE_journal_flush,
};
BTREE_FLAG(io_error);
BTREE_FLAG(dirty);
BTREE_FLAG(write_idx);
+BTREE_FLAG(journal_flush);
static inline struct btree_write *btree_current_write(struct btree *b)
{
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 1218e3cada3c..a1e3e1fcea6e 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -430,6 +430,7 @@ static void btree_flush_write(struct cache_set *c)
retry:
best = NULL;
+ mutex_lock(&c->bucket_lock);
for_each_cached_btree(b, c, i)
if (btree_current_write(b)->journal) {
if (!best)
@@ -442,15 +443,21 @@ static void btree_flush_write(struct cache_set *c)
}
b = best;
+ if (b)
+ set_btree_node_journal_flush(b);
+ mutex_unlock(&c->bucket_lock);
+
if (b) {
mutex_lock(&b->write_lock);
if (!btree_current_write(b)->journal) {
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
/* We raced */
goto retry;
}
__bch_btree_node_write(b, NULL);
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
}
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f54d801dda14942dbefa00541d10603015b7859c Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:44 +0800
Subject: [PATCH] bcache: destroy dc->writeback_write_wq if failed to create
dc->writeback_thread
Commit 9baf30972b55 ("bcache: fix for gc and write-back race") added a
new work queue dc->writeback_write_wq, but forgot to destroy it in the
error condition when creating dc->writeback_thread failed.
This patch destroys dc->writeback_write_wq if kthread_create() returns
an error pointer for dc->writeback_thread, so the memory leak is
avoided.
Fixes: 9baf30972b55 ("bcache: fix for gc and write-back race")
Signed-off-by: Coly Li <colyli(a)suse.de>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
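The fix is the usual reverse-order unwind on a setup failure. A
minimal, generic sketch of the pattern (the names here are
illustrative, not the exact bcache ones):
	wq = alloc_workqueue("example_wq", WQ_MEM_RECLAIM, 0);
	if (!wq)
		return -ENOMEM;
	task = kthread_create(worker_fn, data, "example_thread");
	if (IS_ERR(task)) {
		destroy_workqueue(wq);	/* undo the earlier allocation */
		return PTR_ERR(task);
	}
The one-line hunk below applies the same unwind to
dc->writeback_write_wq.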
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 262f7ef20992..21081febcb59 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -833,6 +833,7 @@ int bch_cached_dev_writeback_start(struct cached_dev *dc)
"bcache_writeback");
if (IS_ERR(dc->writeback_thread)) {
cached_dev_put(dc);
+ destroy_workqueue(dc->writeback_write_wq);
return PTR_ERR(dc->writeback_thread);
}
dc->writeback_running = true;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f54d801dda14942dbefa00541d10603015b7859c Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:44 +0800
Subject: [PATCH] bcache: destroy dc->writeback_write_wq if failed to create
dc->writeback_thread
Commit 9baf30972b55 ("bcache: fix for gc and write-back race") added a
new work queue dc->writeback_write_wq, but forgot to destroy it in the
error condition when creating dc->writeback_thread failed.
This patch destroys dc->writeback_write_wq if kthread_create() returns
an error pointer for dc->writeback_thread, so the memory leak is
avoided.
Fixes: 9baf30972b55 ("bcache: fix for gc and write-back race")
Signed-off-by: Coly Li <colyli(a)suse.de>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 262f7ef20992..21081febcb59 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -833,6 +833,7 @@ int bch_cached_dev_writeback_start(struct cached_dev *dc)
"bcache_writeback");
if (IS_ERR(dc->writeback_thread)) {
cached_dev_put(dc);
+ destroy_workqueue(dc->writeback_write_wq);
return PTR_ERR(dc->writeback_thread);
}
dc->writeback_running = true;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f54d801dda14942dbefa00541d10603015b7859c Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:44 +0800
Subject: [PATCH] bcache: destroy dc->writeback_write_wq if failed to create
dc->writeback_thread
Commit 9baf30972b55 ("bcache: fix for gc and write-back race") added a
new work queue dc->writeback_write_wq, but forgot to destroy it in the
error condition when creating dc->writeback_thread failed.
This patch destroys dc->writeback_write_wq if kthread_create() returns
an error pointer for dc->writeback_thread, so the memory leak is
avoided.
Fixes: 9baf30972b55 ("bcache: fix for gc and write-back race")
Signed-off-by: Coly Li <colyli(a)suse.de>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 262f7ef20992..21081febcb59 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -833,6 +833,7 @@ int bch_cached_dev_writeback_start(struct cached_dev *dc)
"bcache_writeback");
if (IS_ERR(dc->writeback_thread)) {
cached_dev_put(dc);
+ destroy_workqueue(dc->writeback_write_wq);
return PTR_ERR(dc->writeback_thread);
}
dc->writeback_running = true;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ed527b13d800dd515a9e6c582f0a73eca65b2e1b Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Date: Fri, 31 May 2019 10:13:06 +0200
Subject: [PATCH] crypto: caam - limit output IV to CBC to work around CTR mode
DMA issue
The CAAM driver currently violates an undocumented and slightly
controversial requirement imposed by the crypto stack that a buffer
referred to by the request structure via its virtual address may not
be modified while any scatterlists passed via the same request
structure are mapped for inbound DMA.
This may result in errors like
alg: aead: decryption failed on test 1 for gcm_base(ctr-aes-caam,ghash-generic): ret=74
alg: aead: Failed to load transform for gcm(aes): -2
on non-cache coherent systems, due to the fact that the GCM driver
passes an IV buffer by virtual address which shares a cacheline with
the auth_tag buffer passed via a scatterlist, resulting in corruption
of the auth_tag when the IV is updated while the DMA mapping is live.
Since the IV that is returned to the caller is only valid for CBC mode,
and given that the in-kernel users of CBC (such as CTS) don't trigger the
same issue as the GCM driver, let's just disable the output IV generation
for all modes except CBC for the time being.
Fixes: 854b06f76879 ("crypto: caam - properly set IV after {en,de}crypt")
Cc: Horia Geanta <horia.geanta(a)nxp.com>
Cc: Iuliana Prodan <iuliana.prodan(a)nxp.com>
Reported-by: Sascha Hauer <s.hauer(a)pengutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Reviewed-by: Horia Geanta <horia.geanta(a)nxp.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
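To see why the GCM case corrupts data, consider a request context in
which the IV and the auth_tag happen to share a cacheline. The layout
below is a simplified, hypothetical illustration, not the actual GCM
driver structure:
	struct gcm_like_reqctx {
		u8 iv[16];	 /* updated by the CPU after submission     */
		u8 auth_tag[16]; /* written by the device via a scatterlist */
	};			 /* both fields can share one cacheline     */
On a non-cache-coherent system, writing iv[] while auth_tag[] is still
mapped for inbound DMA can write a stale cacheline back over the tag
the device just produced, which is exactly the corruption described
above. Restricting the IV copy-out to CBC, as the hunks below do,
avoids touching req->iv in the affected modes.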
diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 1efa6f5b62cf..4b03c967009b 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -977,6 +977,7 @@ static void skcipher_encrypt_done(struct device *jrdev, u32 *desc, u32 err,
struct skcipher_request *req = context;
struct skcipher_edesc *edesc;
struct crypto_skcipher *skcipher = crypto_skcipher_reqtfm(req);
+ struct caam_ctx *ctx = crypto_skcipher_ctx(skcipher);
int ivsize = crypto_skcipher_ivsize(skcipher);
dev_dbg(jrdev, "%s %d: err 0x%x\n", __func__, __LINE__, err);
@@ -990,9 +991,9 @@ static void skcipher_encrypt_done(struct device *jrdev, u32 *desc, u32 err,
/*
* The crypto API expects us to set the IV (req->iv) to the last
- * ciphertext block. This is used e.g. by the CTS mode.
+ * ciphertext block when running in CBC mode.
*/
- if (ivsize)
+ if ((ctx->cdata.algtype & OP_ALG_AAI_MASK) == OP_ALG_AAI_CBC)
scatterwalk_map_and_copy(req->iv, req->dst, req->cryptlen -
ivsize, ivsize, 0);
@@ -1836,9 +1837,9 @@ static int skcipher_decrypt(struct skcipher_request *req)
/*
* The crypto API expects us to set the IV (req->iv) to the last
- * ciphertext block.
+ * ciphertext block when running in CBC mode.
*/
- if (ivsize)
+ if ((ctx->cdata.algtype & OP_ALG_AAI_MASK) == OP_ALG_AAI_CBC)
scatterwalk_map_and_copy(req->iv, req->src, req->cryptlen -
ivsize, ivsize, 0);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ed527b13d800dd515a9e6c582f0a73eca65b2e1b Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Date: Fri, 31 May 2019 10:13:06 +0200
Subject: [PATCH] crypto: caam - limit output IV to CBC to work around CTR mode
DMA issue
The CAAM driver currently violates an undocumented and slightly
controversial requirement imposed by the crypto stack that a buffer
referred to by the request structure via its virtual address may not
be modified while any scatterlists passed via the same request
structure are mapped for inbound DMA.
This may result in errors like
alg: aead: decryption failed on test 1 for gcm_base(ctr-aes-caam,ghash-generic): ret=74
alg: aead: Failed to load transform for gcm(aes): -2
on non-cache coherent systems, due to the fact that the GCM driver
passes an IV buffer by virtual address which shares a cacheline with
the auth_tag buffer passed via a scatterlist, resulting in corruption
of the auth_tag when the IV is updated while the DMA mapping is live.
Since the IV that is returned to the caller is only valid for CBC mode,
and given that the in-kernel users of CBC (such as CTS) don't trigger the
same issue as the GCM driver, let's just disable the output IV generation
for all modes except CBC for the time being.
Fixes: 854b06f76879 ("crypto: caam - properly set IV after {en,de}crypt")
Cc: Horia Geanta <horia.geanta(a)nxp.com>
Cc: Iuliana Prodan <iuliana.prodan(a)nxp.com>
Reported-by: Sascha Hauer <s.hauer(a)pengutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Reviewed-by: Horia Geanta <horia.geanta(a)nxp.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 1efa6f5b62cf..4b03c967009b 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -977,6 +977,7 @@ static void skcipher_encrypt_done(struct device *jrdev, u32 *desc, u32 err,
struct skcipher_request *req = context;
struct skcipher_edesc *edesc;
struct crypto_skcipher *skcipher = crypto_skcipher_reqtfm(req);
+ struct caam_ctx *ctx = crypto_skcipher_ctx(skcipher);
int ivsize = crypto_skcipher_ivsize(skcipher);
dev_dbg(jrdev, "%s %d: err 0x%x\n", __func__, __LINE__, err);
@@ -990,9 +991,9 @@ static void skcipher_encrypt_done(struct device *jrdev, u32 *desc, u32 err,
/*
* The crypto API expects us to set the IV (req->iv) to the last
- * ciphertext block. This is used e.g. by the CTS mode.
+ * ciphertext block when running in CBC mode.
*/
- if (ivsize)
+ if ((ctx->cdata.algtype & OP_ALG_AAI_MASK) == OP_ALG_AAI_CBC)
scatterwalk_map_and_copy(req->iv, req->dst, req->cryptlen -
ivsize, ivsize, 0);
@@ -1836,9 +1837,9 @@ static int skcipher_decrypt(struct skcipher_request *req)
/*
* The crypto API expects us to set the IV (req->iv) to the last
- * ciphertext block.
+ * ciphertext block when running in CBC mode.
*/
- if (ivsize)
+ if ((ctx->cdata.algtype & OP_ALG_AAI_MASK) == OP_ALG_AAI_CBC)
scatterwalk_map_and_copy(req->iv, req->src, req->cryptlen -
ivsize, ivsize, 0);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ed527b13d800dd515a9e6c582f0a73eca65b2e1b Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Date: Fri, 31 May 2019 10:13:06 +0200
Subject: [PATCH] crypto: caam - limit output IV to CBC to work around CTR mode
DMA issue
The CAAM driver currently violates an undocumented and slightly
controversial requirement imposed by the crypto stack that a buffer
referred to by the request structure via its virtual address may not
be modified while any scatterlists passed via the same request
structure are mapped for inbound DMA.
This may result in errors like
alg: aead: decryption failed on test 1 for gcm_base(ctr-aes-caam,ghash-generic): ret=74
alg: aead: Failed to load transform for gcm(aes): -2
on non-cache coherent systems, due to the fact that the GCM driver
passes an IV buffer by virtual address which shares a cacheline with
the auth_tag buffer passed via a scatterlist, resulting in corruption
of the auth_tag when the IV is updated while the DMA mapping is live.
Since the IV that is returned to the caller is only valid for CBC mode,
and given that the in-kernel users of CBC (such as CTS) don't trigger the
same issue as the GCM driver, let's just disable the output IV generation
for all modes except CBC for the time being.
Fixes: 854b06f76879 ("crypto: caam - properly set IV after {en,de}crypt")
Cc: Horia Geanta <horia.geanta(a)nxp.com>
Cc: Iuliana Prodan <iuliana.prodan(a)nxp.com>
Reported-by: Sascha Hauer <s.hauer(a)pengutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Reviewed-by: Horia Geanta <horia.geanta(a)nxp.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 1efa6f5b62cf..4b03c967009b 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -977,6 +977,7 @@ static void skcipher_encrypt_done(struct device *jrdev, u32 *desc, u32 err,
struct skcipher_request *req = context;
struct skcipher_edesc *edesc;
struct crypto_skcipher *skcipher = crypto_skcipher_reqtfm(req);
+ struct caam_ctx *ctx = crypto_skcipher_ctx(skcipher);
int ivsize = crypto_skcipher_ivsize(skcipher);
dev_dbg(jrdev, "%s %d: err 0x%x\n", __func__, __LINE__, err);
@@ -990,9 +991,9 @@ static void skcipher_encrypt_done(struct device *jrdev, u32 *desc, u32 err,
/*
* The crypto API expects us to set the IV (req->iv) to the last
- * ciphertext block. This is used e.g. by the CTS mode.
+ * ciphertext block when running in CBC mode.
*/
- if (ivsize)
+ if ((ctx->cdata.algtype & OP_ALG_AAI_MASK) == OP_ALG_AAI_CBC)
scatterwalk_map_and_copy(req->iv, req->dst, req->cryptlen -
ivsize, ivsize, 0);
@@ -1836,9 +1837,9 @@ static int skcipher_decrypt(struct skcipher_request *req)
/*
* The crypto API expects us to set the IV (req->iv) to the last
- * ciphertext block.
+ * ciphertext block when running in CBC mode.
*/
- if (ivsize)
+ if ((ctx->cdata.algtype & OP_ALG_AAI_MASK) == OP_ALG_AAI_CBC)
scatterwalk_map_and_copy(req->iv, req->src, req->cryptlen -
ivsize, ivsize, 0);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 106d45f350c7cac876844dc685845cba4ffdb70b Mon Sep 17 00:00:00 2001
From: Benjamin Block <bblock(a)linux.ibm.com>
Date: Tue, 2 Jul 2019 23:02:01 +0200
Subject: [PATCH] scsi: zfcp: fix request object use-after-free in send path
causing wrong traces
When tracing instances where we open and close WKA ports, we also pass the
request-ID of the respective FSF command.
But after successfully sending the FSF command we must not use the
request object anymore, as this might result in a use-after-free (see
"zfcp: fix request object use-after-free in send path causing seqno
errors").
To fix this, add a new variable that caches the request-ID before sending
the request. This won't change during the hand-off to the FCP channel,
and so it's safe to trace this cached request-ID later, instead of using
the request object.
Signed-off-by: Benjamin Block <bblock(a)linux.ibm.com>
Fixes: d27a7cb91960 ("zfcp: trace on request for open and close of WKA port")
Cc: <stable(a)vger.kernel.org> #2.6.38+
Reviewed-by: Steffen Maier <maier(a)linux.ibm.com>
Reviewed-by: Jens Remus <jremus(a)linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
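The rule being applied is the usual one for asynchronous hand-offs:
snapshot anything needed for later tracing before the call that passes
ownership away. A condensed sketch of the resulting shape (the full
hunks follow):
	unsigned long req_id = 0;
	...
	req_id = req->req_id;		/* copy before the hand-off */
	retval = zfcp_fsf_req_send(req);
	/* req may already be completed and freed from here on */
	...
	if (!retval)
		zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req_id);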
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index c5b2615b49ef..296bbc3c4606 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -1627,6 +1627,7 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
struct zfcp_fsf_req *req;
+ unsigned long req_id = 0;
int retval = -EIO;
spin_lock_irq(&qdio->req_q_lock);
@@ -1649,6 +1650,8 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
hton24(req->qtcb->bottom.support.d_id, wka_port->d_id);
req->data = wka_port;
+ req_id = req->req_id;
+
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
@@ -1657,7 +1660,7 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
out:
spin_unlock_irq(&qdio->req_q_lock);
if (!retval)
- zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req->req_id);
+ zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req_id);
return retval;
}
@@ -1683,6 +1686,7 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
struct zfcp_fsf_req *req;
+ unsigned long req_id = 0;
int retval = -EIO;
spin_lock_irq(&qdio->req_q_lock);
@@ -1705,6 +1709,8 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
req->data = wka_port;
req->qtcb->header.port_handle = wka_port->handle;
+ req_id = req->req_id;
+
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
@@ -1713,7 +1719,7 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
out:
spin_unlock_irq(&qdio->req_q_lock);
if (!retval)
- zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req->req_id);
+ zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req_id);
return retval;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 106d45f350c7cac876844dc685845cba4ffdb70b Mon Sep 17 00:00:00 2001
From: Benjamin Block <bblock(a)linux.ibm.com>
Date: Tue, 2 Jul 2019 23:02:01 +0200
Subject: [PATCH] scsi: zfcp: fix request object use-after-free in send path
causing wrong traces
When tracing instances where we open and close WKA ports, we also pass the
request-ID of the respective FSF command.
But after successfully sending the FSF command we must not use the
request object anymore, as this might result in a use-after-free (see
"zfcp: fix request object use-after-free in send path causing seqno
errors").
To fix this, add a new variable that caches the request-ID before sending
the request. This won't change during the hand-off to the FCP channel,
and so it's safe to trace this cached request-ID later, instead of using
the request object.
Signed-off-by: Benjamin Block <bblock(a)linux.ibm.com>
Fixes: d27a7cb91960 ("zfcp: trace on request for open and close of WKA port")
Cc: <stable(a)vger.kernel.org> #2.6.38+
Reviewed-by: Steffen Maier <maier(a)linux.ibm.com>
Reviewed-by: Jens Remus <jremus(a)linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index c5b2615b49ef..296bbc3c4606 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -1627,6 +1627,7 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
struct zfcp_fsf_req *req;
+ unsigned long req_id = 0;
int retval = -EIO;
spin_lock_irq(&qdio->req_q_lock);
@@ -1649,6 +1650,8 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
hton24(req->qtcb->bottom.support.d_id, wka_port->d_id);
req->data = wka_port;
+ req_id = req->req_id;
+
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
@@ -1657,7 +1660,7 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
out:
spin_unlock_irq(&qdio->req_q_lock);
if (!retval)
- zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req->req_id);
+ zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req_id);
return retval;
}
@@ -1683,6 +1686,7 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
struct zfcp_fsf_req *req;
+ unsigned long req_id = 0;
int retval = -EIO;
spin_lock_irq(&qdio->req_q_lock);
@@ -1705,6 +1709,8 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
req->data = wka_port;
req->qtcb->header.port_handle = wka_port->handle;
+ req_id = req->req_id;
+
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
@@ -1713,7 +1719,7 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
out:
spin_unlock_irq(&qdio->req_q_lock);
if (!retval)
- zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req->req_id);
+ zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req_id);
return retval;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 106d45f350c7cac876844dc685845cba4ffdb70b Mon Sep 17 00:00:00 2001
From: Benjamin Block <bblock(a)linux.ibm.com>
Date: Tue, 2 Jul 2019 23:02:01 +0200
Subject: [PATCH] scsi: zfcp: fix request object use-after-free in send path
causing wrong traces
When tracing instances where we open and close WKA ports, we also pass the
request-ID of the respective FSF command.
But after successfully sending the FSF command we must not use the
request object anymore, as this might result in a use-after-free (see
"zfcp: fix request object use-after-free in send path causing seqno
errors").
To fix this, add a new variable that caches the request-ID before sending
the request. This won't change during the hand-off to the FCP channel,
and so it's safe to trace this cached request-ID later, instead of using
the request object.
Signed-off-by: Benjamin Block <bblock(a)linux.ibm.com>
Fixes: d27a7cb91960 ("zfcp: trace on request for open and close of WKA port")
Cc: <stable(a)vger.kernel.org> #2.6.38+
Reviewed-by: Steffen Maier <maier(a)linux.ibm.com>
Reviewed-by: Jens Remus <jremus(a)linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index c5b2615b49ef..296bbc3c4606 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -1627,6 +1627,7 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
struct zfcp_fsf_req *req;
+ unsigned long req_id = 0;
int retval = -EIO;
spin_lock_irq(&qdio->req_q_lock);
@@ -1649,6 +1650,8 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
hton24(req->qtcb->bottom.support.d_id, wka_port->d_id);
req->data = wka_port;
+ req_id = req->req_id;
+
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
@@ -1657,7 +1660,7 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
out:
spin_unlock_irq(&qdio->req_q_lock);
if (!retval)
- zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req->req_id);
+ zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req_id);
return retval;
}
@@ -1683,6 +1686,7 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
struct zfcp_fsf_req *req;
+ unsigned long req_id = 0;
int retval = -EIO;
spin_lock_irq(&qdio->req_q_lock);
@@ -1705,6 +1709,8 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
req->data = wka_port;
req->qtcb->header.port_handle = wka_port->handle;
+ req_id = req->req_id;
+
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
@@ -1713,7 +1719,7 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
out:
spin_unlock_irq(&qdio->req_q_lock);
if (!retval)
- zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req->req_id);
+ zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req_id);
return retval;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 106d45f350c7cac876844dc685845cba4ffdb70b Mon Sep 17 00:00:00 2001
From: Benjamin Block <bblock(a)linux.ibm.com>
Date: Tue, 2 Jul 2019 23:02:01 +0200
Subject: [PATCH] scsi: zfcp: fix request object use-after-free in send path
causing wrong traces
When tracing instances where we open and close WKA ports, we also pass the
request-ID of the respective FSF command.
But after successfully sending the FSF command we must not use the
request object anymore, as this might result in a use-after-free (see
"zfcp: fix request object use-after-free in send path causing seqno
errors").
To fix this, add a new variable that caches the request-ID before sending
the request. This won't change during the hand-off to the FCP channel,
and so it's safe to trace this cached request-ID later, instead of using
the request object.
Signed-off-by: Benjamin Block <bblock(a)linux.ibm.com>
Fixes: d27a7cb91960 ("zfcp: trace on request for open and close of WKA port")
Cc: <stable(a)vger.kernel.org> #2.6.38+
Reviewed-by: Steffen Maier <maier(a)linux.ibm.com>
Reviewed-by: Jens Remus <jremus(a)linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index c5b2615b49ef..296bbc3c4606 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -1627,6 +1627,7 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
struct zfcp_fsf_req *req;
+ unsigned long req_id = 0;
int retval = -EIO;
spin_lock_irq(&qdio->req_q_lock);
@@ -1649,6 +1650,8 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
hton24(req->qtcb->bottom.support.d_id, wka_port->d_id);
req->data = wka_port;
+ req_id = req->req_id;
+
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
@@ -1657,7 +1660,7 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
out:
spin_unlock_irq(&qdio->req_q_lock);
if (!retval)
- zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req->req_id);
+ zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req_id);
return retval;
}
@@ -1683,6 +1686,7 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
struct zfcp_fsf_req *req;
+ unsigned long req_id = 0;
int retval = -EIO;
spin_lock_irq(&qdio->req_q_lock);
@@ -1705,6 +1709,8 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
req->data = wka_port;
req->qtcb->header.port_handle = wka_port->handle;
+ req_id = req->req_id;
+
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
@@ -1713,7 +1719,7 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
out:
spin_unlock_irq(&qdio->req_q_lock);
if (!retval)
- zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req->req_id);
+ zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req_id);
return retval;
}