Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: 6b14caa1dc57 - Linux 5.3.13-rc1
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
All kernel binaries, config files, and logs are available for download here:
https://artifacts.cki-project.org/pipelines/301338
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 3 architectures:
aarch64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
ppc64le:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
x86_64:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
Test sources: https://github.com/CKI-project/tests-beaker
💚 Pull requests are welcome for new tests or improvements to existing tests!
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running are marked with ⏱. Reports for non-upstream kernels have
a Beaker recipe linked to next to each host.
On Fri, Nov 22, 2019 at 8:00 AM Sasha Levin <sashal(a)kernel.org> wrote:
>
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all
>
> The bot has tested the following trees: v5.3.11, v4.19.84, v4.14.154, v4.9.201, v4.4.201.
>
> v5.3.11: Build OK!
> v4.19.84: Build OK!
Ok, good.
> v4.14.154: Failed to apply! Possible dependencies:
> 6dd0394f5fcd ("media: v4l2-compat-ioctl32: better name userspace pointers")
> fef6cc6b3618 ("media: v4l2-compat-ioctl32: fix several __user annotations")
The fef6cc6b3618 is probably a candidate for backporting (it fixes smatch
and sparse warnings and should have no other effect), the 6dd0394f5fcd
may be a little too big (but also harmless).
The downside of not backporting the patch is that user space code built
with 64-bit time_t would get incorrect data rather than failing with an
error code on older kernels.
I do not expect to see backports of 64-bit time_t support to kernels older
than 4.19, so this probably won't matter much, but in theory it's still
possible that users can run into it.
> v4.9.201: Failed to apply! Possible dependencies:
> 6dd0394f5fcd ("media: v4l2-compat-ioctl32: better name userspace pointers")
> a56bc171598c ("[media] v4l: compat: Prevent allocating excessive amounts of memory")
> ba7ed691dcce ("[media] v4l2-compat-ioctl32: VIDIOC_S_EDID should return all fields on error")
> fb9ffa6a7f7e ("[media] v4l: Add metadata buffer type and format")
> fef6cc6b3618 ("media: v4l2-compat-ioctl32: fix several __user annotations")
>
> v4.4.201: Failed to apply! Possible dependencies:
> 0579e6e3a326 ("doc-rst: linux_tv: remove whitespaces")
> 17defc282fe6 ("Documentation: add meta-documentation for Sphinx and kernel-doc")
> 22cba31bae9d ("Documentation/sphinx: add basic working Sphinx configuration and build")
> 234d549662a7 ("doc-rst: video: use reference for VIDIOC_ENUMINPUT")
> 5377d91f3e88 ("doc-rst: linux_tv DocBook to reST migration (docs-next)")
> 6dd0394f5fcd ("media: v4l2-compat-ioctl32: better name userspace pointers")
> 7347081e8a52 ("doc-rst: linux_tv: simplify references")
> 789818845202 ("doc-rst: audio: Fix some cross references")
> 94fff0dc5333 ("doc-rst: dmx_fcalls: improve man-like format")
> 9e00ffca8cc7 ("doc-rst: querycap: fix troubles on some references")
> af4a4d0db8ab ("doc-rst: linux_tv: Replace reference names to match ioctls")
> c2b66cafdf02 ("[media] v4l: doc: Remove row numbers from tables")
> e6702ee18e24 ("doc-rst: app-pri: Fix a bad reference")
> fb9ffa6a7f7e ("[media] v4l: Add metadata buffer type and format")
>
>
> NOTE: The patch will not be queued to stable trees until it is upstream.
>
> How should we proceed with this patch?
I'm happy to provide a hand-backported version of the patch for the older
kernels if Mauro and Hans think we should do that, otherwise I think it's
we're fine with having it on 4.19+.
Arnd
This is a note to let you know that I've just added the patch titled
staging: comedi: usbduxfast: usbduxfast_ai_cmdtest rounding error
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 5618332e5b955b4bff06d0b88146b971c8dd7b32 Mon Sep 17 00:00:00 2001
From: Bernd Porr <mail(a)berndporr.me.uk>
Date: Mon, 18 Nov 2019 23:07:59 +0000
Subject: staging: comedi: usbduxfast: usbduxfast_ai_cmdtest rounding error
The userspace comedilib function 'get_cmd_generic_timed' fills
the cmd structure with an informed guess and then calls the
function 'usbduxfast_ai_cmdtest' in this driver repeatedly while
'usbduxfast_ai_cmdtest' is modifying the cmd struct until it
no longer changes. However, because of rounding errors this never
converged because 'steps = (cmd->convert_arg * 30) / 1000' and then
back to 'cmd->convert_arg = (steps * 1000) / 30' won't be the same
because of rounding errors. 'Steps' should only be converted back to
the 'convert_arg' if 'steps' has actually been modified. In addition
the case of steps being 0 wasn't checked which is also now done.
Signed-off-by: Bernd Porr <mail(a)berndporr.me.uk>
Cc: <stable(a)vger.kernel.org> # 4.4+
Reviewed-by: Ian Abbott <abbotti(a)mev.co.uk>
Link: https://lore.kernel.org/r/20191118230759.1727-1-mail@berndporr.me.uk
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/comedi/drivers/usbduxfast.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/drivers/staging/comedi/drivers/usbduxfast.c b/drivers/staging/comedi/drivers/usbduxfast.c
index 04bc488385e6..4af012968cb6 100644
--- a/drivers/staging/comedi/drivers/usbduxfast.c
+++ b/drivers/staging/comedi/drivers/usbduxfast.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0+
/*
- * Copyright (C) 2004-2014 Bernd Porr, mail(a)berndporr.me.uk
+ * Copyright (C) 2004-2019 Bernd Porr, mail(a)berndporr.me.uk
*/
/*
@@ -8,7 +8,7 @@
* Description: University of Stirling USB DAQ & INCITE Technology Limited
* Devices: [ITL] USB-DUX-FAST (usbduxfast)
* Author: Bernd Porr <mail(a)berndporr.me.uk>
- * Updated: 10 Oct 2014
+ * Updated: 16 Nov 2019
* Status: stable
*/
@@ -22,6 +22,7 @@
*
*
* Revision history:
+ * 1.0: Fixed a rounding error in usbduxfast_ai_cmdtest
* 0.9: Dropping the first data packet which seems to be from the last transfer.
* Buffer overflows in the FX2 are handed over to comedi.
* 0.92: Dropping now 4 packets. The quad buffer has to be emptied.
@@ -350,6 +351,7 @@ static int usbduxfast_ai_cmdtest(struct comedi_device *dev,
struct comedi_cmd *cmd)
{
int err = 0;
+ int err2 = 0;
unsigned int steps;
unsigned int arg;
@@ -399,11 +401,16 @@ static int usbduxfast_ai_cmdtest(struct comedi_device *dev,
*/
steps = (cmd->convert_arg * 30) / 1000;
if (cmd->chanlist_len != 1)
- err |= comedi_check_trigger_arg_min(&steps,
- MIN_SAMPLING_PERIOD);
- err |= comedi_check_trigger_arg_max(&steps, MAX_SAMPLING_PERIOD);
- arg = (steps * 1000) / 30;
- err |= comedi_check_trigger_arg_is(&cmd->convert_arg, arg);
+ err2 |= comedi_check_trigger_arg_min(&steps,
+ MIN_SAMPLING_PERIOD);
+ else
+ err2 |= comedi_check_trigger_arg_min(&steps, 1);
+ err2 |= comedi_check_trigger_arg_max(&steps, MAX_SAMPLING_PERIOD);
+ if (err2) {
+ err |= err2;
+ arg = (steps * 1000) / 30;
+ err |= comedi_check_trigger_arg_is(&cmd->convert_arg, arg);
+ }
if (cmd->stop_src == TRIG_COUNT)
err |= comedi_check_trigger_arg_min(&cmd->stop_arg, 1);
--
2.24.0
This is a note to let you know that I've just added the patch titled
serial: stm32: fix clearing interrupt error flags
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 1250ed7114a977cdc2a67a0c09d6cdda63970eb9 Mon Sep 17 00:00:00 2001
From: Fabrice Gasnier <fabrice.gasnier(a)st.com>
Date: Thu, 21 Nov 2019 09:10:49 +0100
Subject: serial: stm32: fix clearing interrupt error flags
The interrupt clear flag register is a "write 1 to clear" register.
So, only writing ones allows to clear flags:
- Replace buggy stm32_clr_bits() by a simple write to clear error flags
- Replace useless read/modify/write stm32_set_bits() routine by a
simple write to clear TC (transfer complete) flag.
Fixes: 4f01d833fdcd ("serial: stm32: fix rx error handling")
Signed-off-by: Fabrice Gasnier <fabrice.gasnier(a)st.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/1574323849-1909-1-git-send-email-fabrice.gasnier@…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/stm32-usart.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/tty/serial/stm32-usart.c b/drivers/tty/serial/stm32-usart.c
index df90747ee3a8..2f72514d63ed 100644
--- a/drivers/tty/serial/stm32-usart.c
+++ b/drivers/tty/serial/stm32-usart.c
@@ -240,8 +240,8 @@ static void stm32_receive_chars(struct uart_port *port, bool threaded)
* cleared by the sequence [read SR - read DR].
*/
if ((sr & USART_SR_ERR_MASK) && ofs->icr != UNDEF_REG)
- stm32_clr_bits(port, ofs->icr, USART_ICR_ORECF |
- USART_ICR_PECF | USART_ICR_FECF);
+ writel_relaxed(sr & USART_SR_ERR_MASK,
+ port->membase + ofs->icr);
c = stm32_get_char(port, &sr, &stm32_port->last_res);
port->icount.rx++;
@@ -435,7 +435,7 @@ static void stm32_transmit_chars(struct uart_port *port)
if (ofs->icr == UNDEF_REG)
stm32_clr_bits(port, ofs->isr, USART_SR_TC);
else
- stm32_set_bits(port, ofs->icr, USART_ICR_TCCF);
+ writel_relaxed(USART_ICR_TCCF, port->membase + ofs->icr);
if (stm32_port->tx_ch)
stm32_transmit_chars_dma(port);
--
2.24.0
From: Pavel Tatashin <pasha.tatashin(a)soleen.com>
commit 94bb804e1e6f0a9a77acf20d7c70ea141c6c821e upstream.
A number of our uaccess routines ('__arch_clear_user()' and
'__arch_copy_{in,from,to}_user()') fail to re-enable PAN if they
encounter an unhandled fault whilst accessing userspace.
For CPUs implementing both hardware PAN and UAO, this bug has no effect
when both extensions are in use by the kernel.
For CPUs implementing hardware PAN but not UAO, this means that a kernel
using hardware PAN may execute portions of code with PAN inadvertently
disabled, opening us up to potential security vulnerabilities that rely
on userspace access from within the kernel which would usually be
prevented by this mechanism. In other words, parts of the kernel run the
same way as they would on a CPU without PAN implemented/emulated at all.
For CPUs not implementing hardware PAN and instead relying on software
emulation via 'CONFIG_ARM64_SW_TTBR0_PAN=y', the impact is unfortunately
much worse. Calling 'schedule()' with software PAN disabled means that
the next task will execute in the kernel using the page-table and ASID
of the previous process even after 'switch_mm()', since the actual
hardware switch is deferred until return to userspace. At this point, or
if there is a intermediate call to 'uaccess_enable()', the page-table
and ASID of the new process are installed. Sadly, due to the changes
introduced by KPTI, this is not an atomic operation and there is a very
small window (two instructions) where the CPU is configured with the
page-table of the old task and the ASID of the new task; a speculative
access in this state is disastrous because it would corrupt the TLB
entries for the new task with mappings from the previous address space.
As Pavel explains:
| I was able to reproduce memory corruption problem on Broadcom's SoC
| ARMv8-A like this:
|
| Enable software perf-events with PERF_SAMPLE_CALLCHAIN so userland's
| stack is accessed and copied.
|
| The test program performed the following on every CPU and forking
| many processes:
|
| unsigned long *map = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
| MAP_SHARED | MAP_ANONYMOUS, -1, 0);
| map[0] = getpid();
| sched_yield();
| if (map[0] != getpid()) {
| fprintf(stderr, "Corruption detected!");
| }
| munmap(map, PAGE_SIZE);
|
| From time to time I was getting map[0] to contain pid for a
| different process.
Ensure that PAN is re-enabled when returning after an unhandled user
fault from our uaccess routines.
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Reviewed-by: Mark Rutland <mark.rutland(a)arm.com>
Tested-by: Mark Rutland <mark.rutland(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Fixes: 338d4f49d6f7 ("arm64: kernel: Add support for Privileged Access Never")
Signed-off-by: Pavel Tatashin <pasha.tatashin(a)soleen.com>
[will: rewrote commit message]
[will: backport for 4.9.y stable kernels]
Signed-off-by: Will Deacon <will(a)kernel.org>
---
arch/arm64/lib/clear_user.S | 2 ++
arch/arm64/lib/copy_from_user.S | 2 ++
arch/arm64/lib/copy_in_user.S | 2 ++
arch/arm64/lib/copy_to_user.S | 2 ++
4 files changed, 8 insertions(+)
diff --git a/arch/arm64/lib/clear_user.S b/arch/arm64/lib/clear_user.S
index efbf610eaf4e..a814f32033b0 100644
--- a/arch/arm64/lib/clear_user.S
+++ b/arch/arm64/lib/clear_user.S
@@ -62,5 +62,7 @@ ENDPROC(__arch_clear_user)
.section .fixup,"ax"
.align 2
9: mov x0, x2 // return the original size
+ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(1)), ARM64_ALT_PAN_NOT_UAO, \
+ CONFIG_ARM64_PAN)
ret
.previous
diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S
index 4fd67ea03bb0..580aca96c53c 100644
--- a/arch/arm64/lib/copy_from_user.S
+++ b/arch/arm64/lib/copy_from_user.S
@@ -80,5 +80,7 @@ ENDPROC(__arch_copy_from_user)
.section .fixup,"ax"
.align 2
9998: sub x0, end, dst // bytes not copied
+ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(1)), ARM64_ALT_PAN_NOT_UAO, \
+ CONFIG_ARM64_PAN)
ret
.previous
diff --git a/arch/arm64/lib/copy_in_user.S b/arch/arm64/lib/copy_in_user.S
index 841bf8f7fab7..d9ca6a4f33b3 100644
--- a/arch/arm64/lib/copy_in_user.S
+++ b/arch/arm64/lib/copy_in_user.S
@@ -81,5 +81,7 @@ ENDPROC(__arch_copy_in_user)
.section .fixup,"ax"
.align 2
9998: sub x0, end, dst // bytes not copied
+ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(1)), ARM64_ALT_PAN_NOT_UAO, \
+ CONFIG_ARM64_PAN)
ret
.previous
diff --git a/arch/arm64/lib/copy_to_user.S b/arch/arm64/lib/copy_to_user.S
index 7a7efe255034..e8bd40dc00cd 100644
--- a/arch/arm64/lib/copy_to_user.S
+++ b/arch/arm64/lib/copy_to_user.S
@@ -79,5 +79,7 @@ ENDPROC(__arch_copy_to_user)
.section .fixup,"ax"
.align 2
9998: sub x0, end, dst // bytes not copied
+ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(1)), ARM64_ALT_PAN_NOT_UAO, \
+ CONFIG_ARM64_PAN)
ret
.previous
--
2.24.0.432.g9d3f5f5b63-goog
From: Pavel Tatashin <pasha.tatashin(a)soleen.com>
commit 94bb804e1e6f0a9a77acf20d7c70ea141c6c821e upstream.
A number of our uaccess routines ('__arch_clear_user()' and
'__arch_copy_{in,from,to}_user()') fail to re-enable PAN if they
encounter an unhandled fault whilst accessing userspace.
For CPUs implementing both hardware PAN and UAO, this bug has no effect
when both extensions are in use by the kernel.
For CPUs implementing hardware PAN but not UAO, this means that a kernel
using hardware PAN may execute portions of code with PAN inadvertently
disabled, opening us up to potential security vulnerabilities that rely
on userspace access from within the kernel which would usually be
prevented by this mechanism. In other words, parts of the kernel run the
same way as they would on a CPU without PAN implemented/emulated at all.
For CPUs not implementing hardware PAN and instead relying on software
emulation via 'CONFIG_ARM64_SW_TTBR0_PAN=y', the impact is unfortunately
much worse. Calling 'schedule()' with software PAN disabled means that
the next task will execute in the kernel using the page-table and ASID
of the previous process even after 'switch_mm()', since the actual
hardware switch is deferred until return to userspace. At this point, or
if there is a intermediate call to 'uaccess_enable()', the page-table
and ASID of the new process are installed. Sadly, due to the changes
introduced by KPTI, this is not an atomic operation and there is a very
small window (two instructions) where the CPU is configured with the
page-table of the old task and the ASID of the new task; a speculative
access in this state is disastrous because it would corrupt the TLB
entries for the new task with mappings from the previous address space.
As Pavel explains:
| I was able to reproduce memory corruption problem on Broadcom's SoC
| ARMv8-A like this:
|
| Enable software perf-events with PERF_SAMPLE_CALLCHAIN so userland's
| stack is accessed and copied.
|
| The test program performed the following on every CPU and forking
| many processes:
|
| unsigned long *map = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
| MAP_SHARED | MAP_ANONYMOUS, -1, 0);
| map[0] = getpid();
| sched_yield();
| if (map[0] != getpid()) {
| fprintf(stderr, "Corruption detected!");
| }
| munmap(map, PAGE_SIZE);
|
| From time to time I was getting map[0] to contain pid for a
| different process.
Ensure that PAN is re-enabled when returning after an unhandled user
fault from our uaccess routines.
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Reviewed-by: Mark Rutland <mark.rutland(a)arm.com>
Tested-by: Mark Rutland <mark.rutland(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Fixes: 338d4f49d6f7 ("arm64: kernel: Add support for Privileged Access Never")
Signed-off-by: Pavel Tatashin <pasha.tatashin(a)soleen.com>
[will: rewrote commit message]
[will: backport for 4.4.y stable kernels]
Signed-off-by: Will Deacon <will(a)kernel.org>
---
arch/arm64/lib/clear_user.S | 2 ++
arch/arm64/lib/copy_from_user.S | 2 ++
arch/arm64/lib/copy_in_user.S | 2 ++
arch/arm64/lib/copy_to_user.S | 2 ++
4 files changed, 8 insertions(+)
diff --git a/arch/arm64/lib/clear_user.S b/arch/arm64/lib/clear_user.S
index a9723c71c52b..8d330c30a6f9 100644
--- a/arch/arm64/lib/clear_user.S
+++ b/arch/arm64/lib/clear_user.S
@@ -62,5 +62,7 @@ ENDPROC(__clear_user)
.section .fixup,"ax"
.align 2
9: mov x0, x2 // return the original size
+ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(1)), ARM64_HAS_PAN, \
+ CONFIG_ARM64_PAN)
ret
.previous
diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S
index 4699cd74f87e..b8c95ef13229 100644
--- a/arch/arm64/lib/copy_from_user.S
+++ b/arch/arm64/lib/copy_from_user.S
@@ -85,5 +85,7 @@ ENDPROC(__copy_from_user)
strb wzr, [dst], #1 // zero remaining buffer space
cmp dst, end
b.lo 9999b
+ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(1)), ARM64_HAS_PAN, \
+ CONFIG_ARM64_PAN)
ret
.previous
diff --git a/arch/arm64/lib/copy_in_user.S b/arch/arm64/lib/copy_in_user.S
index 81c8fc93c100..233703c84bcd 100644
--- a/arch/arm64/lib/copy_in_user.S
+++ b/arch/arm64/lib/copy_in_user.S
@@ -81,5 +81,7 @@ ENDPROC(__copy_in_user)
.section .fixup,"ax"
.align 2
9998: sub x0, end, dst // bytes not copied
+ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(1)), ARM64_HAS_PAN, \
+ CONFIG_ARM64_PAN)
ret
.previous
diff --git a/arch/arm64/lib/copy_to_user.S b/arch/arm64/lib/copy_to_user.S
index 7512bbbc07ac..62b179408b23 100644
--- a/arch/arm64/lib/copy_to_user.S
+++ b/arch/arm64/lib/copy_to_user.S
@@ -79,5 +79,7 @@ ENDPROC(__copy_to_user)
.section .fixup,"ax"
.align 2
9998: sub x0, end, dst // bytes not copied
+ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(1)), ARM64_HAS_PAN, \
+ CONFIG_ARM64_PAN)
ret
.previous
--
2.24.0.432.g9d3f5f5b63-goog
We've encountered a rcu stall in get_mem_cgroup_from_mm():
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 33-....: (21000 ticks this GP) idle=6c6/1/0x4000000000000002 softirq=35441/35441 fqs=5017
(t=21031 jiffies g=324821 q=95837) NMI backtrace for cpu 33
<...>
RIP: 0010:get_mem_cgroup_from_mm+0x2f/0x90
<...>
__memcg_kmem_charge+0x55/0x140
__alloc_pages_nodemask+0x267/0x320
pipe_write+0x1ad/0x400
new_sync_write+0x127/0x1c0
__kernel_write+0x4f/0xf0
dump_emit+0x91/0xc0
writenote+0xa0/0xc0
elf_core_dump+0x11af/0x1430
do_coredump+0xc65/0xee0
? unix_stream_sendmsg+0x37d/0x3b0
get_signal+0x132/0x7c0
do_signal+0x36/0x640
? recalc_sigpending+0x17/0x50
exit_to_usermode_loop+0x61/0xd0
do_syscall_64+0xd4/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The problem is caused by an exiting task which is associated with
an offline memcg. We're iterating over and over in the
do {} while (!css_tryget_online()) loop, but obviously the memcg won't
become online and the exiting task won't be migrated to a live memcg.
Let's fix it by switching from css_tryget_online() to css_tryget().
As css_tryget_online() cannot guarantee that the memcg won't go
offline, the check is usually useless, except some rare cases
when for example it determines if something should be presented
to a user.
A similar problem is described by commit 18fa84a2db0e ("cgroup: Use
css_tryget() instead of css_tryget_online() in task_get_css()").
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Cc: stable(a)vger.kernel.org
Cc: Tejun Heo <tj(a)kernel.org>
---
mm/memcontrol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 50f5bc55fcec..c5b5f74cfd4d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -939,7 +939,7 @@ struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
if (unlikely(!memcg))
memcg = root_mem_cgroup;
}
- } while (!css_tryget_online(&memcg->css));
+ } while (!css_tryget(&memcg->css));
rcu_read_unlock();
return memcg;
}
--
2.17.1
From: James Smart <jsmart2021(a)gmail.com>
[ Upstream commit 7c4042a4d0b7532cfbc90478fd3084b2dab5849e ]
When dif and first burst is used in a write command wqe, the driver was not
properly setting fields in the io command request. This resulted in no dif
bytes being sent and invalid xfer_rdy's, resulting in the io being aborted
by the hardware.
Correct the wqe initializaton when both dif and first burst are used.
Signed-off-by: Dick Kennedy <dick.kennedy(a)broadcom.com>
Signed-off-by: James Smart <jsmart2021(a)gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/scsi/lpfc/lpfc_scsi.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index bae36cc3740b6..ab6bff60478f9 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -2707,6 +2707,7 @@ lpfc_bg_scsi_prep_dma_buf_s3(struct lpfc_hba *phba,
int datasegcnt, protsegcnt, datadir = scsi_cmnd->sc_data_direction;
int prot_group_type = 0;
int fcpdl;
+ struct lpfc_vport *vport = phba->pport;
/*
* Start the lpfc command prep by bumping the bpl beyond fcp_cmnd
@@ -2812,6 +2813,14 @@ lpfc_bg_scsi_prep_dma_buf_s3(struct lpfc_hba *phba,
*/
iocb_cmd->un.fcpi.fcpi_parm = fcpdl;
+ /*
+ * For First burst, we may need to adjust the initial transfer
+ * length for DIF
+ */
+ if (iocb_cmd->un.fcpi.fcpi_XRdy &&
+ (fcpdl < vport->cfg_first_burst_size))
+ iocb_cmd->un.fcpi.fcpi_XRdy = fcpdl;
+
return 0;
err:
if (lpfc_cmd->seg_cnt)
@@ -3361,6 +3370,7 @@ lpfc_bg_scsi_prep_dma_buf_s4(struct lpfc_hba *phba,
int datasegcnt, protsegcnt, datadir = scsi_cmnd->sc_data_direction;
int prot_group_type = 0;
int fcpdl;
+ struct lpfc_vport *vport = phba->pport;
/*
* Start the lpfc command prep by bumping the sgl beyond fcp_cmnd
@@ -3476,6 +3486,14 @@ lpfc_bg_scsi_prep_dma_buf_s4(struct lpfc_hba *phba,
*/
iocb_cmd->un.fcpi.fcpi_parm = fcpdl;
+ /*
+ * For First burst, we may need to adjust the initial transfer
+ * length for DIF
+ */
+ if (iocb_cmd->un.fcpi.fcpi_XRdy &&
+ (fcpdl < vport->cfg_first_burst_size))
+ iocb_cmd->un.fcpi.fcpi_XRdy = fcpdl;
+
/*
* If the OAS driver feature is enabled and the lun is enabled for
* OAS, set the oas iocb related flags.
--
2.20.1