On Fri, Oct 06, 2023 at 01:57:14PM -0400, Liam R. Howlett wrote:
- Paul E. McKenney <paulmck@kernel.org> [231006 12:47]:
On Fri, Oct 06, 2023 at 12:20:38PM -0400, Liam R. Howlett wrote:
- Naresh Kamboju <naresh.kamboju@linaro.org> [231005 13:49]:
On Wed, 4 Oct 2023 at 23:33, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
This is the start of the stable review cycle for the 5.15.134 release. There are 183 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 06 Oct 2023 17:51:12 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
        https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.134-rc...
or in the git tree and branch at:
        git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. Regressions on x86.
The following kernel warning was noticed on x86 while booting stable-rc 5.15.134-rc1 with a kernel built from the selftests merge config.
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Has anyone noticed this kernel warning? It is always reproducible while booting x86 with the given config.
From that config:
#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_RUDE_RCU=y
CONFIG_TASKS_TRACE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# end of RCU Subsystem
#
# RCU Debugging
#
CONFIG_PROVE_RCU=y
# CONFIG_RCU_SCALE_TEST is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_RCU_REF_SCALE_TEST is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=21
CONFIG_RCU_TRACE=y
# CONFIG_RCU_EQS_DEBUG is not set
# end of RCU Debugging
x86 boot log:
[ 0.000000] Linux version 5.15.134-rc1 (tuxmake@tuxmake) (x86_64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP @1696443178
...
[ 1.480701] ------------[ cut here ]------------
[ 1.481296] WARNING: CPU: 0 PID: 13 at kernel/rcu/tasks.h:958 trc_inspect_reader+0x80/0xb0
[ 1.481296] Modules linked in:
[ 1.481296] CPU: 0 PID: 13 Comm: rcu_tasks_trace Not tainted 5.15.134-rc1 #1
[ 1.481296] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS 2.5 11/26/2020
[ 1.481296] RIP: 0010:trc_inspect_reader+0x80/0xb0
This function has changed a lot, including the dropping of this WARN_ON_ONCE(). The warning was replaced in commit 897ba84dc5aa ("rcu-tasks: Handle idle tasks for recently offlined CPUs") with something that looks equivalent, so I'm not sure why it would not trigger in newer revisions.
Obviously, the behaviour I changed was the test for the task being idle. I am not sure how best to short-circuit that test during boot, as I am not familiar with the RCU code.
The usual test for RCU's notion of early boot being completed is (rcu_scheduler_active != RCU_SCHEDULER_INIT).
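For illustration only, a sketch of how such a gate might look (the placement inside trc_inspect_reader() and the exact check being suppressed are assumptions, not the actual 5.15 code; t is the reader task and ofl its CPU's offline state):

        /* Hypothetical guard: skip the offline/idle sanity check until
         * RCU considers early boot to be over. */
        if (rcu_scheduler_active != RCU_SCHEDULER_INIT)
                WARN_ON_ONCE(ofl && !is_idle_task(t));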
Except that "ofl" should always be false that early in boot, at least in mainline.
Is this still true in the final version of the patch, where we set the boot task as !idle until just before early boot is finished? I wouldn't think of this as 'early in boot' anymore so much as the entire kernel setup. Maybe we need to shorten the time we stay in !idle mode for earlier kernels?
In mainline, the ofl variable is defined as cpu_is_offline(cpu), and during boot, the boot CPU is guaranteed to be online. (As opposed to the boot CPU's idle-task state.)
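Roughly, and with the caveat that cpu being the reader task's CPU as seen by trc_inspect_reader() is an assumption here:

        int cpu = task_cpu(t);
        bool ofl = cpu_is_offline(cpu);  /* false for the boot CPU throughout boot */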
How frequently is this function called? We could check something for early boot... or track down where the CPU is put online and restore idle before that happens?
Once per RCU Tasks Trace grace period per reader seen to be blocking that grace period. Its performance is an issue, but not to anywhere near the same extent as (say) rcu_read_lock_trace().
It's also worth noting that the bug this fixes wasn't exposed until the maple tree (added in v6.1) was used for the IRQ descriptors (added in v6.5).
Lots of latent bugs, to be sure, even with rcutorture. :-/
The Right Thing is to fix the bug all the way back to its introduction, but what fallout would make the backport less desirable than living with the unexposed bug?
You are quite right that it is possible for the risk of a backport to exceed the risk of the original bug.
I defer to Joel (CCed) on how best to resolve this in -stable.
Thanx, Paul