On 13/06/22 15:40, Greg Kroah-Hartman wrote:
From: Saravana Kannan <saravanak@google.com>
[ Upstream commit 5ee76c256e928455212ab759c51d198fedbe7523 ]
Mounting NFS rootfs was timing out when deferred_probe_timeout was non-zero [1]. This was because ip_auto_config() initcall times out waiting for the network interfaces to show up when deferred_probe_timeout was non-zero. While ip_auto_config() calls wait_for_device_probe() to make sure any currently running deferred probe work or asynchronous probe finishes, that wasn't sufficient to account for devices being deferred until deferred_probe_timeout.
Commit 35a672363ab3 ("driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires") tried to fix that by making sure wait_for_device_probe() waits for deferred_probe_timeout to expire before returning.
However, if wait_for_device_probe() is called from the kernel_init() context:
- Before deferred_probe_initcall() [2], it causes the boot process to hang due to a deadlock.
- After deferred_probe_initcall() [3], it blocks kernel_init() from continuing until deferred_probe_timeout expires, which defeats the point of deferred_probe_timeout: waiting for userspace to load modules.
Neither of these is good. So revert the changes to wait_for_device_probe().
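To make the two failure modes above concrete, here is a userspace sketch, using POSIX threads, of the wait that this commit removes. It is a model, not kernel code: a mutex plus condition variable stand in for probe_timeout_waitqueue, a timeout_pending flag for driver_deferred_probe_timeout, and a worker thread for deferred_probe_timeout_work_func(). The helper names (old_wait_for_device_probe(), wait_before_initcall(), wait_after_initcall()) are hypothetical. It assumes, as we read 5.10's drivers/base/dd.c, that the timeout work is only scheduled by deferred_probe_initcall(), so a waiter that runs earlier in kernel_init() has nobody to wake it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-ins for the state the revert removes:
 *   timeout_pending ~ driver_deferred_probe_timeout != 0
 *   timeout_cond    ~ probe_timeout_waitqueue */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t timeout_cond = PTHREAD_COND_INITIALIZER;
static int timeout_pending;

static void sleep_ms(long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

	nanosleep(&ts, NULL);
}

static long elapsed_ms(const struct timespec *a, const struct timespec *b)
{
	return (b->tv_sec - a->tv_sec) * 1000 + (b->tv_nsec - a->tv_nsec) / 1000000;
}

/* Models deferred_probe_timeout_work_func(): after the delay it clears the
 * flag and wakes every waiter, as wake_up_all() did. */
static void *timeout_work(void *arg)
{
	sleep_ms(*(const long *)arg);
	pthread_mutex_lock(&lock);
	timeout_pending = 0;
	pthread_cond_broadcast(&timeout_cond);
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Models the removed wait_event(probe_timeout_waitqueue,
 * !driver_deferred_probe_timeout). wait_event() has no deadline; max_ms only
 * lets this sketch report ETIMEDOUT instead of hanging forever. */
static int old_wait_for_device_probe(long max_ms)
{
	struct timespec dl;
	int rc = 0, pending;

	clock_gettime(CLOCK_REALTIME, &dl);
	dl.tv_sec += max_ms / 1000;
	dl.tv_nsec += (max_ms % 1000) * 1000000L;
	if (dl.tv_nsec >= 1000000000L) {
		dl.tv_sec++;
		dl.tv_nsec -= 1000000000L;
	}
	pthread_mutex_lock(&lock);
	while (timeout_pending && rc == 0)
		rc = pthread_cond_timedwait(&timeout_cond, &lock, &dl);
	pending = timeout_pending;
	pthread_mutex_unlock(&lock);
	return pending ? ETIMEDOUT : 0;
}

static void arm_timeout(void)
{
	pthread_mutex_lock(&lock);
	timeout_pending = 1; /* deferred_probe_timeout is non-zero */
	pthread_mutex_unlock(&lock);
}

/* Case [2]: called before deferred_probe_initcall(), so the timeout work was
 * never scheduled and nothing will ever wake the waiter. */
static int wait_before_initcall(void)
{
	arm_timeout();
	return old_wait_for_device_probe(200);
}

/* Case [3]: the timeout work is scheduled, but the caller (kernel_init() in
 * the real kernel) stays blocked for the whole timeout. Returns ms blocked. */
static long wait_after_initcall(long timeout_ms)
{
	pthread_t worker;
	struct timespec t0, t1;
	int rc;

	arm_timeout();
	clock_gettime(CLOCK_MONOTONIC, &t0);
	rc = pthread_create(&worker, NULL, timeout_work, &timeout_ms);
	assert(rc == 0);
	rc = old_wait_for_device_probe(timeout_ms + 5000);
	assert(rc == 0);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	pthread_join(worker, NULL);
	return elapsed_ms(&t0, &t1);
}
```

Build with -pthread. A caller would see wait_before_initcall() return ETIMEDOUT and wait_after_initcall(300) return at least 300. In the kernel, case [2] has no deadline at all and simply hangs; the 200 ms limit exists only so the sketch terminates.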
[1] - https://lore.kernel.org/lkml/TYAPR01MB45443DF63B9EF29054F7C41FD8C60@TYAPR01M...
[2] - https://lore.kernel.org/lkml/YowHNo4sBjr9ijZr@dev-arch.thelio-3990X/
[3] - https://lore.kernel.org/lkml/Yo3WvGnNk3LvLb7R@linutronix.de/
Hi Saravana, Greg,
KernelCI found that this patch causes the baseline.bootrr.deferred-probe-empty test to fail on r8a77960-ulcb; see the following details for more information.
KernelCI dashboard link: https://linux.kernelci.org/test/plan/id/64d2a6be8c1a8435e535b264/
Error messages from the logs :-
+ UUID=11236495_1.5.2.4.5
+ set +x
+ export 'PATH=/opt/bootrr/libexec/bootrr/helpers:/lava-11236495/1/../bin:/sbin:/usr/sbin:/bin:/usr/bin'
+ cd /opt/bootrr/libexec/bootrr
+ sh helpers/bootrr-auto
e6800000.ethernet
e6700000.dma-controller
e7300000.dma-controller
e7310000.dma-controller
ec700000.dma-controller
ec720000.dma-controller
fea20000.vsp
feb00000.display
fea28000.vsp
fea30000.vsp
fe9a0000.vsp
fe9af000.fcp
fea27000.fcp
fea2f000.fcp
fea37000.fcp
sound
ee100000.mmc
ee140000.mmc
ec500000.sound
/lava-11236495/1/../bin/lava-test-case
<8>[ 17.476741] <LAVA_SIGNAL_TESTCASE TEST_CASE_ID=deferred-probe-empty RESULT=fail>
Test case failing :- Baseline Bootrr deferred-probe-empty test - https://github.com/kernelci/bootrr/blob/main/helpers/bootrr-generic-tests
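As we understand the bootrr generic tests linked above, deferred-probe-empty passes only when the debugfs file devices_deferred (created in drivers/base/dd.c from DEFINE_SHOW_ATTRIBUTE(deferred_devs)) is empty, and the device names in the log are the entries it found. The C sketch below only approximates that check; it is not bootrr's code, which is a shell script. The path /sys/kernel/debug/devices_deferred assumes debugfs is mounted in the usual place, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed location; requires debugfs mounted at /sys/kernel/debug. */
#define DEVICES_DEFERRED "/sys/kernel/debug/devices_deferred"

/* Counts non-blank lines (one deferred device each) in a
 * devices_deferred-style file. Returns -1 if it cannot be opened.
 * Assumes each line fits in the buffer, which holds for device names. */
static int count_deferred(const char *path)
{
	char line[512];
	int n = 0;
	FILE *f;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f))
		if (line[strspn(line, " \t\r\n")] != '\0')
			n++;
	fclose(f);
	return n;
}

/* Approximates the test verdict: pass only when the list is readable and empty. */
static int deferred_probe_empty(const char *path)
{
	return count_deferred(path) == 0;
}
```

On the failing board, deferred_probe_empty(DEVICES_DEFERRED) would be expected to return 0, since the log lists devices still waiting on deferred probe.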
Regression Reproduced :-
Lava job after reverting the commit 5ee76c256e92 https://lava.collabora.dev/scheduler/job/11292890
Bisection report from KernelCI can be found at the bottom of the email.
Thanks, Shreeya Patel
#regzbot introduced: 5ee76c256e92
#regzbot title: KernelCI: Multiple devices deferring on r8a77960-ulcb
---------------------------------------------------------------------------------------------------------------------------------------------------
If you do send a fix, please include this trailer:
  Reported-by: "kernelci.org bot" <bot@...>

Hope this helps!
stable-rc/linux-5.10.y bisection: baseline.bootrr.deferred-probe-empty on r8a77960-ulcb
Summary:
  Start:      686c84f2f136 Linux 5.10.189-rc1
  Plain log:  https://storage.kernelci.org/stable-rc/linux-5.10.y/v5.10.188-183-g686c84f2f...
  HTML log:   https://storage.kernelci.org/stable-rc/linux-5.10.y/v5.10.188-183-g686c84f2f...
  Result:     71cbce75031a driver core: Fix wait_for_device_probe() & deferred_probe_timeout interaction
Checks:
  revert:     PASS
  verify:     PASS
Parameters:
  Tree:       stable-rc
  URL:        https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
  Branch:     linux-5.10.y
  Target:     r8a77960-ulcb
  CPU arch:   arm64
  Lab:        lab-collabora
  Compiler:   gcc-10
  Config:     defconfig
  Test case:  baseline.bootrr.deferred-probe-empty
Breaking commit found:
-------------------------------------------------------------------------------
commit 71cbce75031aed26c72c2dc8a83111d181685f1b
Author: Saravana Kannan <saravanak@...>
Date:   Fri Jun 3 13:31:37 2022 +0200
driver core: Fix wait_for_device_probe() & deferred_probe_timeout interaction
[ Upstream commit 5ee76c256e928455212ab759c51d198fedbe7523 ]
Mounting NFS rootfs was timing out when deferred_probe_timeout was non-zero [1]. This was because ip_auto_config() initcall times out waiting for the network interfaces to show up when deferred_probe_timeout was non-zero. While ip_auto_config() calls wait_for_device_probe() to make sure any currently running deferred probe work or asynchronous probe finishes, that wasn't sufficient to account for devices being deferred until deferred_probe_timeout.
Commit 35a672363ab3 ("driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires") tried to fix that by making sure wait_for_device_probe() waits for deferred_probe_timeout to expire before returning.
However, if wait_for_device_probe() is called from the kernel_init() context:
- Before deferred_probe_initcall() [2], it causes the boot process to hang due to a deadlock.
- After deferred_probe_initcall() [3], it blocks kernel_init() from continuing until deferred_probe_timeout expires, which defeats the point of deferred_probe_timeout: waiting for userspace to load modules.
Neither of these is good. So revert the changes to wait_for_device_probe().
[1] - https://lore.kernel.org/lkml/TYAPR01MB45443DF63B9EF29054F7C41FD8C60@TYAPR01M...
[2] - https://lore.kernel.org/lkml/YowHNo4sBjr9ijZr@dev-arch.thelio-3990X/
[3] - https://lore.kernel.org/lkml/Yo3WvGnNk3LvLb7R@linutronix.de/
Fixes: 35a672363ab3 ("driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires")
Cc: John Stultz <jstultz@...>
Cc: "David S. Miller" <davem@...>
Cc: Alexey Kuznetsov <kuznet@...>
Cc: Hideaki YOSHIFUJI <yoshfuji@...>
Cc: Jakub Kicinski <kuba@...>
Cc: Rob Herring <robh@...>
Cc: Geert Uytterhoeven <geert@...>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@...>
Cc: Robin Murphy <robin.murphy@...>
Cc: Andy Shevchenko <andy.shevchenko@...>
Cc: Sudeep Holla <sudeep.holla@...>
Cc: Andy Shevchenko <andriy.shevchenko@...>
Cc: Naresh Kamboju <naresh.kamboju@...>
Cc: Basil Eljuse <Basil.Eljuse@...>
Cc: Ferry Toth <fntoth@...>
Cc: Arnd Bergmann <arnd@...>
Cc: Anders Roxell <anders.roxell@...>
Cc: linux-pm@...
Reported-by: Nathan Chancellor <nathan@...>
Reported-by: Sebastian Andrzej Siewior <bigeasy@...>
Tested-by: Geert Uytterhoeven <geert+renesas@...>
Acked-by: John Stultz <jstultz@...>
Signed-off-by: Saravana Kannan <saravanak@...>
Link: https://lore.kernel.org/r/20220526034609.480766-2-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@...>
Reviewed-by: Rafael J. Wysocki <rafael@...>
Signed-off-by: Linus Torvalds <torvalds@...>
Signed-off-by: Sasha Levin <sashal@...>
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 4f4e8aedbd2c..f9d9f1ad9215 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -250,7 +250,6 @@ DEFINE_SHOW_ATTRIBUTE(deferred_devs);
 
 int driver_deferred_probe_timeout;
 EXPORT_SYMBOL_GPL(driver_deferred_probe_timeout);
-static DECLARE_WAIT_QUEUE_HEAD(probe_timeout_waitqueue);
 
 static int __init deferred_probe_timeout_setup(char *str)
 {
@@ -302,7 +301,6 @@ static void deferred_probe_timeout_work_func(struct work_struct *work)
 	list_for_each_entry(p, &deferred_probe_pending_list, deferred_probe)
 		dev_info(p->device, "deferred probe pending\n");
 	mutex_unlock(&deferred_probe_mutex);
-	wake_up_all(&probe_timeout_waitqueue);
 }
 static DECLARE_DELAYED_WORK(deferred_probe_timeout_work, deferred_probe_timeout_work_func);
 
@@ -706,9 +704,6 @@ int driver_probe_done(void)
  */
 void wait_for_device_probe(void)
 {
-	/* wait for probe timeout */
-	wait_event(probe_timeout_waitqueue, !driver_deferred_probe_timeout);
-
 	/* wait for the deferred probe workqueue to finish */
 	flush_work(&deferred_probe_work);
-------------------------------------------------------------------------------
Git bisection log:
-------------------------------------------------------------------------------
git bisect start
# good: [2c85ebc57b3e1817b6ce1a6b703928e113a90442] Linux 5.10
git bisect good 2c85ebc57b3e1817b6ce1a6b703928e113a90442
# bad: [686c84f2f136412631eb684b064def993a96a8cc] Linux 5.10.189-rc1
git bisect bad 686c84f2f136412631eb684b064def993a96a8cc
# good: [88f1b613c37fbd3c4171f5a9decdcd12ae704637] Bluetooth: cmtp: fix possible panic when cmtp_init_sockets() fails
git bisect good 88f1b613c37fbd3c4171f5a9decdcd12ae704637
# bad: [6c5742372b2d5d36de129439e26eda05aab54652] Input: snvs_pwrkey - fix SNVS_HPVIDR1 register address
git bisect bad 6c5742372b2d5d36de129439e26eda05aab54652
# good: [07280d2c3f33d47741f42411eb8c976b70c6657a] random: make more consistent use of integer types
git bisect good 07280d2c3f33d47741f42411eb8c976b70c6657a
# bad: [2fc7f18ba2f98d15f174ce8e25a5afa46926eb55] tools headers: Remove broken definition of __LITTLE_ENDIAN
git bisect bad 2fc7f18ba2f98d15f174ce8e25a5afa46926eb55
# bad: [c2ae49a113a5344232f1ebb93bcf18bbd11e9c39] net: dsa: lantiq_gswip: Fix refcount leak in gswip_gphy_fw_list
git bisect bad c2ae49a113a5344232f1ebb93bcf18bbd11e9c39
# good: [c1b08aa568e829b743affe5d3231e6de28b7609e] ASoC: samsung: Use dev_err_probe() helper
git bisect good c1b08aa568e829b743affe5d3231e6de28b7609e
# good: [97a9ec86ccb4e336ecde46db42b59b2ff7e0d719] drm/nouveau/clk: Fix an incorrect NULL check on list iterator
git bisect good 97a9ec86ccb4e336ecde46db42b59b2ff7e0d719
# good: [572211d631d7665c6690b5a6cb80436f8c368dc1] pwm: lp3943: Fix duty calculation in case period was clamped
git bisect good 572211d631d7665c6690b5a6cb80436f8c368dc1
# good: [8f49e1694cbc29e76d5028267c1978cc2630e494] bpf: Fix probe read error in ___bpf_prog_run()
git bisect good 8f49e1694cbc29e76d5028267c1978cc2630e494
# bad: [3660db29b0305f9a1d95979c7af0f5db6ea99f5d] iommu/arm-smmu: fix possible null-ptr-deref in arm_smmu_device_probe()
git bisect bad 3660db29b0305f9a1d95979c7af0f5db6ea99f5d
# good: [04622d631826ba483ae3a0b8a71c745d8e21453d] gpio: pca953x: use the correct register address to do regcache sync
git bisect good 04622d631826ba483ae3a0b8a71c745d8e21453d
# bad: [32be2b805a1a13ccc68bd209ec3ae198dd3ba5d6] perf c2c: Fix sorting in percent_rmt_hitm_cmp()
git bisect bad 32be2b805a1a13ccc68bd209ec3ae198dd3ba5d6
# good: [c1f0187025905e9981000d44a92e159468b561a8] scsi: sd: Fix potential NULL pointer dereference
git bisect good c1f0187025905e9981000d44a92e159468b561a8
# bad: [71cbce75031aed26c72c2dc8a83111d181685f1b] driver core: Fix wait_for_device_probe() & deferred_probe_timeout interaction
git bisect bad 71cbce75031aed26c72c2dc8a83111d181685f1b
# good: [b8fac8e321044a9ac50f7185b4e9d91a7745e4b0] tipc: check attribute length for bearer name
git bisect good b8fac8e321044a9ac50f7185b4e9d91a7745e4b0
# first bad commit: [71cbce75031aed26c72c2dc8a83111d181685f1b] driver core: Fix wait_for_device_probe() & deferred_probe_timeout interaction
-------------------------------------------------------------------------------
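The bisection log above is a binary search: each good/bad verdict discards roughly half of the remaining candidate commits until a single first bad commit is left. The C sketch below shows the idea on a simplified linear history, assuming that once the failure appears it persists in every later commit. Real git bisect walks the commit graph and chooses its midpoints differently, so this is an illustration rather than git's implementation; first_bad_commit() and the bad_at array are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* bad_at[i] is the verdict a test run at commit i would give. Requires
 * bad_at[good] == false and bad_at[bad] == true. Only probed entries are
 * read, just as only the tested commits appear in the log. *steps receives
 * the number of test runs performed. */
static size_t first_bad_commit(const bool *bad_at, size_t good, size_t bad,
			       int *steps)
{
	assert(good < bad && !bad_at[good] && bad_at[bad]);
	*steps = 0;
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		++*steps;
		if (bad_at[mid])
			bad = mid;	/* like "git bisect bad <mid>" */
		else
			good = mid;	/* like "git bisect good <mid>" */
	}
	return bad;
}
```

For a 10-commit history whose failure starts at index 3, the search tests indices 4, 2 and 3 and returns 3 after three runs. The number of runs grows with the logarithm of the range, which is why a large stable range can be narrowed in a modest number of boots.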
Fixes: 35a672363ab3 ("driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires")
Cc: John Stultz <jstultz@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Basil Eljuse <Basil.Eljuse@arm.com>
Cc: Ferry Toth <fntoth@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: linux-pm@vger.kernel.org
Reported-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: John Stultz <jstultz@google.com>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20220526034609.480766-2-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
 drivers/base/dd.c | 5 -----
 1 file changed, 5 deletions(-)
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 977e94cf669e..86fd2ea35656 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -257,7 +257,6 @@ DEFINE_SHOW_ATTRIBUTE(deferred_devs);
 
 int driver_deferred_probe_timeout;
 EXPORT_SYMBOL_GPL(driver_deferred_probe_timeout);
-static DECLARE_WAIT_QUEUE_HEAD(probe_timeout_waitqueue);
 
 static int __init deferred_probe_timeout_setup(char *str)
 {
@@ -312,7 +311,6 @@ static void deferred_probe_timeout_work_func(struct work_struct *work)
 	list_for_each_entry(p, &deferred_probe_pending_list, deferred_probe)
 		dev_info(p->device, "deferred probe pending\n");
 	mutex_unlock(&deferred_probe_mutex);
-	wake_up_all(&probe_timeout_waitqueue);
 }
 static DECLARE_DELAYED_WORK(deferred_probe_timeout_work, deferred_probe_timeout_work_func);
 
@@ -720,9 +718,6 @@ int driver_probe_done(void)
  */
 void wait_for_device_probe(void)
 {
-	/* wait for probe timeout */
-	wait_event(probe_timeout_waitqueue, !driver_deferred_probe_timeout);
-
 	/* wait for the deferred probe workqueue to finish */
 	flush_work(&deferred_probe_work);
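After the revert, wait_for_device_probe() begins with flush_work(&deferred_probe_work), as the last hunk shows. As we read 5.10's drivers/base/dd.c (beyond the quoted context), the rest of the function waits on probe_waitqueue until probe_count reaches zero and then calls async_synchronize_full(), so it waits for probes that are in flight, not for devices parked on the deferred list. The userspace sketch below models only that remaining wait with POSIX threads; probe_count, probe_cond and the helper names are hypothetical stand-ins, not kernel API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-ins: probe_count ~ in-flight probes,
 * probe_cond ~ probe_waitqueue. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t probe_cond = PTHREAD_COND_INITIALIZER;
static int probe_count;

/* A driver probe that takes *arg milliseconds and then completes. */
static void *probe(void *arg)
{
	long ms = *(const long *)arg;
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

	nanosleep(&ts, NULL);
	pthread_mutex_lock(&lock);
	if (--probe_count == 0)
		pthread_cond_broadcast(&probe_cond);
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Post-revert wait: only in-flight probes are waited for. A device sitting
 * on the deferred list is not counted, so it never delays the caller. */
static void new_wait_for_device_probe(void)
{
	pthread_mutex_lock(&lock);
	while (probe_count != 0)
		pthread_cond_wait(&probe_cond, &lock);
	pthread_mutex_unlock(&lock);
}

/* Runs one probe lasting probe_ms and returns how long the caller waited. */
static long wait_with_one_probe(long probe_ms)
{
	pthread_t t;
	struct timespec t0, t1;
	int rc;

	pthread_mutex_lock(&lock);
	probe_count = 1; /* counted before the probe thread starts */
	pthread_mutex_unlock(&lock);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	rc = pthread_create(&t, NULL, probe, &probe_ms);
	assert(rc == 0);
	new_wait_for_device_probe();
	clock_gettime(CLOCK_MONOTONIC, &t1);
	pthread_join(t, NULL);
	return (t1.tv_sec - t0.tv_sec) * 1000 + (t1.tv_nsec - t0.tv_nsec) / 1000000;
}
```

Build with -pthread. The caller is released once the in-flight probe finishes, however long deferred_probe_timeout is, which is the behaviour the commit message restores for kernel_init().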