Failure after v6.1-rc1-2-g665c157e0204: coresight: cti: Fix hang in cti_disable_hw():
Results changed to
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
23489
# First few build errors in logs:
# 00:40:26 drivers/hwtracing/coresight/coresight-cti-core.c:93:24: error: unused variable ‘dev’ [-Werror=unused-variable]
# 00:40:26 drivers/hwtracing/coresight/coresight-cti-core.c:154:24: error: unused variable ‘dev’ [-Werror=unused-variable]
# 00:40:27 make[3]: *** [scripts/Makefile.build:250: drivers/hwtracing/coresight/coresight-cti-core.o] Error 1
# 00:40:34 make[2]: *** [scripts/Makefile.build:500: drivers/hwtracing/coresight] Error 2
# 00:41:32 make[1]: *** [scripts/Makefile.build:500: drivers] Error 2
# 00:41:32 make: *** [Makefile:1992: .] Error 2
from
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
32261
# linux build successful:
all
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
For latest status see comments in https://linaro.atlassian.net/browse/GNU-681 .
Status of v6.1-rc1-2-g665c157e0204 commit for tcwg_kernel:
commit 665c157e0204176023860b51a46528ba0ba62c33
Author: James Clark <james.clark(a)arm.com>
Date: Wed Oct 5 14:14:52 2022 +0100
coresight: cti: Fix hang in cti_disable_hw()
cti_enable_hw() and cti_disable_hw() are called from an atomic context
so shouldn't use runtime PM because it can result in a sleep when
communicating with firmware.
Since commit 3c6656337852 ("Revert "firmware: arm_scmi: Add clock
management to the SCMI power domain""), this causes a hang on Juno when
running the Perf Coresight tests or running this command:
perf record -e cs_etm//u -- ls
This was also missed until the revert commit because pm_runtime_put()
was called with the wrong device until commit 692c9a499b28 ("coresight:
cti: Correct the parameter for pm_runtime_put")
With lock and scheduler debugging enabled the following is output:
coresight cti_sys0: cti_enable_hw -- dev:cti_sys0 parent: 20020000.cti
BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:1151
in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 330, name: perf-exec
preempt_count: 2, expected: 0
RCU nest depth: 0, expected: 0
INFO: lockdep is turned off.
irq event stamp: 0
hardirqs last enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<ffff80000822b394>] copy_process+0xa0c/0x1948
softirqs last enabled at (0): [<ffff80000822b394>] copy_process+0xa0c/0x1948
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 3 PID: 330 Comm: perf-exec Not tainted 6.0.0-00053-g042116d99298 #7
Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Sep 13 2022
Call trace:
dump_backtrace+0x134/0x140
show_stack+0x20/0x58
dump_stack_lvl+0x8c/0xb8
dump_stack+0x18/0x34
__might_resched+0x180/0x228
__might_sleep+0x50/0x88
__pm_runtime_resume+0xac/0xb0
cti_enable+0x44/0x120
coresight_control_assoc_ectdev+0xc0/0x150
coresight_enable_path+0xb4/0x288
etm_event_start+0x138/0x170
etm_event_add+0x48/0x70
event_sched_in.isra.122+0xb4/0x280
merge_sched_in+0x1fc/0x3d0
visit_groups_merge.constprop.137+0x16c/0x4b0
ctx_sched_in+0x114/0x1f0
perf_event_sched_in+0x60/0x90
ctx_resched+0x68/0xb0
perf_event_exec+0x138/0x508
begin_new_exec+0x52c/0xd40
load_elf_binary+0x6b8/0x17d0
bprm_execve+0x360/0x7f8
do_execveat_common.isra.47+0x218/0x238
__arm64_sys_execve+0x48/0x60
invoke_syscall+0x4c/0x110
el0_svc_common.constprop.4+0xfc/0x120
do_el0_svc+0x34/0xc0
el0_svc+0x40/0x98
el0t_64_sync_handler+0x98/0xc0
el0t_64_sync+0x170/0x174
Fix the issue by removing the runtime PM calls completely. They are not
needed here because it must have already been done when building the
path for a trace.
Fixes: 835d722ba10a ("coresight: cti: Initial CoreSight CTI Driver")
Reported-by: Aishwarya TCV <Aishwarya.TCV(a)arm.com>
Reported-by: Cristian Marussi <Cristian.Marussi(a)arm.com>
Suggested-by: Suzuki Poulose <Suzuki.Poulose(a)arm.com>
Signed-off-by: James Clark <james.clark(a)arm.com>
Reviewed-by: Mike Leach <mike.leach(a)linaro.org>
Tested-by: Mike Leach <mike.leach(a)linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Link: https://lore.kernel.org/r/20221005131452.1506328-1-james.clark@arm.com
* gnu-release-aarch64-next-allmodconfig
** Failure after v6.1-rc1-2-g665c157e0204: coresight: cti: Fix hang in cti_disable_hw():
** https://ci.linaro.org/job/tcwg_kernel-gnu-build-gnu-release-aarch64-next-al…
Bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-build-gnu-release-aarch64-next-al…
Good build: https://ci.linaro.org/job/tcwg_kernel-gnu-build-gnu-release-aarch64-next-al…
Reproduce current build:
<cut>
mkdir -p investigate-linux-665c157e0204176023860b51a46528ba0ba62c33
cd investigate-linux-665c157e0204176023860b51a46528ba0ba62c33
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests for bad and good builds
mkdir -p bad/artifacts good/artifacts
curl -o bad/artifacts/manifest.sh https://ci.linaro.org/job/tcwg_kernel-gnu-build-gnu-release-aarch64-next-al… --fail
curl -o good/artifacts/manifest.sh https://ci.linaro.org/job/tcwg_kernel-gnu-build-gnu-release-aarch64-next-al… --fail
# Reproduce bad build
(cd bad; ../jenkins-scripts/tcwg_kernel-build.sh ^^ true %%rr[top_artifacts] artifacts)
# Reproduce good build
(cd good; ../jenkins-scripts/tcwg_kernel-build.sh ^^ true %%rr[top_artifacts] artifacts)
</cut>
Full commit (up to 1000 lines):
<cut>
commit 665c157e0204176023860b51a46528ba0ba62c33
Author: James Clark <james.clark(a)arm.com>
Date: Wed Oct 5 14:14:52 2022 +0100
coresight: cti: Fix hang in cti_disable_hw()
cti_enable_hw() and cti_disable_hw() are called from an atomic context
so shouldn't use runtime PM because it can result in a sleep when
communicating with firmware.
Since commit 3c6656337852 ("Revert "firmware: arm_scmi: Add clock
management to the SCMI power domain""), this causes a hang on Juno when
running the Perf Coresight tests or running this command:
perf record -e cs_etm//u -- ls
This was also missed until the revert commit because pm_runtime_put()
was called with the wrong device until commit 692c9a499b28 ("coresight:
cti: Correct the parameter for pm_runtime_put")
With lock and scheduler debugging enabled the following is output:
coresight cti_sys0: cti_enable_hw -- dev:cti_sys0 parent: 20020000.cti
BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:1151
in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 330, name: perf-exec
preempt_count: 2, expected: 0
RCU nest depth: 0, expected: 0
INFO: lockdep is turned off.
irq event stamp: 0
hardirqs last enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<ffff80000822b394>] copy_process+0xa0c/0x1948
softirqs last enabled at (0): [<ffff80000822b394>] copy_process+0xa0c/0x1948
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 3 PID: 330 Comm: perf-exec Not tainted 6.0.0-00053-g042116d99298 #7
Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Sep 13 2022
Call trace:
dump_backtrace+0x134/0x140
show_stack+0x20/0x58
dump_stack_lvl+0x8c/0xb8
dump_stack+0x18/0x34
__might_resched+0x180/0x228
__might_sleep+0x50/0x88
__pm_runtime_resume+0xac/0xb0
cti_enable+0x44/0x120
coresight_control_assoc_ectdev+0xc0/0x150
coresight_enable_path+0xb4/0x288
etm_event_start+0x138/0x170
etm_event_add+0x48/0x70
event_sched_in.isra.122+0xb4/0x280
merge_sched_in+0x1fc/0x3d0
visit_groups_merge.constprop.137+0x16c/0x4b0
ctx_sched_in+0x114/0x1f0
perf_event_sched_in+0x60/0x90
ctx_resched+0x68/0xb0
perf_event_exec+0x138/0x508
begin_new_exec+0x52c/0xd40
load_elf_binary+0x6b8/0x17d0
bprm_execve+0x360/0x7f8
do_execveat_common.isra.47+0x218/0x238
__arm64_sys_execve+0x48/0x60
invoke_syscall+0x4c/0x110
el0_svc_common.constprop.4+0xfc/0x120
do_el0_svc+0x34/0xc0
el0_svc+0x40/0x98
el0t_64_sync_handler+0x98/0xc0
el0t_64_sync+0x170/0x174
Fix the issue by removing the runtime PM calls completely. They are not
needed here because it must have already been done when building the
path for a trace.
Fixes: 835d722ba10a ("coresight: cti: Initial CoreSight CTI Driver")
Reported-by: Aishwarya TCV <Aishwarya.TCV(a)arm.com>
Reported-by: Cristian Marussi <Cristian.Marussi(a)arm.com>
Suggested-by: Suzuki Poulose <Suzuki.Poulose(a)arm.com>
Signed-off-by: James Clark <james.clark(a)arm.com>
Reviewed-by: Mike Leach <mike.leach(a)linaro.org>
Tested-by: Mike Leach <mike.leach(a)linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Link: https://lore.kernel.org/r/20221005131452.1506328-1-james.clark@arm.com
---
drivers/hwtracing/coresight/coresight-cti-core.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-cti-core.c b/drivers/hwtracing/coresight/coresight-cti-core.c
index 1be92342b5b9..4a02ae23d3a0 100644
--- a/drivers/hwtracing/coresight/coresight-cti-core.c
+++ b/drivers/hwtracing/coresight/coresight-cti-core.c
@@ -94,7 +94,6 @@ static int cti_enable_hw(struct cti_drvdata *drvdata)
unsigned long flags;
int rc = 0;
- pm_runtime_get_sync(dev->parent);
spin_lock_irqsave(&drvdata->spinlock, flags);
/* no need to do anything if enabled or unpowered*/
@@ -119,7 +118,6 @@ static int cti_enable_hw(struct cti_drvdata *drvdata)
/* cannot enable due to error */
cti_err_not_enabled:
spin_unlock_irqrestore(&drvdata->spinlock, flags);
- pm_runtime_put(dev->parent);
return rc;
}
@@ -175,7 +173,6 @@ static int cti_disable_hw(struct cti_drvdata *drvdata)
coresight_disclaim_device_unlocked(csdev);
CS_LOCK(drvdata->base);
spin_unlock(&drvdata->spinlock);
- pm_runtime_put(dev->parent);
return 0;
/* not disabled this call */
</cut>