Hello CoreSight friends,
A bug was recently reported on ChromeOS (with Linux kernel 6.6): https://issuetracker.google.com/329285580. Lockdep detected a possible deadlock in TMC ETR in a debug kernel build.
[12453.456471] WARNING: possible circular locking dependency detected
[12453.456518] 6.6.21-lockdep-01250-gbecf1787b73c #1 Not tainted
[12453.456575] ------------------------------------------------------
[12453.456618] perf/12414 is trying to acquire lock:
[12453.456671] ffffffc084bc0990 (&dma_entry_hash[i].lock){-.-.}-{2:2}, at: check_sync+0x84/0x11a0
[12453.456896] but task is already holding lock:
[12453.456938] ffffff80cdb940f8 (&drvdata->spinlock){....}-{2:2}, at: tmc_update_etr_buffer+0xbc/0x898 [coresight_tmc]
[12453.457332] which lock already depends on the new lock.
Could you please take a look?
Thanks, Denis Nikitin
Hi Denis,
Thanks for the report. From reading the ticket I'm assuming that this isn't a regression but is more something that's rare and intermittent or related to some particular build config?
Are you able to share the config? And are there any specific reproducer instructions or is it just any normal "perf record -e cs_etm//" type command?
Thanks,
James
Hi James,
> From reading the ticket I'm assuming that this isn't a regression but is more something that's rare and intermittent or related to some particular build config?
Right, I think it's only reproducible when CONFIG_DEBUG_INFO is enabled. It doesn't look like a regression, but debug mode might reveal an edge case where a deadlock is possible. You can find the config file attached at https://issuetracker.google.com/329285580#comment6.
> Are you able to share the config? And are there any specific reproducer instructions or is it just any normal "perf record -e cs_etm//" type command?
According to https://issuetracker.google.com/329285580#comment7 it's reproducible when "perf record -e cs_etm/autofdo/uk -a" is running alongside an additional workload, for example an Android app on a ChromeOS device.
Thanks, Denis
Hello Denis,
On 3/22/24 12:03, Denis Nikitin wrote:
> Could you please take a look?
Here is the call chain for TMC ETR devices that might interact with DMA buffer operations:
etm_pmu->stop()
    etm_event_stop()
        sink_ops(sink)->update_buffer()
            tmc_update_etr_buffer()
                spin_lock_irqsave(&drvdata->spinlock, ...)
                tmc_sync_etr_buf(drvdata)
                spin_unlock_irqrestore(&drvdata->spinlock, ...)

tmc_sync_etr_buf()
    etr_buf->ops->sync(etr_buf, rrp, rwp)
        tmc_etr_sync_sg_buf()
            tmc_sg_table_sync_data_range()
                dma_sync_single_for_cpu(... DMA_TO_DEVICE)
                    debug_dma_sync_single_for_cpu()
                        check_sync()

check_sync()
    get_hash_bucket()
        spin_lock_irqsave(&dma_entry_hash[idx].lock, ...)
    ----- printks() ----
    put_hash_bucket()
        spin_unlock_irqrestore(&bucket->lock, ...)
Could these printk() calls be the root cause of the lockdep splats? Shouldn't printk() be avoided while locks are held? A similar problem might also happen via check_unmap(), as reported. That said, I am also wondering whether there is any real cyclic lock dependency here once these DMA printk() calls are taken out of the picture.
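To make the question concrete, here is a minimal userspace sketch in C of how a lock-order checker can flag a chain like the one above. It is not lockdep's actual implementation. The lock class names, record_dependency(), demo_cycle(), and in particular the intermediate OTHER_LOCK edge are hypothetical: the splat quoted in this thread does not show the full dependency chain, so the middle step is only an assumption standing in for whatever the printk() path might take.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 8

/* Hypothetical lock classes, named after the locks discussed above. */
enum { TMC_SPINLOCK, DMA_HASH_LOCK, OTHER_LOCK };

/* edge[a][b] is true once class b has been acquired while a was held. */
static bool edge[MAX_CLASSES][MAX_CLASSES];

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_CLASSES; next++)
        if (edge[from][next] && !seen[next] && reachable(next, to, seen))
            return true;
    return false;
}

/*
 * Record "acquired was taken while held was held".
 * Returns true if an earlier path acquired -> ... -> held already
 * exists, i.e. the new ordering closes a cycle.
 */
static bool record_dependency(int held, int acquired)
{
    bool seen[MAX_CLASSES] = { false };
    bool cycle;

    assert(held >= 0 && held < MAX_CLASSES);
    assert(acquired >= 0 && acquired < MAX_CLASSES);

    cycle = reachable(acquired, held, seen);
    edge[held][acquired] = true;
    return cycle;
}

/* Replays an assumed ordering; true if only the last step closes a cycle. */
static bool demo_cycle(void)
{
    memset(edge, 0, sizeof(edge));

    /* check_sync(): printk() while holding the dma-debug bucket lock. */
    bool first = record_dependency(DMA_HASH_LOCK, OTHER_LOCK);
    /* ASSUMPTION: some path takes the TMC lock while OTHER_LOCK is held. */
    bool second = record_dependency(OTHER_LOCK, TMC_SPINLOCK);
    /* tmc_update_etr_buffer(): DMA sync while holding drvdata->spinlock. */
    bool third = record_dependency(TMC_SPINLOCK, DMA_HASH_LOCK);

    return !first && !second && third;
}
```

In this toy model, dropping the first dependency (the one standing in for printk() under the bucket lock) means the third step no longer closes a cycle, which is the scenario I am asking about.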
- Anshuman