This patchset adds support for CPU-wide trace scenarios and as such, it is now possible to issue the following commands:
# perf record -e cs_etm/@20070000.etr/ -C 2,3 $COMMAND # perf record -e cs_etm/@20070000.etr/ -a $COMMAND
The above will trace all instructions executed by a given processor for as long as $COMMAND hasn't returned. The solution is designed to work for both 1:1 and N:1 source/sink topologies, though the former hasn't been tested for lack of access to HW.
Most of the changes revolve around allowing more than one event to use a sink when operated from perf. More specifically the first event to use a sink switches it on while the last one is tasked to aggregate traces and switching off the device.
This is the kernel part of the solution, with the user space portion to be released in a separate set. All the patches have been rebased on yesterday's linux next and hosted here[1]. Everything has been tested on Juno. I have not CC'ed the kernel mailing list because of the ongoing merge window.
Review and comments would be most appreciated.
Regards, Mathieu
[1]. https://git.linaro.org/people/mathieu.poirier/coresight.git/log/?h=next-2019...
Mathieu Poirier (20): coresight: pmu: Adding ITRACE property to cs_etm PMU coresight: etm4x: Add kernel configuration for CONTEXTID coresight: etm4x: Configure tracers to emit timestamps coresight: Adding return code to sink::disable() operation coresight: Move reference counting inside sink drivers coresight: Refactor sink::disable() functions coresight: Refactor sink::update() functions coresight: perf: Refactor function etm_setup_aux() coresight: perf: Refactor function free_event_data() coresight: Introduce the notion of process ID to the framework coresight: tmc-etr: Refactor function tmc_etr_setup_perf_buf() coresight: tmc-etr: Introduce the notion of process ID to ETR devices coresight: tmc-etr: Allow events to use the same ETR buffer coresight: tmc-etr: Add support for CPU-wide trace scenarios coresight: tmc-etf: Add support for CPU-wide trace scenarios coresight: etb10: Add support for CPU-wide trace scenarios coresight: Refactor sink::alloc_buffer() functions coresight: Add function coresight_sink_is_shared() coresight: tmc-etr: Make ETR aware of topology coresight: Use event->cpu to determine session type
drivers/hwtracing/coresight/coresight-etb10.c | 79 +++++- .../hwtracing/coresight/coresight-etm-perf.c | 47 +++- drivers/hwtracing/coresight/coresight-etm4x.c | 114 +++++++- drivers/hwtracing/coresight/coresight-priv.h | 1 + .../hwtracing/coresight/coresight-tmc-etf.c | 84 ++++-- .../hwtracing/coresight/coresight-tmc-etr.c | 265 +++++++++++++++--- drivers/hwtracing/coresight/coresight-tmc.c | 4 + drivers/hwtracing/coresight/coresight-tmc.h | 11 + drivers/hwtracing/coresight/coresight-tpiu.c | 9 +- drivers/hwtracing/coresight/coresight.c | 53 +++- include/linux/coresight-pmu.h | 2 + include/linux/coresight.h | 8 +- tools/include/linux/coresight-pmu.h | 2 + 13 files changed, 568 insertions(+), 111 deletions(-)
Add to the capabilities the ITRACE property so that ITRACE START events are generated when the PMU is switched on by the core.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- drivers/hwtracing/coresight/coresight-etm-perf.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index 4d5a2b9f9d6a..25ae56e924bb 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -566,7 +566,8 @@ static int __init etm_perf_init(void) { int ret;
- etm_pmu.capabilities = PERF_PMU_CAP_EXCLUSIVE; + etm_pmu.capabilities = (PERF_PMU_CAP_EXCLUSIVE | + PERF_PMU_CAP_ITRACE);
etm_pmu.attr_groups = etm_pmu_attr_groups; etm_pmu.task_ctx_nr = perf_sw_context;
Set the proper bit in the configuration register when contextID tracing has been requested by user space. That way PE_CONTEXT elements are generated by the tracers when a process is installed on a CPU.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++ drivers/hwtracing/coresight/coresight-etm4x.c | 5 +++++ include/linux/coresight-pmu.h | 2 ++ tools/include/linux/coresight-pmu.h | 2 ++ 4 files changed, 11 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index 25ae56e924bb..bbfed70b3402 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -29,6 +29,7 @@ static DEFINE_PER_CPU(struct coresight_device *, csdev_src);
/* ETMv3.5/PTM's ETMCR is 'config' */ PMU_FORMAT_ATTR(cycacc, "config:" __stringify(ETM_OPT_CYCACC)); +PMU_FORMAT_ATTR(contextid, "config:" __stringify(ETM_OPT_CTXTID)); PMU_FORMAT_ATTR(timestamp, "config:" __stringify(ETM_OPT_TS)); PMU_FORMAT_ATTR(retstack, "config:" __stringify(ETM_OPT_RETSTK)); /* Sink ID - same for all ETMs */ @@ -36,6 +37,7 @@ PMU_FORMAT_ATTR(sinkid, "config2:0-31");
static struct attribute *etm_config_formats_attr[] = { &format_attr_cycacc.attr, + &format_attr_contextid.attr, &format_attr_timestamp.attr, &format_attr_retstack.attr, &format_attr_sinkid.attr, diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c index 08ce37c9475d..f067f7ee08c7 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x.c +++ b/drivers/hwtracing/coresight/coresight-etm4x.c @@ -239,6 +239,11 @@ static int etm4_parse_event_config(struct etmv4_drvdata *drvdata, if (attr->config & BIT(ETM_OPT_TS)) /* bit[11], Global timestamp tracing bit */ config->cfg |= BIT(11); + + if (attr->config & BIT(ETM_OPT_CTXTID)) + /* bit[6], Context ID tracing bit */ + config->cfg |= BIT(6); + /* return stack - enable if selected and supported */ if ((attr->config & BIT(ETM_OPT_RETSTK)) && drvdata->retstack) /* bit[12], Return stack enable bit */ diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h index a1a959ba24ff..b0e35eec6499 100644 --- a/include/linux/coresight-pmu.h +++ b/include/linux/coresight-pmu.h @@ -12,11 +12,13 @@
/* ETMv3.5/PTM's ETMCR config bit */ #define ETM_OPT_CYCACC 12 +#define ETM_OPT_CTXTID 14 #define ETM_OPT_TS 28 #define ETM_OPT_RETSTK 29
/* ETMv4 CONFIGR programming bits for the ETM OPTs */ #define ETM4_CFG_BIT_CYCACC 4 +#define ETM4_CFG_BIT_CTXTID 6 #define ETM4_CFG_BIT_TS 11 #define ETM4_CFG_BIT_RETSTK 12
diff --git a/tools/include/linux/coresight-pmu.h b/tools/include/linux/coresight-pmu.h index a1a959ba24ff..b0e35eec6499 100644 --- a/tools/include/linux/coresight-pmu.h +++ b/tools/include/linux/coresight-pmu.h @@ -12,11 +12,13 @@
/* ETMv3.5/PTM's ETMCR config bit */ #define ETM_OPT_CYCACC 12 +#define ETM_OPT_CTXTID 14 #define ETM_OPT_TS 28 #define ETM_OPT_RETSTK 29
/* ETMv4 CONFIGR programming bits for the ETM OPTs */ #define ETM4_CFG_BIT_CYCACC 4 +#define ETM4_CFG_BIT_CTXTID 6 #define ETM4_CFG_BIT_TS 11 #define ETM4_CFG_BIT_RETSTK 12
On 06/03/2019 22:57, Mathieu Poirier wrote:
Set the proper bit in the configuration register when contextID tracing has been requested by user space. That way PE_CONTEXT elements are generated by the tracers when a process is installed on a CPU.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++ drivers/hwtracing/coresight/coresight-etm4x.c | 5 +++++ include/linux/coresight-pmu.h | 2 ++ tools/include/linux/coresight-pmu.h | 2 ++ 4 files changed, 11 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index 25ae56e924bb..bbfed70b3402 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -29,6 +29,7 @@ static DEFINE_PER_CPU(struct coresight_device *, csdev_src); /* ETMv3.5/PTM's ETMCR is 'config' */ PMU_FORMAT_ATTR(cycacc, "config:" __stringify(ETM_OPT_CYCACC)); +PMU_FORMAT_ATTR(contextid, "config:" __stringify(ETM_OPT_CTXTID)); PMU_FORMAT_ATTR(timestamp, "config:" __stringify(ETM_OPT_TS)); PMU_FORMAT_ATTR(retstack, "config:" __stringify(ETM_OPT_RETSTK)); /* Sink ID - same for all ETMs */ @@ -36,6 +37,7 @@ PMU_FORMAT_ATTR(sinkid, "config2:0-31"); static struct attribute *etm_config_formats_attr[] = { &format_attr_cycacc.attr,
- &format_attr_contextid.attr, &format_attr_timestamp.attr, &format_attr_retstack.attr, &format_attr_sinkid.attr,
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c index 08ce37c9475d..f067f7ee08c7 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x.c +++ b/drivers/hwtracing/coresight/coresight-etm4x.c @@ -239,6 +239,11 @@ static int etm4_parse_event_config(struct etmv4_drvdata *drvdata, if (attr->config & BIT(ETM_OPT_TS)) /* bit[11], Global timestamp tracing bit */ config->cfg |= BIT(11);
- if (attr->config & BIT(ETM_OPT_CTXTID))
/* bit[6], Context ID tracing bit */
config->cfg |= BIT(6);
nit: Could we use the ETM4_CFG_BIT_CTXTID here rather than hard coding the value we already define ?
Otherwise:
Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com
Btw, someday, we could make this an array of attr->config bits defs and their corresponding drvdata->config bit defs and be done with it, rather than adding separate checks like.
- /* return stack - enable if selected and supported */ if ((attr->config & BIT(ETM_OPT_RETSTK)) && drvdata->retstack) /* bit[12], Return stack enable bit */
diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h index a1a959ba24ff..b0e35eec6499 100644 --- a/include/linux/coresight-pmu.h +++ b/include/linux/coresight-pmu.h @@ -12,11 +12,13 @@ /* ETMv3.5/PTM's ETMCR config bit */ #define ETM_OPT_CYCACC 12 +#define ETM_OPT_CTXTID 14 #define ETM_OPT_TS 28 #define ETM_OPT_RETSTK 29 /* ETMv4 CONFIGR programming bits for the ETM OPTs */ #define ETM4_CFG_BIT_CYCACC 4 +#define ETM4_CFG_BIT_CTXTID 6 #define ETM4_CFG_BIT_TS 11 #define ETM4_CFG_BIT_RETSTK 12 diff --git a/tools/include/linux/coresight-pmu.h b/tools/include/linux/coresight-pmu.h index a1a959ba24ff..b0e35eec6499 100644 --- a/tools/include/linux/coresight-pmu.h +++ b/tools/include/linux/coresight-pmu.h @@ -12,11 +12,13 @@ /* ETMv3.5/PTM's ETMCR config bit */ #define ETM_OPT_CYCACC 12 +#define ETM_OPT_CTXTID 14 #define ETM_OPT_TS 28 #define ETM_OPT_RETSTK 29 /* ETMv4 CONFIGR programming bits for the ETM OPTs */ #define ETM4_CFG_BIT_CYCACC 4 +#define ETM4_CFG_BIT_CTXTID 6 #define ETM4_CFG_BIT_TS 11 #define ETM4_CFG_BIT_RETSTK 12
nit: Should this be really exposed to the perf tool ? The tool must always rely on the "format" strings, isn't it ? That way, we are free to change the kernel definitions and get the tool working everywhere ?
Cheers Suzuki
On Thu, 7 Mar 2019 at 03:44, Suzuki K Poulose suzuki.poulose@arm.com wrote:
On 06/03/2019 22:57, Mathieu Poirier wrote:
Set the proper bit in the configuration register when contextID tracing has been requested by user space. That way PE_CONTEXT elements are generated by the tracers when a process is installed on a CPU.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++ drivers/hwtracing/coresight/coresight-etm4x.c | 5 +++++ include/linux/coresight-pmu.h | 2 ++ tools/include/linux/coresight-pmu.h | 2 ++ 4 files changed, 11 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index 25ae56e924bb..bbfed70b3402 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -29,6 +29,7 @@ static DEFINE_PER_CPU(struct coresight_device *, csdev_src);
/* ETMv3.5/PTM's ETMCR is 'config' */ PMU_FORMAT_ATTR(cycacc, "config:" __stringify(ETM_OPT_CYCACC)); +PMU_FORMAT_ATTR(contextid, "config:" __stringify(ETM_OPT_CTXTID)); PMU_FORMAT_ATTR(timestamp, "config:" __stringify(ETM_OPT_TS)); PMU_FORMAT_ATTR(retstack, "config:" __stringify(ETM_OPT_RETSTK)); /* Sink ID - same for all ETMs */ @@ -36,6 +37,7 @@ PMU_FORMAT_ATTR(sinkid, "config2:0-31");
static struct attribute *etm_config_formats_attr[] = { &format_attr_cycacc.attr,
&format_attr_contextid.attr, &format_attr_timestamp.attr, &format_attr_retstack.attr, &format_attr_sinkid.attr,
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c index 08ce37c9475d..f067f7ee08c7 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x.c +++ b/drivers/hwtracing/coresight/coresight-etm4x.c @@ -239,6 +239,11 @@ static int etm4_parse_event_config(struct etmv4_drvdata *drvdata, if (attr->config & BIT(ETM_OPT_TS)) /* bit[11], Global timestamp tracing bit */ config->cfg |= BIT(11);
if (attr->config & BIT(ETM_OPT_CTXTID))
/* bit[6], Context ID tracing bit */
config->cfg |= BIT(6);
nit: Could we use the ETM4_CFG_BIT_CTXTID here rather than hard coding the value we already define ?
We certainly can.
Otherwise:
Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com
Btw, someday, we could make this an array of attr->config bits defs and their corresponding drvdata->config bit defs and be done with it, rather than adding separate checks like.
A conversion must be made because configuration bits differ between ETMv3.x and ETMv4. Since ETMv3.x came first I kept the bits specified for that architecture but it is only a matter of time before that strategy doesn't work anymore. I envision a time in the not so distant future where new options will come out with configuration bits completely orthogonal to what we currently have. When that happens user space will have to set attr->config for each ETM flavour, essentially removing the need for bit conversion. At that point attr->config and drvdata->config will be the same for each ETM architecture.
/* return stack - enable if selected and supported */ if ((attr->config & BIT(ETM_OPT_RETSTK)) && drvdata->retstack) /* bit[12], Return stack enable bit */
diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h index a1a959ba24ff..b0e35eec6499 100644 --- a/include/linux/coresight-pmu.h +++ b/include/linux/coresight-pmu.h @@ -12,11 +12,13 @@
/* ETMv3.5/PTM's ETMCR config bit */ #define ETM_OPT_CYCACC 12 +#define ETM_OPT_CTXTID 14 #define ETM_OPT_TS 28 #define ETM_OPT_RETSTK 29
/* ETMv4 CONFIGR programming bits for the ETM OPTs */ #define ETM4_CFG_BIT_CYCACC 4 +#define ETM4_CFG_BIT_CTXTID 6 #define ETM4_CFG_BIT_TS 11 #define ETM4_CFG_BIT_RETSTK 12
diff --git a/tools/include/linux/coresight-pmu.h b/tools/include/linux/coresight-pmu.h index a1a959ba24ff..b0e35eec6499 100644 --- a/tools/include/linux/coresight-pmu.h +++ b/tools/include/linux/coresight-pmu.h @@ -12,11 +12,13 @@
/* ETMv3.5/PTM's ETMCR config bit */ #define ETM_OPT_CYCACC 12 +#define ETM_OPT_CTXTID 14 #define ETM_OPT_TS 28 #define ETM_OPT_RETSTK 29
/* ETMv4 CONFIGR programming bits for the ETM OPTs */ #define ETM4_CFG_BIT_CYCACC 4 +#define ETM4_CFG_BIT_CTXTID 6 #define ETM4_CFG_BIT_TS 11 #define ETM4_CFG_BIT_RETSTK 12
nit: Should this be really exposed to the perf tool ? The tool must always rely on the "format" strings, isn't it ? That way, we are free to change the kernel definitions and get the tool working everywhere ?
We don't have a choice to copy coresight-pmu.h over to the tools' side. Otherwise there is a script that compares the perf headers with the kernel headers that complains bitterly, something that makes Acme very unhappy.
Thanks for looking at this set. Mathieu
Cheers Suzuki
On 07/03/2019 17:53, Mathieu Poirier wrote:
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c index 08ce37c9475d..f067f7ee08c7 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x.c +++ b/drivers/hwtracing/coresight/coresight-etm4x.c @@ -239,6 +239,11 @@ static int etm4_parse_event_config(struct etmv4_drvdata *drvdata, if (attr->config & BIT(ETM_OPT_TS)) /* bit[11], Global timestamp tracing bit */ config->cfg |= BIT(11);
if (attr->config & BIT(ETM_OPT_CTXTID))
/* bit[6], Context ID tracing bit */
config->cfg |= BIT(6);
nit: Could we use the ETM4_CFG_BIT_CTXTID here rather than hard coding the value we already define ?
We certainly can.
Otherwise:
Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com
Btw, someday, we could make this an array of attr->config bits defs and their corresponding drvdata->config bit defs and be done with it, rather than adding separate checks like.
A conversion must be made because configuration bits differ between ETMv3.x and ETMv4. Since ETMv3.x came first I kept the bits specified for that architecture but it is only a matter of time before that strategy doesn't work anymore. I envision a time in the not so distant future where new options will come out with configuration bits completely orthogonal to what we currently have. When that happens user space will have to set attr->config for each ETM flavour, essentially removing the need for bit conversion. At that point attr->config and drvdata->config will be the same for each ETM architecture.
diff --git a/tools/include/linux/coresight-pmu.h b/tools/include/linux/coresight-pmu.h index a1a959ba24ff..b0e35eec6499 100644 --- a/tools/include/linux/coresight-pmu.h +++ b/tools/include/linux/coresight-pmu.h @@ -12,11 +12,13 @@
/* ETMv3.5/PTM's ETMCR config bit */ #define ETM_OPT_CYCACC 12 +#define ETM_OPT_CTXTID 14 #define ETM_OPT_TS 28 #define ETM_OPT_RETSTK 29
/* ETMv4 CONFIGR programming bits for the ETM OPTs */ #define ETM4_CFG_BIT_CYCACC 4 +#define ETM4_CFG_BIT_CTXTID 6 #define ETM4_CFG_BIT_TS 11 #define ETM4_CFG_BIT_RETSTK 12
nit: Should this be really exposed to the perf tool ? The tool must always rely on the "format" strings, isn't it ? That way, we are free to change the kernel definitions and get the tool working everywhere ?
We don't have a choice to copy coresight-pmu.h over to the tools' side. Otherwise there is a script that compares the perf headers with the kernel headers that complains bitterly, something that makes Acme very unhappy.
Ok. But, my question is, why do the perf tool need to know this from the header file. AFAICS, all of this must be exposed via sysfs format. We could certainly remove the CFG bits.
Cheers Suzuki
On Thu, 14 Mar 2019 at 05:16, Suzuki K Poulose suzuki.poulose@arm.com wrote:
On 07/03/2019 17:53, Mathieu Poirier wrote:
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c index 08ce37c9475d..f067f7ee08c7 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x.c +++ b/drivers/hwtracing/coresight/coresight-etm4x.c @@ -239,6 +239,11 @@ static int etm4_parse_event_config(struct etmv4_drvdata *drvdata, if (attr->config & BIT(ETM_OPT_TS)) /* bit[11], Global timestamp tracing bit */ config->cfg |= BIT(11);
if (attr->config & BIT(ETM_OPT_CTXTID))
/* bit[6], Context ID tracing bit */
config->cfg |= BIT(6);
nit: Could we use the ETM4_CFG_BIT_CTXTID here rather than hard coding the value we already define ?
We certainly can.
Otherwise:
Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com
Btw, someday, we could make this an array of attr->config bits defs and their corresponding drvdata->config bit defs and be done with it, rather than adding separate checks like.
A conversion must be made because configuration bits differ between ETMv3.x and ETMv4. Since ETMv3.x came first I kept the bits specified for that architecture but it is only a matter of time before that strategy doesn't work anymore. I envision a time in the not so distant future where new options will come out with configuration bits completely orthogonal to what we currently have. When that happens user space will have to set attr->config for each ETM flavour, essentially removing the need for bit conversion. At that point attr->config and drvdata->config will be the same for each ETM architecture.
diff --git a/tools/include/linux/coresight-pmu.h b/tools/include/linux/coresight-pmu.h index a1a959ba24ff..b0e35eec6499 100644 --- a/tools/include/linux/coresight-pmu.h +++ b/tools/include/linux/coresight-pmu.h @@ -12,11 +12,13 @@
/* ETMv3.5/PTM's ETMCR config bit */ #define ETM_OPT_CYCACC 12 +#define ETM_OPT_CTXTID 14 #define ETM_OPT_TS 28 #define ETM_OPT_RETSTK 29
/* ETMv4 CONFIGR programming bits for the ETM OPTs */ #define ETM4_CFG_BIT_CYCACC 4 +#define ETM4_CFG_BIT_CTXTID 6 #define ETM4_CFG_BIT_TS 11 #define ETM4_CFG_BIT_RETSTK 12
nit: Should this be really exposed to the perf tool ? The tool must always rely on the "format" strings, isn't it ? That way, we are free to change the kernel definitions and get the tool working everywhere ?
We don't have a choice to copy coresight-pmu.h over to the tools' side. Otherwise there is a script that compares the perf headers with the kernel headers that complains bitterly, something that makes Acme very unhappy.
Ok. But, my question is, why do the perf tool need to know this from the header file. AFAICS, all of this must be exposed via sysfs format. We could certainly remove the CFG bits.
I see your point and it is a valid one. On the flip side doing so would introduce a fair amount of churn in the perf tools that is not related to this feature. Given the already substantial amount of modifications there that are related to this feature, it should probably be treated separately.
Thanks, Mathieu
Cheers Suzuki
Configure timestamps to be emitted at regular intervals in the trace stream to temporally correlate instructions executed on different CPUs.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- drivers/hwtracing/coresight/coresight-etm4x.c | 109 +++++++++++++++++- 1 file changed, 106 insertions(+), 3 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c index f067f7ee08c7..2c29d6743fee 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x.c +++ b/drivers/hwtracing/coresight/coresight-etm4x.c @@ -138,8 +138,11 @@ static int etm4_enable_hw(struct etmv4_drvdata *drvdata) drvdata->base + TRCCNTVRn(i)); }
- /* Resource selector pair 0 is always implemented and reserved */ - for (i = 0; i < drvdata->nr_resource * 2; i++) + /* + * Resource selector pair 0 is always implemented and reserved. As + * such start at 2. + */ + for (i = 2; i < drvdata->nr_resource * 2; i++) writel_relaxed(config->res_ctrl[i], drvdata->base + TRCRSCTLRn(i));
@@ -201,6 +204,91 @@ static void etm4_enable_hw_smp_call(void *info) arg->rc = etm4_enable_hw(arg->drvdata); }
+/* + * The goal of function etm4_config_timestamp_event() is to configure a + * counter that will tell the tracer to emit a timestamp packet when it + * reaches zero. This is done in order to get a more fine grained idea + * of when instructions are executed so that they can be correlated + * with execution on other CPUs. + * + * To do this the counter itself is configured to self reload and + * TRCRSCTLR1 (always true) used to get the counter to decrement. From + * there a resource selector is configured with the counter and the + * timestamp control register to use the resource selector to trigger the + * event that will insert a timestamp packet in the stream. + */ +static int etm4_config_timestamp_event(struct etmv4_drvdata *drvdata) +{ + int i, ret = -EINVAL; + int counter, rselector; + u32 val = 0; + struct etmv4_config *config = &drvdata->config; + + /* No point in trying if we don't have at least one counter */ + if (!drvdata->nr_cntr) + goto out; + + /* Find a counter that hasn't been initialised */ + for (i = 0; i < drvdata->nr_cntr; i++) + if (config->cntr_val[i] == 0) + break; + + /* Remember what counter we used */ + counter = 1 << i; + + /* All the counters have been configured already, bail out */ + if (i == drvdata->nr_cntr) { + pr_err("%s: no available counter found\n", __func__); + goto out; + } + + /* + * Initialise original and reload counter value to the smallest + * possible value in order to get as much precision as we can. + */ + config->cntr_val[i] = 1; + config->cntrldvr[i] = 1; + + /* Set the trace counter control register */ + val = 0x1 << 16 | /* Bit 16, reload counter automatically */ + 0x0 << 7 | /* Select single resource selector */ + 0x1; /* Resource selector 1, i.e always true */ + + config->cntr_ctrl[i] = val; + + /* + * Searching for an available resource selector to use, starting at + * '2' since every implementation has at least 2 resource selector. + * ETMIDR4 gives the number of resource selector _pairs_, + * hence multiply by 2. + */ + for (i = 2; i < drvdata->nr_resource * 2; i++) + if (!config->res_ctrl[i]) + break; + + /* Remember what resource selector we used */ + rselector = i; + + if (i == drvdata->nr_resource * 2) { + pr_err("%s: no available resource selector found\n", __func__); + goto out; + } + + val = 0x2 << 16 | /* Group 0b0010 - Counter and sequencers */ + counter << 0; /* Counter to use */ + + config->res_ctrl[i] = val; + + val = 0x0 << 7 | /* Select single resource selector */ + rselector; /* Resource selector */ + + config->ts_ctrl = val; + + ret = 0; +out: + return ret; +} + static int etm4_parse_event_config(struct etmv4_drvdata *drvdata, struct perf_event *event) { @@ -236,9 +324,24 @@ static int etm4_parse_event_config(struct etmv4_drvdata *drvdata, /* TRM: Must program this for cycacc to work */ config->ccctlr = ETM_CYC_THRESHOLD_DEFAULT; } - if (attr->config & BIT(ETM_OPT_TS)) + if (attr->config & BIT(ETM_OPT_TS)) { + /* + * Configure timestamps to be emitted at regular intervals in + * order to correlate instructions executed on different CPUs + * (CPU-wide trace scenarios). + */ + ret = etm4_config_timestamp_event(drvdata); + + /* + * No need to go further if timestamp intervals can't + * be configured. + */ + if (ret) + goto out; + /* bit[11], Global timestamp tracing bit */ config->cfg |= BIT(11); + }
if (attr->config & BIT(ETM_OPT_CTXTID)) /* bit[6], Context ID tracing bit */
On Wed, 6 Mar 2019 at 22:58, Mathieu Poirier mathieu.poirier@linaro.org wrote:
Configure timestamps to be emitted at regular intervals in the trace stream to temporally correlate instructions executed on different CPUs.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
drivers/hwtracing/coresight/coresight-etm4x.c | 109 +++++++++++++++++- 1 file changed, 106 insertions(+), 3 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c index f067f7ee08c7..2c29d6743fee 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x.c +++ b/drivers/hwtracing/coresight/coresight-etm4x.c @@ -138,8 +138,11 @@ static int etm4_enable_hw(struct etmv4_drvdata *drvdata) drvdata->base + TRCCNTVRn(i)); }
/* Resource selector pair 0 is always implemented and reserved */
for (i = 0; i < drvdata->nr_resource * 2; i++)
/*
* Resource selector pair 0 is always implemented and reserved. As
* such start at 2.
*/
for (i = 2; i < drvdata->nr_resource * 2; i++) writel_relaxed(config->res_ctrl[i], drvdata->base + TRCRSCTLRn(i));
@@ -201,6 +204,91 @@ static void etm4_enable_hw_smp_call(void *info) arg->rc = etm4_enable_hw(arg->drvdata); }
+/*
- The goal of function etm4_config_timestamp_event() is to configure a
- counter that will tell the tracer to emit a timestamp packet when it
- reaches zero. This is done in order to get a more fine grained idea
- of when instructions are executed so that they can be correlated
- with execution on other CPUs.
- To do this the counter itself is configured to self reload and
- TRCRSCTLR1 (always true) used to get the counter to decrement. From
- there a resource selector is configured with the counter and the
- timestamp control register to use the resource selector to trigger the
- event that will insert a timestamp packet in the stream.
- */
+static int etm4_config_timestamp_event(struct etmv4_drvdata *drvdata) +{
int i, ret = -EINVAL;
int counter, rselector;
u32 val = 0;
struct etmv4_config *config = &drvdata->config;
/* No point in trying if we don't have at least one counter */
if (!drvdata->nr_cntr)
goto out;
/* Find a counter that hasn't been initialised */
for (i = 0; i < drvdata->nr_cntr; i++)
if (config->cntr_val[i] == 0)
break;
/* Remember what counter we used */
counter = 1 << i;
/* All the counters have been configured already, bail out */
if (i == drvdata->nr_cntr) {
pr_err("%s: no available counter found\n", __func__);
goto out;
}
/*
* Initialise original and reload counter value to the smallest
* possible value in order to get as much precision as we can.
*/
config->cntr_val[i] = 1;
config->cntrldvr[i] = 1;
Careful here - need to ensure that such a small gap between TS does not flood the output with timestamps and reduce the amount of usable trace. ETMv4 will delay outputting TS if doing so will cause an overflow - so may reduce the effective TS frequency anyway. Once we are happy the mechanism is right, then may need to play with these figures to get the right balance between precision and quality.
Mike
/* Set the trace counter control register */
val = 0x1 << 16 | /* Bit 16, reload counter automatically */
0x0 << 7 | /* Select single resource selector */
0x1; /* Resource selector 1, i.e always true */
config->cntr_ctrl[i] = val;
/*
* Searching for an available resource selector to use, starting at
* '2' since every implementation has at least 2 resource selector.
* ETMIDR4 gives the number of resource selector _pairs_,
* hence multiply by 2.
*/
for (i = 2; i < drvdata->nr_resource * 2; i++)
if (!config->res_ctrl[i])
break;
/* Remember what resource selector we used */
rselector = i;
if (i == drvdata->nr_resource * 2) {
pr_err("%s: no available resource selector found\n", __func__);
goto out;
}
val = 0x2 << 16 | /* Group 0b0010 - Counter and sequencers */
counter << 0; /* Counter to use */
config->res_ctrl[i] = val;
val = 0x0 << 7 | /* Select single resource selector */
rselector; /* Resource selector */
config->ts_ctrl = val;
ret = 0;
+out:
return ret;
+}
static int etm4_parse_event_config(struct etmv4_drvdata *drvdata, struct perf_event *event) { @@ -236,9 +324,24 @@ static int etm4_parse_event_config(struct etmv4_drvdata *drvdata, /* TRM: Must program this for cycacc to work */ config->ccctlr = ETM_CYC_THRESHOLD_DEFAULT; }
if (attr->config & BIT(ETM_OPT_TS))
if (attr->config & BIT(ETM_OPT_TS)) {
/*
* Configure timestamps to be emitted at regular intervals in
* order to correlate instructions executed on different CPUs
* (CPU-wide trace scenarios).
*/
ret = etm4_config_timestamp_event(drvdata);
/*
* No need to go further if timestamp intervals can't
* be configured.
*/
if (ret)
goto out;
/* bit[11], Global timestamp tracing bit */ config->cfg |= BIT(11);
} if (attr->config & BIT(ETM_OPT_CTXTID)) /* bit[6], Context ID tracing bit */
-- 2.17.1
CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight
On Fri, 8 Mar 2019 at 06:54, Mike Leach mike.leach@linaro.org wrote:
On Wed, 6 Mar 2019 at 22:58, Mathieu Poirier mathieu.poirier@linaro.org wrote:
Configure timestamps to be emitted at regular intervals in the trace stream to temporally correlate instructions executed on different CPUs.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
drivers/hwtracing/coresight/coresight-etm4x.c | 109 +++++++++++++++++- 1 file changed, 106 insertions(+), 3 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c index f067f7ee08c7..2c29d6743fee 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x.c +++ b/drivers/hwtracing/coresight/coresight-etm4x.c @@ -138,8 +138,11 @@ static int etm4_enable_hw(struct etmv4_drvdata *drvdata) drvdata->base + TRCCNTVRn(i)); }
/* Resource selector pair 0 is always implemented and reserved */
for (i = 0; i < drvdata->nr_resource * 2; i++)
/*
* Resource selector pair 0 is always implemented and reserved. As
* such start at 2.
*/
for (i = 2; i < drvdata->nr_resource * 2; i++) writel_relaxed(config->res_ctrl[i], drvdata->base + TRCRSCTLRn(i));
@@ -201,6 +204,91 @@ static void etm4_enable_hw_smp_call(void *info) arg->rc = etm4_enable_hw(arg->drvdata); }
+/*
- The goal of function etm4_config_timestamp_event() is to configure a
- counter that will tell the tracer to emit a timestamp packet when it
- reaches zero. This is done in order to get a more fine grained idea
- of when instructions are executed so that they can be correlated
- with execution on other CPUs.
- To do this the counter itself is configured to self reload and
- TRCRSCTLR1 (always true) used to get the counter to decrement. From
- there a resource selector is configured with the counter and the
- timestamp control register to use the resource selector to trigger the
- event that will insert a timestamp packet in the stream.
- */
+static int etm4_config_timestamp_event(struct etmv4_drvdata *drvdata) +{
int i, ret = -EINVAL;
int counter, rselector;
u32 val = 0;
struct etmv4_config *config = &drvdata->config;
/* No point in trying if we don't have at least one counter */
if (!drvdata->nr_cntr)
goto out;
/* Find a counter that hasn't been initialised */
for (i = 0; i < drvdata->nr_cntr; i++)
if (config->cntr_val[i] == 0)
break;
/* Remember what counter we used */
counter = 1 << i;
/* All the counters have been configured already, bail out */
if (i == drvdata->nr_cntr) {
pr_err("%s: no available counter found\n", __func__);
goto out;
}
/*
* Initialise original and reload counter value to the smallest
* possible value in order to get as much precision as we can.
*/
config->cntr_val[i] = 1;
config->cntrldvr[i] = 1;
Careful here - need to ensure that such a small gap between TS does not flood the output with timestamps and reduce the amount of usable trace. ETMv4 will delay outputting TS if doing so will cause an overflow - so may reduce the effective TS frequency anyway. Once we are happy the mechanism is right, then may need to play with these figures to get the right balance between precision and quality.
I toyed with these values for a while... From the testing I did on my side the above values yield one timestamp every 2 or 3 range packets when the system is not busy. When CPUs are 100% occupied there can be several hundreds of range packet generated between timestamps, hence going for the absolute minimum. That being said I'm definitely open to adopting a different configuration if we find something better.
Thanks, Mathieu
Mike
/* Set the trace counter control register */
val = 0x1 << 16 | /* Bit 16, reload counter automatically */
0x0 << 7 | /* Select single resource selector */
0x1; /* Resource selector 1, i.e always true */
config->cntr_ctrl[i] = val;
/*
* Searching for an available resource selector to use, starting at
* '2' since every implementation has at least 2 resource selector.
* ETMIDR4 gives the number of resource selector _pairs_,
* hence multiply by 2.
*/
for (i = 2; i < drvdata->nr_resource * 2; i++)
if (!config->res_ctrl[i])
break;
/* Remember what resource selector we used */
rselector = i;
if (i == drvdata->nr_resource * 2) {
pr_err("%s: no available resource selector found\n", __func__);
goto out;
}
val = 0x2 << 16 | /* Group 0b0010 - Counter and sequencers */
counter << 0; /* Counter to use */
config->res_ctrl[i] = val;
val = 0x0 << 7 | /* Select single resource selector */
rselector; /* Resource selector */
config->ts_ctrl = val;
ret = 0;
+out:
return ret;
+}
static int etm4_parse_event_config(struct etmv4_drvdata *drvdata, struct perf_event *event) { @@ -236,9 +324,24 @@ static int etm4_parse_event_config(struct etmv4_drvdata *drvdata, /* TRM: Must program this for cycacc to work */ config->ccctlr = ETM_CYC_THRESHOLD_DEFAULT; }
if (attr->config & BIT(ETM_OPT_TS))
if (attr->config & BIT(ETM_OPT_TS)) {
/*
* Configure timestamps to be emitted at regular intervals in
* order to correlate instructions executed on different CPUs
* (CPU-wide trace scenarios).
*/
ret = etm4_config_timestamp_event(drvdata);
/*
* No need to go further if timestamp intervals can't
* be configured.
*/
if (ret)
goto out;
/* bit[11], Global timestamp tracing bit */ config->cfg |= BIT(11);
} if (attr->config & BIT(ETM_OPT_CTXTID)) /* bit[6], Context ID tracing bit */
-- 2.17.1
CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight
-- Mike Leach Principal Engineer, ARM Ltd. Manchester Design Centre. UK
On Wed, Mar 06, 2019 at 03:57:35PM -0700, Mathieu Poirier wrote:
Configure timestamps to be emitted at regular intervals in the trace stream to temporally correlate instructions executed on different CPUs.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
drivers/hwtracing/coresight/coresight-etm4x.c | 109 +++++++++++++++++- 1 file changed, 106 insertions(+), 3 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c index f067f7ee08c7..2c29d6743fee 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x.c +++ b/drivers/hwtracing/coresight/coresight-etm4x.c @@ -138,8 +138,11 @@ static int etm4_enable_hw(struct etmv4_drvdata *drvdata) drvdata->base + TRCCNTVRn(i)); }
- /* Resource selector pair 0 is always implemented and reserved */
- for (i = 0; i < drvdata->nr_resource * 2; i++)
- /*
* Resource selector pair 0 is always implemented and reserved. As
* such start at 2.
*/
- for (i = 2; i < drvdata->nr_resource * 2; i++) writel_relaxed(config->res_ctrl[i], drvdata->base + TRCRSCTLRn(i));
@@ -201,6 +204,91 @@ static void etm4_enable_hw_smp_call(void *info) arg->rc = etm4_enable_hw(arg->drvdata); } +/*
- The goal of function etm4_config_timestamp_event() is to configure a
- counter that will tell the tracer to emit a timestamp packet when it
- reaches zero. This is done in order to get a more fine grained idea
- of when instructions are executed so that they can be correlated
- with execution on other CPUs.
- To do this the counter itself is configured to self reload and
- TRCRSCTLR1 (always true) used to get the counter to decrement. From
- there a resource selector is configured with the counter and the
- timestamp control register to use the resource selector to trigger the
- event that will insert a timestamp packet in the stream.
- */
+static int etm4_config_timestamp_event(struct etmv4_drvdata *drvdata) +{
- int i, ret = -EINVAL;
- int counter, rselector;
- u32 val = 0;
- struct etmv4_config *config = &drvdata->config;
- /* No point in trying if we don't have at least one counter */
- if (!drvdata->nr_cntr)
goto out;
- /* Find a counter that hasn't been initialised */
- for (i = 0; i < drvdata->nr_cntr; i++)
if (config->cntr_val[i] == 0)
break;
- /* Remember what counter we used */
- counter = 1 << i;
- /* All the counters have been configured already, bail out */
- if (i == drvdata->nr_cntr) {
pr_err("%s: no available counter found\n", __func__);
goto out;
- }
- /*
* Initialise original and reload counter value to the smallest
* possible value in order to get as much precision as we can.
*/
- config->cntr_val[i] = 1;
- config->cntrldvr[i] = 1;
Maybe this variable is defined by old code, but show use the more consistent naming like: config->cntr_ldvr?
- /* Set the trace counter control register */
- val = 0x1 << 16 | /* Bit 16, reload counter automatically */
0x0 << 7 | /* Select single resource selector */
0x1; /* Resource selector 1, i.e always true */
- config->cntr_ctrl[i] = val;
- /*
* Searching for an available resource selector to use, starting at
* '2' since every implementation has at least 2 resource selector.
* ETMIDR4 gives the number of resource selector _pairs_,
* hence multiply by 2.
*/
- for (i = 2; i < drvdata->nr_resource * 2; i++)
if (!config->res_ctrl[i])
break;
- /* Remember what resource selector we used */
- rselector = i;
- if (i == drvdata->nr_resource * 2) {
pr_err("%s: no available resource selector found\n", __func__);
goto out;
- }
If there ahve no available resource selector and 'goto' out tag, it will not reset the values for counter registers. Do you think for the failure case, we should clean below fields:
config->cntr_val[i] = 0; config->cntrldvr[i] = 0; config->cntr_ctrl[i] = 0;
- val = 0x2 << 16 | /* Group 0b0010 - Counter and sequencers */
counter << 0; /* Counter to use */
When read the ETMv4 spec, the counter value are only set in bits 0 to 3, so should reserve for other fields?
- config->res_ctrl[i] = val;
- val = 0x0 << 7 | /* Select single resource selector */
rselector; /* Resource selector */
- config->ts_ctrl = val;
- ret = 0;
+out:
- return ret;
+}
static int etm4_parse_event_config(struct etmv4_drvdata *drvdata, struct perf_event *event) { @@ -236,9 +324,24 @@ static int etm4_parse_event_config(struct etmv4_drvdata *drvdata, /* TRM: Must program this for cycacc to work */ config->ccctlr = ETM_CYC_THRESHOLD_DEFAULT; }
- if (attr->config & BIT(ETM_OPT_TS))
- if (attr->config & BIT(ETM_OPT_TS)) {
/*
* Configure timestamps to be emitted at regular intervals in
* order to correlate instructions executed on different CPUs
* (CPU-wide trace scenarios).
*/
ret = etm4_config_timestamp_event(drvdata);
Use this method, I can see now it only allocates counter but there have no flow to release counter; so after run for several times with perf command, will it fail to allocate any counter and resource selector?
/*
* No need to go further if timestamp intervals can't
* be configured.
*/
if (ret)
goto out;
- /* bit[11], Global timestamp tracing bit */ config->cfg |= BIT(11);
- }
if (attr->config & BIT(ETM_OPT_CTXTID)) /* bit[6], Context ID tracing bit */ -- 2.17.1
Hey Leo,
On Sun, 17 Mar 2019 at 00:30, Leo Yan leo.yan@linaro.org wrote:
On Wed, Mar 06, 2019 at 03:57:35PM -0700, Mathieu Poirier wrote:
Configure timestamps to be emitted at regular intervals in the trace stream to temporally correlate instructions executed on different CPUs.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
drivers/hwtracing/coresight/coresight-etm4x.c | 109 +++++++++++++++++- 1 file changed, 106 insertions(+), 3 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c index f067f7ee08c7..2c29d6743fee 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x.c +++ b/drivers/hwtracing/coresight/coresight-etm4x.c @@ -138,8 +138,11 @@ static int etm4_enable_hw(struct etmv4_drvdata *drvdata) drvdata->base + TRCCNTVRn(i)); }
/* Resource selector pair 0 is always implemented and reserved */
for (i = 0; i < drvdata->nr_resource * 2; i++)
/*
* Resource selector pair 0 is always implemented and reserved. As
* such start at 2.
*/
for (i = 2; i < drvdata->nr_resource * 2; i++) writel_relaxed(config->res_ctrl[i], drvdata->base + TRCRSCTLRn(i));
@@ -201,6 +204,91 @@ static void etm4_enable_hw_smp_call(void *info) arg->rc = etm4_enable_hw(arg->drvdata); }
+/*
- The goal of function etm4_config_timestamp_event() is to configure a
- counter that will tell the tracer to emit a timestamp packet when it
- reaches zero. This is done in order to get a more fine grained idea
- of when instructions are executed so that they can be correlated
- with execution on other CPUs.
- To do this the counter itself is configured to self reload and
- TRCRSCTLR1 (always true) used to get the counter to decrement. From
- there a resource selector is configured with the counter and the
- timestamp control register to use the resource selector to trigger the
- event that will insert a timestamp packet in the stream.
- */
+static int etm4_config_timestamp_event(struct etmv4_drvdata *drvdata) +{
int i, ret = -EINVAL;
int counter, rselector;
u32 val = 0;
struct etmv4_config *config = &drvdata->config;
/* No point in trying if we don't have at least one counter */
if (!drvdata->nr_cntr)
goto out;
/* Find a counter that hasn't been initialised */
for (i = 0; i < drvdata->nr_cntr; i++)
if (config->cntr_val[i] == 0)
break;
/* Remember what counter we used */
counter = 1 << i;
/* All the counters have been configured already, bail out */
if (i == drvdata->nr_cntr) {
pr_err("%s: no available counter found\n", __func__);
goto out;
}
/*
* Initialise original and reload counter value to the smallest
* possible value in order to get as much precision as we can.
*/
config->cntr_val[i] = 1;
config->cntrldvr[i] = 1;
Maybe this variable is defined by old code, but show use the more consistent naming like: config->cntr_ldvr?
Perhaps yes... But it would require a patch on its own, something that is at the very bottom of the priority list.
/* Set the trace counter control register */
val = 0x1 << 16 | /* Bit 16, reload counter automatically */
0x0 << 7 | /* Select single resource selector */
0x1; /* Resource selector 1, i.e always true */
config->cntr_ctrl[i] = val;
/*
* Searching for an available resource selector to use, starting at
* '2' since every implementation has at least 2 resource selector.
* ETMIDR4 gives the number of resource selector _pairs_,
* hence multiply by 2.
*/
for (i = 2; i < drvdata->nr_resource * 2; i++)
if (!config->res_ctrl[i])
break;
/* Remember what resource selector we used */
rselector = i;
if (i == drvdata->nr_resource * 2) {
pr_err("%s: no available resource selector found\n", __func__);
goto out;
}
If there ahve no available resource selector and 'goto' out tag, it will not reset the values for counter registers. Do you think for the failure case, we should clean below fields:
config->cntr_val[i] = 0; config->cntrldvr[i] = 0; config->cntr_ctrl[i] = 0;
You are correct - I will fix this.
val = 0x2 << 16 | /* Group 0b0010 - Counter and sequencers */
counter << 0; /* Counter to use */
When read the ETMv4 spec, the counter value are only set in bits 0 to 3, so should reserve for other fields?
This is a concern I looked into a while back. The other bits are labeled "RES0", which is defined in the TRM's glossary as:
"A reserved bit or field with Should-Be-Zero-or-Preserved (SBZP) behavior".
As such we should be fine with just writing '0' to the RES0 bits.
config->res_ctrl[i] = val;
val = 0x0 << 7 | /* Select single resource selector */
rselector; /* Resource selector */
config->ts_ctrl = val;
ret = 0;
+out:
return ret;
+}
static int etm4_parse_event_config(struct etmv4_drvdata *drvdata, struct perf_event *event) { @@ -236,9 +324,24 @@ static int etm4_parse_event_config(struct etmv4_drvdata *drvdata, /* TRM: Must program this for cycacc to work */ config->ccctlr = ETM_CYC_THRESHOLD_DEFAULT; }
if (attr->config & BIT(ETM_OPT_TS))
if (attr->config & BIT(ETM_OPT_TS)) {
/*
* Configure timestamps to be emitted at regular intervals in
* order to correlate instructions executed on different CPUs
* (CPU-wide trace scenarios).
*/
ret = etm4_config_timestamp_event(drvdata);
Use this method, I can see now it only allocates counter but there have no flow to release counter; so after run for several times with perf command, will it fail to allocate any counter and resource selector?
Very well spotted. I will fix this.
/*
* No need to go further if timestamp intervals can't
* be configured.
*/
if (ret)
goto out;
/* bit[11], Global timestamp tracing bit */ config->cfg |= BIT(11);
} if (attr->config & BIT(ETM_OPT_CTXTID)) /* bit[6], Context ID tracing bit */
-- 2.17.1
{ @@ -236,9 +324,24 @@ static int etm4_parse_event_config(struct etmv4_drvdata *drvdata, /* TRM: Must program this for cycacc to work */ config->ccctlr = ETM_CYC_THRESHOLD_DEFAULT; }
if (attr->config & BIT(ETM_OPT_TS))
if (attr->config & BIT(ETM_OPT_TS)) {
/*
* Configure timestamps to be emitted at regular intervals in
* order to correlate instructions executed on different CPUs
* (CPU-wide trace scenarios).
*/
ret = etm4_config_timestamp_event(drvdata);
Use this method, I can see now it only allocates counter but there have no flow to release counter; so after run for several times with perf command, will it fail to allocate any counter and resource selector?
Very well spotted. I will fix this.
Turns out function etm4_parse_event_config() already clears out specifics from the previous session and starts with a clean slate.
Mathieu
[1]. https://elixir.bootlin.com/linux/v5.1-rc1/source/drivers/hwtracing/coresight...
On Mon, Mar 18, 2019 at 02:48:55PM -0600, Mathieu Poirier wrote:
{ @@ -236,9 +324,24 @@ static int etm4_parse_event_config(struct etmv4_drvdata *drvdata, /* TRM: Must program this for cycacc to work */ config->ccctlr = ETM_CYC_THRESHOLD_DEFAULT; }
if (attr->config & BIT(ETM_OPT_TS))
if (attr->config & BIT(ETM_OPT_TS)) {
/*
* Configure timestamps to be emitted at regular intervals in
* order to correlate instructions executed on different CPUs
* (CPU-wide trace scenarios).
*/
ret = etm4_config_timestamp_event(drvdata);
Use this method, I can see now it only allocates counter but there have no flow to release counter; so after run for several times with perf command, will it fail to allocate any counter and resource selector?
Very well spotted. I will fix this.
Turns out function etm4_parse_event_config() already clears out specifics from the previous session and starts with a clean slate.
You are right, sorry for noise.
Thanks, Leo Yan
[1]. https://elixir.bootlin.com/linux/v5.1-rc1/source/drivers/hwtracing/coresight...
In preparation to handle device reference counting inside of the sink drivers, add a return code to the sink::disable() operation so that proper action can be taken if a sink has not been disabled.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- drivers/hwtracing/coresight/coresight-etb10.c | 3 ++- drivers/hwtracing/coresight/coresight-tmc-etf.c | 5 +++-- drivers/hwtracing/coresight/coresight-tmc-etr.c | 5 +++-- drivers/hwtracing/coresight/coresight-tpiu.c | 3 ++- drivers/hwtracing/coresight/coresight.c | 6 +++++- include/linux/coresight.h | 2 +- 6 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c index 105782ea64c7..71c2a3cdb866 100644 --- a/drivers/hwtracing/coresight/coresight-etb10.c +++ b/drivers/hwtracing/coresight/coresight-etb10.c @@ -325,7 +325,7 @@ static void etb_disable_hw(struct etb_drvdata *drvdata) coresight_disclaim_device(drvdata->base); }
-static void etb_disable(struct coresight_device *csdev) +static int etb_disable(struct coresight_device *csdev) { struct etb_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); unsigned long flags; @@ -340,6 +340,7 @@ static void etb_disable(struct coresight_device *csdev) spin_unlock_irqrestore(&drvdata->spinlock, flags);
dev_dbg(drvdata->dev, "ETB disabled\n"); + return 0; }
static void *etb_alloc_buffer(struct coresight_device *csdev, int cpu, diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c index a5f053f2db2c..d4213e7c2c45 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etf.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c @@ -273,7 +273,7 @@ static int tmc_enable_etf_sink(struct coresight_device *csdev, return 0; }
-static void tmc_disable_etf_sink(struct coresight_device *csdev) +static int tmc_disable_etf_sink(struct coresight_device *csdev) { unsigned long flags; struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); @@ -281,7 +281,7 @@ static void tmc_disable_etf_sink(struct coresight_device *csdev) spin_lock_irqsave(&drvdata->spinlock, flags); if (drvdata->reading) { spin_unlock_irqrestore(&drvdata->spinlock, flags); - return; + return -EBUSY; }
/* Disable the TMC only if it needs to */ @@ -293,6 +293,7 @@ static void tmc_disable_etf_sink(struct coresight_device *csdev) spin_unlock_irqrestore(&drvdata->spinlock, flags);
dev_dbg(drvdata->dev, "TMC-ETB/ETF disabled\n"); + return 0; }
static int tmc_enable_etf_link(struct coresight_device *csdev, diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index f684283890d3..33501777038a 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -1392,7 +1392,7 @@ static int tmc_enable_etr_sink(struct coresight_device *csdev, return -EINVAL; }
-static void tmc_disable_etr_sink(struct coresight_device *csdev) +static int tmc_disable_etr_sink(struct coresight_device *csdev) { unsigned long flags; struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); @@ -1400,7 +1400,7 @@ static void tmc_disable_etr_sink(struct coresight_device *csdev) spin_lock_irqsave(&drvdata->spinlock, flags); if (drvdata->reading) { spin_unlock_irqrestore(&drvdata->spinlock, flags); - return; + return -EBUSY; }
/* Disable the TMC only if it needs to */ @@ -1412,6 +1412,7 @@ static void tmc_disable_etr_sink(struct coresight_device *csdev) spin_unlock_irqrestore(&drvdata->spinlock, flags);
dev_dbg(drvdata->dev, "TMC-ETR disabled\n"); + return 0; }
static const struct coresight_ops_sink tmc_etr_sink_ops = { diff --git a/drivers/hwtracing/coresight/coresight-tpiu.c b/drivers/hwtracing/coresight/coresight-tpiu.c index b2f72a1fa402..0d13da1b9df1 100644 --- a/drivers/hwtracing/coresight/coresight-tpiu.c +++ b/drivers/hwtracing/coresight/coresight-tpiu.c @@ -94,13 +94,14 @@ static void tpiu_disable_hw(struct tpiu_drvdata *drvdata) CS_LOCK(drvdata->base); }
-static void tpiu_disable(struct coresight_device *csdev) +static int tpiu_disable(struct coresight_device *csdev) { struct tpiu_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
tpiu_disable_hw(drvdata);
dev_dbg(drvdata->dev, "TPIU disabled\n"); + return 0; }
static const struct coresight_ops_sink tpiu_sink_ops = { diff --git a/drivers/hwtracing/coresight/coresight.c b/drivers/hwtracing/coresight/coresight.c index 29cef898afba..13eda4693f81 100644 --- a/drivers/hwtracing/coresight/coresight.c +++ b/drivers/hwtracing/coresight/coresight.c @@ -239,9 +239,13 @@ static int coresight_enable_sink(struct coresight_device *csdev,
static void coresight_disable_sink(struct coresight_device *csdev) { + int ret; + if (atomic_dec_return(csdev->refcnt) == 0) { if (sink_ops(csdev)->disable) { - sink_ops(csdev)->disable(csdev); + ret = sink_ops(csdev)->disable(csdev); + if (ret) + return; csdev->enable = false; } } diff --git a/include/linux/coresight.h b/include/linux/coresight.h index 7b87965f7a65..189cc6ddc92b 100644 --- a/include/linux/coresight.h +++ b/include/linux/coresight.h @@ -192,7 +192,7 @@ struct coresight_device { */ struct coresight_ops_sink { int (*enable)(struct coresight_device *csdev, u32 mode, void *data); - void (*disable)(struct coresight_device *csdev); + int (*disable)(struct coresight_device *csdev); void *(*alloc_buffer)(struct coresight_device *csdev, int cpu, void **pages, int nr_pages, bool overwrite); void (*free_buffer)(void *config);
Hi Mathieu,
On 06/03/2019 22:57, Mathieu Poirier wrote:
In preparation to handle device reference counting inside of the sink drivers, add a return code to the sink::disable() operation so that proper action can be taken if a sink has not been disabled.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
drivers/hwtracing/coresight/coresight-etb10.c | 3 ++- drivers/hwtracing/coresight/coresight-tmc-etf.c | 5 +++-- drivers/hwtracing/coresight/coresight-tmc-etr.c | 5 +++-- drivers/hwtracing/coresight/coresight-tpiu.c | 3 ++- drivers/hwtracing/coresight/coresight.c | 6 +++++- include/linux/coresight.h | 2 +- 6 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c index 105782ea64c7..71c2a3cdb866 100644 --- a/drivers/hwtracing/coresight/coresight-etb10.c +++ b/drivers/hwtracing/coresight/coresight-etb10.c @@ -325,7 +325,7 @@ static void etb_disable_hw(struct etb_drvdata *drvdata) coresight_disclaim_device(drvdata->base); } -static void etb_disable(struct coresight_device *csdev) +static int etb_disable(struct coresight_device *csdev) { struct etb_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); unsigned long flags; @@ -340,6 +340,7 @@ static void etb_disable(struct coresight_device *csdev) spin_unlock_irqrestore(&drvdata->spinlock, flags); dev_dbg(drvdata->dev, "ETB disabled\n");
- return 0; }
static void *etb_alloc_buffer(struct coresight_device *csdev, int cpu, diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c index a5f053f2db2c..d4213e7c2c45 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etf.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c @@ -273,7 +273,7 @@ static int tmc_enable_etf_sink(struct coresight_device *csdev, return 0; } -static void tmc_disable_etf_sink(struct coresight_device *csdev) +static int tmc_disable_etf_sink(struct coresight_device *csdev) { unsigned long flags; struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); @@ -281,7 +281,7 @@ static void tmc_disable_etf_sink(struct coresight_device *csdev) spin_lock_irqsave(&drvdata->spinlock, flags); if (drvdata->reading) { spin_unlock_irqrestore(&drvdata->spinlock, flags);
return;
return -EBUSY;
...
-static void tmc_disable_etr_sink(struct coresight_device *csdev) +static int tmc_disable_etr_sink(struct coresight_device *csdev) { unsigned long flags; struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); @@ -1400,7 +1400,7 @@ static void tmc_disable_etr_sink(struct coresight_device *csdev) spin_lock_irqsave(&drvdata->spinlock, flags); if (drvdata->reading) { spin_unlock_irqrestore(&drvdata->spinlock, flags);
return;
return -EBUSY;
This patch as such is fine. But I think we can improve the handling here.
I agree that these error codes are appropriate. However, I think we should still modify the mode to disabled for both the etf/etr to make sure the device is kept disabled after we finished reading. As the user has disabled it and thinks that it has completed successfully.
So, I think when we find that the sink buffer is being read, we may :
1) Set the mode to disabled, to allow the unprepare routine to keep this disabled.
2) Return 0, to indicate the operation will be acknowledged and no need to redo the operation.
What do you think ?
Suzuki
Good day Suzuki,
On Thu, 14 Mar 2019 at 06:17, Suzuki K Poulose suzuki.poulose@arm.com wrote:
Hi Mathieu,
On 06/03/2019 22:57, Mathieu Poirier wrote:
In preparation to handle device reference counting inside of the sink drivers, add a return code to the sink::disable() operation so that proper action can be taken if a sink has not been disabled.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
drivers/hwtracing/coresight/coresight-etb10.c | 3 ++- drivers/hwtracing/coresight/coresight-tmc-etf.c | 5 +++-- drivers/hwtracing/coresight/coresight-tmc-etr.c | 5 +++-- drivers/hwtracing/coresight/coresight-tpiu.c | 3 ++- drivers/hwtracing/coresight/coresight.c | 6 +++++- include/linux/coresight.h | 2 +- 6 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c index 105782ea64c7..71c2a3cdb866 100644 --- a/drivers/hwtracing/coresight/coresight-etb10.c +++ b/drivers/hwtracing/coresight/coresight-etb10.c @@ -325,7 +325,7 @@ static void etb_disable_hw(struct etb_drvdata *drvdata) coresight_disclaim_device(drvdata->base); }
-static void etb_disable(struct coresight_device *csdev) +static int etb_disable(struct coresight_device *csdev) { struct etb_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); unsigned long flags; @@ -340,6 +340,7 @@ static void etb_disable(struct coresight_device *csdev) spin_unlock_irqrestore(&drvdata->spinlock, flags);
dev_dbg(drvdata->dev, "ETB disabled\n");
return 0;
}
static void *etb_alloc_buffer(struct coresight_device *csdev, int cpu,
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c index a5f053f2db2c..d4213e7c2c45 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etf.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c @@ -273,7 +273,7 @@ static int tmc_enable_etf_sink(struct coresight_device *csdev, return 0; }
-static void tmc_disable_etf_sink(struct coresight_device *csdev) +static int tmc_disable_etf_sink(struct coresight_device *csdev) { unsigned long flags; struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); @@ -281,7 +281,7 @@ static void tmc_disable_etf_sink(struct coresight_device *csdev) spin_lock_irqsave(&drvdata->spinlock, flags); if (drvdata->reading) { spin_unlock_irqrestore(&drvdata->spinlock, flags);
return;
return -EBUSY;
...
-static void tmc_disable_etr_sink(struct coresight_device *csdev) +static int tmc_disable_etr_sink(struct coresight_device *csdev) { unsigned long flags; struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); @@ -1400,7 +1400,7 @@ static void tmc_disable_etr_sink(struct coresight_device *csdev) spin_lock_irqsave(&drvdata->spinlock, flags); if (drvdata->reading) { spin_unlock_irqrestore(&drvdata->spinlock, flags);
return;
return -EBUSY;
This patch as such is fine. But I think we can improve the handling here.
I agree that these error codes are appropriate. However, I think we should still modify the mode to disabled for both the etf/etr to make sure the device is kept disabled after we finished reading. As the user has disabled it and thinks that it has completed successfully.
So, I think when we find that the sink buffer is being read, we may :
- Set the mode to disabled, to allow the unprepare routine to keep this
disabled.
- Return 0, to indicate the operation will be acknowledged and no need
to redo the operation.
What do you think ?
We could do that but after looking at the ETR code (I haven't looked at ETF) the following comes to mind:
1) Disabling the ETR when a read is already happening means we call __tmc_etr_disable_hw() twice, that is once in tmc_read_prepare_etr() and another time in tmc_disable_etr_sink() as part of tmc_etr_disable_hw(). I looked at the code and this _should_ be fine. To do things properly we'd need to split tmc_etr_disable_hw() in two, hence creating 3 different levels of ETR disable, which I'm not a fan of.
2) Another problem I see is that by disabling the sink while a read operation is pending, we also set the mode to CS_MODE_DISABLED and decrement csdev->refcnt. That means another session (sysfs or perf) is free to use the sink and do all the required re-initialisation it needs. I think this should be fine but a lot more scrutiny would need to be invested if we are to go down that route. Thanks to you I found a bug in tmc_disable_etf_sink()[1] and tmc_disable_etr_sink()[2] - testing for drvdata->reading should be done before decrementing the refcount.
3) I've always seen sysfs operations being done consecutively, i.e one after the other. If a user is reading from a sink it is not expected they would issue a concurrent disable operation.
So long story short I think it is possible but I'm not sure it it worth the investment. I'm also not sure it should be rolled in this set. Anyways, your turn to let me know what you think.
Mathieu
[1]. https://git.linaro.org/people/mathieu.poirier/coresight.git/tree/drivers/hwt... [2]. https://git.linaro.org/people/mathieu.poirier/coresight.git/tree/drivers/hwt...
Suzuki
Hi Mathieu,
-static void tmc_disable_etr_sink(struct coresight_device *csdev) +static int tmc_disable_etr_sink(struct coresight_device *csdev) { unsigned long flags; struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); @@ -1400,7 +1400,7 @@ static void tmc_disable_etr_sink(struct coresight_device *csdev) spin_lock_irqsave(&drvdata->spinlock, flags); if (drvdata->reading) { spin_unlock_irqrestore(&drvdata->spinlock, flags);
return;
return -EBUSY;
This patch as such is fine. But I think we can improve the handling here.
I agree that these error codes are appropriate. However, I think we should still modify the mode to disabled for both the etf/etr to make sure the device is kept disabled after we finished reading. As the user has disabled it and thinks that it has completed successfully.
So, I think when we find that the sink buffer is being read, we may :
- Set the mode to disabled, to allow the unprepare routine to keep this
disabled.
- Return 0, to indicate the operation will be acknowledged and no need
to redo the operation.
What do you think ?
We could do that but after looking at the ETR code (I haven't looked at ETF) the following comes to mind:
- Disabling the ETR when a read is already happening means we call
__tmc_etr_disable_hw() twice, that is once in tmc_read_prepare_etr() and another time in tmc_disable_etr_sink() as part of tmc_etr_disable_hw(). I looked at the code and this _should_ be fine. To do things properly we'd need to split tmc_etr_disable_hw() in two, hence creating 3 different levels of ETR disable, which I'm not a fan of.
So the point is, if it is being read, we simply set the mode to DISABLED and return success here. That implies, we don't do the __tmc_etr_disable_hw() as the hw is already disabled. Now, the read_unprepare_etr() could leave the ETR disabled, since the mode has been changed already.
- Another problem I see is that by disabling the sink while a read
operation is pending, we also set the mode to CS_MODE_DISABLED and decrement csdev->refcnt. That means another session (sysfs or perf) is free to use the sink and do all the required re-initialisation it needs. I think this should be fine but a lot more scrutiny would need to be invested if we are to go down that route. Thanks to you I found a bug in tmc_disable_etf_sink()[1] and tmc_disable_etr_sink()[2] - testing for drvdata->reading should be done before decrementing the refcount.
But then we do check if it is being read before we allow enabling the hardware. So, we are covered. If we don't, we must do that if we are going to enable PERF mode.
- I've always seen sysfs operations being done consecutively, i.e one
after the other. If a user is reading from a sink it is not expected they would issue a concurrent disable operation.
So long story short I think it is possible but I'm not sure it it worth the investment. I'm also not sure it should be rolled in this set. Anyways, your turn to let me know what you think.
I am fine with this being delayed into separate patch.
Cheers Suzuki
When operating in CPU-wide mode with an N:1 source/sink HW topology, multiple CPUs can access a sink concurrently. As such reference counting needs to happen when the device's spinlock is held to avoid racing with other operations (start(), update(), stop()).
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- drivers/hwtracing/coresight/coresight-etb10.c | 18 ++++++++++-- .../hwtracing/coresight/coresight-tmc-etf.c | 21 +++++++++++--- .../hwtracing/coresight/coresight-tmc-etr.c | 19 +++++++++++-- drivers/hwtracing/coresight/coresight-tpiu.c | 6 +++- drivers/hwtracing/coresight/coresight.c | 28 +++++++++---------- 5 files changed, 66 insertions(+), 26 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c index 71c2a3cdb866..dcc550507f70 100644 --- a/drivers/hwtracing/coresight/coresight-etb10.c +++ b/drivers/hwtracing/coresight/coresight-etb10.c @@ -5,6 +5,7 @@ * Description: CoreSight Embedded Trace Buffer driver */
+#include <linux/atomic.h> #include <linux/kernel.h> #include <linux/init.h> #include <linux/types.h> @@ -160,12 +161,16 @@ static int etb_enable_sysfs(struct coresight_device *csdev) }
/* Nothing to do, the tracer is already enabled. */ - if (drvdata->mode == CS_MODE_SYSFS) + if (drvdata->mode == CS_MODE_SYSFS) { + atomic_inc(csdev->refcnt); goto out; + }
ret = etb_enable_hw(drvdata); - if (!ret) + if (!ret) { drvdata->mode = CS_MODE_SYSFS; + atomic_inc(csdev->refcnt); + }
out: spin_unlock_irqrestore(&drvdata->spinlock, flags); @@ -196,8 +201,10 @@ static int etb_enable_perf(struct coresight_device *csdev, void *data) goto out;
ret = etb_enable_hw(drvdata); - if (!ret) + if (!ret) { drvdata->mode = CS_MODE_PERF; + atomic_inc(csdev->refcnt); + }
out: spin_unlock_irqrestore(&drvdata->spinlock, flags); @@ -332,6 +339,11 @@ static int etb_disable(struct coresight_device *csdev)
spin_lock_irqsave(&drvdata->spinlock, flags);
+ if (atomic_dec_return(csdev->refcnt)) { + spin_unlock_irqrestore(&drvdata->spinlock, flags); + return -EBUSY; + } + /* Disable the ETB only if it needs to */ if (drvdata->mode != CS_MODE_DISABLED) { etb_disable_hw(drvdata); diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c index d4213e7c2c45..b6d0c010eec1 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etf.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c @@ -4,6 +4,7 @@ * Author: Mathieu Poirier mathieu.poirier@linaro.org */
+#include <linux/atomic.h> #include <linux/circ_buf.h> #include <linux/coresight.h> #include <linux/perf_event.h> @@ -180,8 +181,10 @@ static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev) * sink is already enabled no memory is needed and the HW need not be * touched. */ - if (drvdata->mode == CS_MODE_SYSFS) + if (drvdata->mode == CS_MODE_SYSFS) { + atomic_inc(csdev->refcnt); goto out; + }
/* * If drvdata::buf isn't NULL, memory was allocated for a previous @@ -200,11 +203,13 @@ static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev) }
ret = tmc_etb_enable_hw(drvdata); - if (!ret) + if (!ret) { drvdata->mode = CS_MODE_SYSFS; - else + atomic_inc(csdev->refcnt); + } else { /* Free up the buffer if we failed to enable */ used = false; + } out: spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -239,8 +244,10 @@ static int tmc_enable_etf_sink_perf(struct coresight_device *csdev, void *data) if (ret) break; ret = tmc_etb_enable_hw(drvdata); - if (!ret) + if (!ret) { drvdata->mode = CS_MODE_PERF; + atomic_inc(csdev->refcnt); + } } while (0); spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -279,6 +286,12 @@ static int tmc_disable_etf_sink(struct coresight_device *csdev) struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
spin_lock_irqsave(&drvdata->spinlock, flags); + + if (atomic_dec_return(csdev->refcnt)) { + spin_unlock_irqrestore(&drvdata->spinlock, flags); + return -EBUSY; + } + if (drvdata->reading) { spin_unlock_irqrestore(&drvdata->spinlock, flags); return -EBUSY; diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index 33501777038a..cbb17b2c53b5 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -4,6 +4,7 @@ * Author: Mathieu Poirier mathieu.poirier@linaro.org */
+#include <linux/atomic.h> #include <linux/coresight.h> #include <linux/dma-mapping.h> #include <linux/iommu.h> @@ -1124,8 +1125,10 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev) * sink is already enabled no memory is needed and the HW need not be * touched, even if the buffer size has changed. */ - if (drvdata->mode == CS_MODE_SYSFS) + if (drvdata->mode == CS_MODE_SYSFS) { + atomic_inc(csdev->refcnt); goto out; + }
/* * If we don't have a buffer or it doesn't match the requested size, @@ -1138,8 +1141,10 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev) }
ret = tmc_etr_enable_hw(drvdata, drvdata->sysfs_buf); - if (!ret) + if (!ret) { drvdata->mode = CS_MODE_SYSFS; + atomic_inc(csdev->refcnt); + } out: spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -1370,8 +1375,10 @@ static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, void *data) etr_perf->head = PERF_IDX2OFF(handle->head, etr_perf); drvdata->perf_data = etr_perf; rc = tmc_etr_enable_hw(drvdata, etr_perf->etr_buf); - if (!rc) + if (!rc) { drvdata->mode = CS_MODE_PERF; + atomic_inc(csdev->refcnt); + }
unlock_out: spin_unlock_irqrestore(&drvdata->spinlock, flags); @@ -1398,6 +1405,12 @@ static int tmc_disable_etr_sink(struct coresight_device *csdev) struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
spin_lock_irqsave(&drvdata->spinlock, flags); + + if (atomic_dec_return(csdev->refcnt)) { + spin_unlock_irqrestore(&drvdata->spinlock, flags); + return -EBUSY; + } + if (drvdata->reading) { spin_unlock_irqrestore(&drvdata->spinlock, flags); return -EBUSY; diff --git a/drivers/hwtracing/coresight/coresight-tpiu.c b/drivers/hwtracing/coresight/coresight-tpiu.c index 0d13da1b9df1..7acbeffcc137 100644 --- a/drivers/hwtracing/coresight/coresight-tpiu.c +++ b/drivers/hwtracing/coresight/coresight-tpiu.c @@ -5,6 +5,7 @@ * Description: CoreSight Trace Port Interface Unit driver */
+#include <linux/atomic.h> #include <linux/kernel.h> #include <linux/init.h> #include <linux/device.h> @@ -73,7 +74,7 @@ static int tpiu_enable(struct coresight_device *csdev, u32 mode, void *__unused) struct tpiu_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
tpiu_enable_hw(drvdata); - + atomic_inc(csdev->refcnt); dev_dbg(drvdata->dev, "TPIU enabled\n"); return 0; } @@ -98,6 +99,9 @@ static int tpiu_disable(struct coresight_device *csdev) { struct tpiu_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+ if (atomic_dec_return(csdev->refcnt)) + return -EBUSY; + tpiu_disable_hw(drvdata);
dev_dbg(drvdata->dev, "TPIU disabled\n"); diff --git a/drivers/hwtracing/coresight/coresight.c b/drivers/hwtracing/coresight/coresight.c index 13eda4693f81..19ba121d7451 100644 --- a/drivers/hwtracing/coresight/coresight.c +++ b/drivers/hwtracing/coresight/coresight.c @@ -225,14 +225,13 @@ static int coresight_enable_sink(struct coresight_device *csdev, * We need to make sure the "new" session is compatible with the * existing "mode" of operation. */ - if (sink_ops(csdev)->enable) { - ret = sink_ops(csdev)->enable(csdev, mode, data); - if (ret) - return ret; - csdev->enable = true; - } + if (!sink_ops(csdev)->enable) + return -EINVAL;
- atomic_inc(csdev->refcnt); + ret = sink_ops(csdev)->enable(csdev, mode, data); + if (ret) + return ret; + csdev->enable = true;
return 0; } @@ -241,14 +240,13 @@ static void coresight_disable_sink(struct coresight_device *csdev) { int ret;
- if (atomic_dec_return(csdev->refcnt) == 0) { - if (sink_ops(csdev)->disable) { - ret = sink_ops(csdev)->disable(csdev); - if (ret) - return; - csdev->enable = false; - } - } + if (!sink_ops(csdev)->disable) + return; + + ret = sink_ops(csdev)->disable(csdev); + if (ret) + return; + csdev->enable = false; }
static int coresight_enable_link(struct coresight_device *csdev,
On 06/03/2019 22:57, Mathieu Poirier wrote:
When operating in CPU-wide mode with an N:1 source/sink HW topology, multiple CPUs can access a sink concurrently. As such reference counting needs to happen when the device's spinlock is held to avoid racing with other operations (start(), update(), stop()).
I think this patch is trying to address the following issue. i.e, we get the refcount a bit late, after we have enabled the sink, and that could make use loose some sessions, if another session drops the refcount before the another one gets it.
i.e,
Session A Session B ----- -------
enable_sink atomic_inc(refcount) = 1
... atomic_dec(refcount) = 0 enable_sink if (refcount == 0) disable_sink atomic_inc() could we have this in the commit description.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
drivers/hwtracing/coresight/coresight-etb10.c | 18 ++++++++++-- .../hwtracing/coresight/coresight-tmc-etf.c | 21 +++++++++++--- .../hwtracing/coresight/coresight-tmc-etr.c | 19 +++++++++++-- drivers/hwtracing/coresight/coresight-tpiu.c | 6 +++- drivers/hwtracing/coresight/coresight.c | 28 +++++++++---------- 5 files changed, 66 insertions(+), 26 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c index 71c2a3cdb866..dcc550507f70 100644 --- a/drivers/hwtracing/coresight/coresight-etb10.c +++ b/drivers/hwtracing/coresight/coresight-etb10.c @@ -5,6 +5,7 @@
- Description: CoreSight Embedded Trace Buffer driver
*/ +#include <linux/atomic.h> #include <linux/kernel.h> #include <linux/init.h> #include <linux/types.h> @@ -160,12 +161,16 @@ static int etb_enable_sysfs(struct coresight_device *csdev) } /* Nothing to do, the tracer is already enabled. */
- if (drvdata->mode == CS_MODE_SYSFS)
- if (drvdata->mode == CS_MODE_SYSFS) {
goto out;atomic_inc(csdev->refcnt);
- }
ret = etb_enable_hw(drvdata);
- if (!ret)
- if (!ret) { drvdata->mode = CS_MODE_SYSFS;
atomic_inc(csdev->refcnt);
- }
out:
Could we consolidate the above two cases and move it here at out ?;
if (!ret) atomic_inc(csdev->refcnt);
spin_unlock_irqrestore(&drvdata->spinlock, flags); @@ -196,8 +201,10 @@ static int etb_enable_perf(struct coresight_device *csdev, void *data) goto out; ret = etb_enable_hw(drvdata);
- if (!ret)
- if (!ret) { drvdata->mode = CS_MODE_PERF;
atomic_inc(csdev->refcnt);
- }
out: spin_unlock_irqrestore(&drvdata->spinlock, flags); @@ -332,6 +339,11 @@ static int etb_disable(struct coresight_device *csdev) spin_lock_irqsave(&drvdata->spinlock, flags);
- if (atomic_dec_return(csdev->refcnt)) {
spin_unlock_irqrestore(&drvdata->spinlock, flags);
return -EBUSY;
- }
- /* Disable the ETB only if it needs to */ if (drvdata->mode != CS_MODE_DISABLED) { etb_disable_hw(drvdata);
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c index d4213e7c2c45..b6d0c010eec1 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etf.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c @@ -4,6 +4,7 @@
- Author: Mathieu Poirier mathieu.poirier@linaro.org
*/
..
/* * If drvdata::buf isn't NULL, memory was allocated for a previous @@ -200,11 +203,13 @@ static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev) } ret = tmc_etb_enable_hw(drvdata);
- if (!ret)
- if (!ret) { drvdata->mode = CS_MODE_SYSFS;
- else
atomic_inc(csdev->refcnt);
- } else { /* Free up the buffer if we failed to enable */ used = false;
- } out:
Same as above here.
spin_unlock_irqrestore(&drvdata->spinlock, flags); @@ -239,8 +244,10 @@ static int tmc_enable_etf_sink_perf(struct coresight_device *csdev, void *data) if (ret) break; ret = tmc_etb_enable_hw(drvdata);
if (!ret)
if (!ret) { drvdata->mode = CS_MODE_PERF;
atomic_inc(csdev->refcnt);
} while (0); spin_unlock_irqrestore(&drvdata->spinlock, flags);}
@@ -279,6 +286,12 @@ static int tmc_disable_etf_sink(struct coresight_device *csdev) struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); spin_lock_irqsave(&drvdata->spinlock, flags);
- if (atomic_dec_return(csdev->refcnt)) {
spin_unlock_irqrestore(&drvdata->spinlock, flags);
return -EBUSY;
- }
- if (drvdata->reading) { spin_unlock_irqrestore(&drvdata->spinlock, flags); return -EBUSY;
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index 33501777038a..cbb17b2c53b5 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -4,6 +4,7 @@
- Author: Mathieu Poirier mathieu.poirier@linaro.org
*/ +#include <linux/atomic.h> #include <linux/coresight.h> #include <linux/dma-mapping.h> #include <linux/iommu.h> @@ -1124,8 +1125,10 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev) * sink is already enabled no memory is needed and the HW need not be * touched, even if the buffer size has changed. */
- if (drvdata->mode == CS_MODE_SYSFS)
- if (drvdata->mode == CS_MODE_SYSFS) {
goto out;atomic_inc(csdev->refcnt);
- }
/* * If we don't have a buffer or it doesn't match the requested size, @@ -1138,8 +1141,10 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev) } ret = tmc_etr_enable_hw(drvdata, drvdata->sysfs_buf);
- if (!ret)
- if (!ret) { drvdata->mode = CS_MODE_SYSFS;
atomic_inc(csdev->refcnt);
- } out: spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -1370,8 +1375,10 @@ static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, void *data) etr_perf->head = PERF_IDX2OFF(handle->head, etr_perf); drvdata->perf_data = etr_perf; rc = tmc_etr_enable_hw(drvdata, etr_perf->etr_buf);
- if (!rc)
- if (!rc) { drvdata->mode = CS_MODE_PERF;
atomic_inc(csdev->refcnt);
- }
unlock_out: spin_unlock_irqrestore(&drvdata->spinlock, flags); @@ -1398,6 +1405,12 @@ static int tmc_disable_etr_sink(struct coresight_device *csdev) struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); spin_lock_irqsave(&drvdata->spinlock, flags);
- if (atomic_dec_return(csdev->refcnt)) {
spin_unlock_irqrestore(&drvdata->spinlock, flags);
return -EBUSY;
- }
- if (drvdata->reading) { spin_unlock_irqrestore(&drvdata->spinlock, flags); return -EBUSY;
diff --git a/drivers/hwtracing/coresight/coresight-tpiu.c b/drivers/hwtracing/coresight/coresight-tpiu.c index 0d13da1b9df1..7acbeffcc137 100644 --- a/drivers/hwtracing/coresight/coresight-tpiu.c +++ b/drivers/hwtracing/coresight/coresight-tpiu.c @@ -5,6 +5,7 @@
- Description: CoreSight Trace Port Interface Unit driver
*/ +#include <linux/atomic.h> #include <linux/kernel.h> #include <linux/init.h> #include <linux/device.h> @@ -73,7 +74,7 @@ static int tpiu_enable(struct coresight_device *csdev, u32 mode, void *__unused) struct tpiu_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); tpiu_enable_hw(drvdata);
- atomic_inc(csdev->refcnt); dev_dbg(drvdata->dev, "TPIU enabled\n"); return 0; }
@@ -98,6 +99,9 @@ static int tpiu_disable(struct coresight_device *csdev) { struct tpiu_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
- if (atomic_dec_return(csdev->refcnt))
return -EBUSY;
- tpiu_disable_hw(drvdata);
dev_dbg(drvdata->dev, "TPIU disabled\n"); diff --git a/drivers/hwtracing/coresight/coresight.c b/drivers/hwtracing/coresight/coresight.c index 13eda4693f81..19ba121d7451 100644 --- a/drivers/hwtracing/coresight/coresight.c +++ b/drivers/hwtracing/coresight/coresight.c @@ -225,14 +225,13 @@ static int coresight_enable_sink(struct coresight_device *csdev, * We need to make sure the "new" session is compatible with the * existing "mode" of operation. */
- if (sink_ops(csdev)->enable) {
ret = sink_ops(csdev)->enable(csdev, mode, data);
if (ret)
return ret;
csdev->enable = true;
- }
- if (!sink_ops(csdev)->enable)
return -EINVAL;
- atomic_inc(csdev->refcnt);
- ret = sink_ops(csdev)->enable(csdev, mode, data);
- if (ret)
return ret;
- csdev->enable = true;
return 0; } @@ -241,14 +240,13 @@ static void coresight_disable_sink(struct coresight_device *csdev) { int ret;
- if (atomic_dec_return(csdev->refcnt) == 0) {
if (sink_ops(csdev)->disable) {
ret = sink_ops(csdev)->disable(csdev);
if (ret)
return;
csdev->enable = false;
}
- }
- if (!sink_ops(csdev)->disable)
return;
- ret = sink_ops(csdev)->disable(csdev);
- if (ret)
return;
- csdev->enable = false; }
static int coresight_enable_link(struct coresight_device *csdev,
Otherwise looks fine to me.
Suzuki
On Fri, Mar 15, 2019 at 04:57:13PM +0000, Suzuki K Poulose wrote:
On 06/03/2019 22:57, Mathieu Poirier wrote:
When operating in CPU-wide mode with an N:1 source/sink HW topology, multiple CPUs can access a sink concurrently. As such reference counting needs to happen when the device's spinlock is held to avoid racing with other operations (start(), update(), stop()).
I think this patch is trying to address the following issue. i.e, we get the refcount a bit late, after we have enabled the sink, and that could make use loose some sessions, if another session drops the refcount before the another one gets it.
i.e,
Session A Session B
enable_sink atomic_inc(refcount) = 1
... atomic_dec(refcount) = 0 enable_sink if (refcount == 0) disable_sink atomic_inc() could we have this in the commit description.
This is only one of the nightmare scenarios that can happen... I'll add the above to the commit log.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
drivers/hwtracing/coresight/coresight-etb10.c | 18 ++++++++++-- .../hwtracing/coresight/coresight-tmc-etf.c | 21 +++++++++++--- .../hwtracing/coresight/coresight-tmc-etr.c | 19 +++++++++++-- drivers/hwtracing/coresight/coresight-tpiu.c | 6 +++- drivers/hwtracing/coresight/coresight.c | 28 +++++++++---------- 5 files changed, 66 insertions(+), 26 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c index 71c2a3cdb866..dcc550507f70 100644 --- a/drivers/hwtracing/coresight/coresight-etb10.c +++ b/drivers/hwtracing/coresight/coresight-etb10.c @@ -5,6 +5,7 @@
- Description: CoreSight Embedded Trace Buffer driver
*/ +#include <linux/atomic.h> #include <linux/kernel.h> #include <linux/init.h> #include <linux/types.h> @@ -160,12 +161,16 @@ static int etb_enable_sysfs(struct coresight_device *csdev) } /* Nothing to do, the tracer is already enabled. */
- if (drvdata->mode == CS_MODE_SYSFS)
- if (drvdata->mode == CS_MODE_SYSFS) {
goto out;atomic_inc(csdev->refcnt);
- } ret = etb_enable_hw(drvdata);
- if (!ret)
- if (!ret) { drvdata->mode = CS_MODE_SYSFS;
atomic_inc(csdev->refcnt);
- } out:
Could we consolidate the above two cases and move it here at out ?;
We can't consolidate at 'out' because it is also the exit way for the first 'if' condition. What can be done is something like this:
if (drvdata->mode == CS_MODE_PERF) { ret = -EBUSY; goto out; }
if (drvdata->mode == CS_MODE_NONE) { ret = etb_enable_hw(drvdata); if (ret) goto out; }
drvdata->mode = CS_MODE_SYSFS; atomic_inc(csdev->refcnt);
out: spin_unlock_irqrestore(&drvdata->spinlock, flags);
This might be what you had in mind... We just happen to be setting the mode once more if we are already in CS_MODE_SYSFS.
if (!ret) atomic_inc(csdev->refcnt);
spin_unlock_irqrestore(&drvdata->spinlock, flags); @@ -196,8 +201,10 @@ static int etb_enable_perf(struct coresight_device *csdev, void *data) goto out; ret = etb_enable_hw(drvdata);
- if (!ret)
- if (!ret) { drvdata->mode = CS_MODE_PERF;
atomic_inc(csdev->refcnt);
- } out: spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -332,6 +339,11 @@ static int etb_disable(struct coresight_device *csdev) spin_lock_irqsave(&drvdata->spinlock, flags);
- if (atomic_dec_return(csdev->refcnt)) {
spin_unlock_irqrestore(&drvdata->spinlock, flags);
return -EBUSY;
- }
- /* Disable the ETB only if it needs to */ if (drvdata->mode != CS_MODE_DISABLED) { etb_disable_hw(drvdata);
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c index d4213e7c2c45..b6d0c010eec1 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etf.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c @@ -4,6 +4,7 @@
- Author: Mathieu Poirier mathieu.poirier@linaro.org
*/
..
/* * If drvdata::buf isn't NULL, memory was allocated for a previous @@ -200,11 +203,13 @@ static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev) } ret = tmc_etb_enable_hw(drvdata);
- if (!ret)
- if (!ret) { drvdata->mode = CS_MODE_SYSFS;
- else
atomic_inc(csdev->refcnt);
- } else { /* Free up the buffer if we failed to enable */ used = false;
- } out:
Same as above here.
I'd rather leave this one alone due to the extra complexity around managing the allocated memory.
spin_unlock_irqrestore(&drvdata->spinlock, flags); @@ -239,8 +244,10 @@ static int tmc_enable_etf_sink_perf(struct coresight_device *csdev, void *data) if (ret) break; ret = tmc_etb_enable_hw(drvdata);
if (!ret)
if (!ret) { drvdata->mode = CS_MODE_PERF;
atomic_inc(csdev->refcnt);
} while (0); spin_unlock_irqrestore(&drvdata->spinlock, flags);}
@@ -279,6 +286,12 @@ static int tmc_disable_etf_sink(struct coresight_device *csdev) struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); spin_lock_irqsave(&drvdata->spinlock, flags);
- if (atomic_dec_return(csdev->refcnt)) {
spin_unlock_irqrestore(&drvdata->spinlock, flags);
return -EBUSY;
- }
- if (drvdata->reading) { spin_unlock_irqrestore(&drvdata->spinlock, flags); return -EBUSY;
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index 33501777038a..cbb17b2c53b5 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -4,6 +4,7 @@
- Author: Mathieu Poirier mathieu.poirier@linaro.org
*/ +#include <linux/atomic.h> #include <linux/coresight.h> #include <linux/dma-mapping.h> #include <linux/iommu.h> @@ -1124,8 +1125,10 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev) * sink is already enabled no memory is needed and the HW need not be * touched, even if the buffer size has changed. */
- if (drvdata->mode == CS_MODE_SYSFS)
- if (drvdata->mode == CS_MODE_SYSFS) {
goto out;atomic_inc(csdev->refcnt);
- } /*
- If we don't have a buffer or it doesn't match the requested size,
@@ -1138,8 +1141,10 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev) } ret = tmc_etr_enable_hw(drvdata, drvdata->sysfs_buf);
- if (!ret)
- if (!ret) { drvdata->mode = CS_MODE_SYSFS;
atomic_inc(csdev->refcnt);
- } out: spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -1370,8 +1375,10 @@ static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, void *data) etr_perf->head = PERF_IDX2OFF(handle->head, etr_perf); drvdata->perf_data = etr_perf; rc = tmc_etr_enable_hw(drvdata, etr_perf->etr_buf);
- if (!rc)
- if (!rc) { drvdata->mode = CS_MODE_PERF;
atomic_inc(csdev->refcnt);
- } unlock_out: spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -1398,6 +1405,12 @@ static int tmc_disable_etr_sink(struct coresight_device *csdev) struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); spin_lock_irqsave(&drvdata->spinlock, flags);
- if (atomic_dec_return(csdev->refcnt)) {
spin_unlock_irqrestore(&drvdata->spinlock, flags);
return -EBUSY;
- }
- if (drvdata->reading) { spin_unlock_irqrestore(&drvdata->spinlock, flags); return -EBUSY;
diff --git a/drivers/hwtracing/coresight/coresight-tpiu.c b/drivers/hwtracing/coresight/coresight-tpiu.c index 0d13da1b9df1..7acbeffcc137 100644 --- a/drivers/hwtracing/coresight/coresight-tpiu.c +++ b/drivers/hwtracing/coresight/coresight-tpiu.c @@ -5,6 +5,7 @@
- Description: CoreSight Trace Port Interface Unit driver
*/ +#include <linux/atomic.h> #include <linux/kernel.h> #include <linux/init.h> #include <linux/device.h> @@ -73,7 +74,7 @@ static int tpiu_enable(struct coresight_device *csdev, u32 mode, void *__unused) struct tpiu_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); tpiu_enable_hw(drvdata);
- atomic_inc(csdev->refcnt); dev_dbg(drvdata->dev, "TPIU enabled\n"); return 0; }
@@ -98,6 +99,9 @@ static int tpiu_disable(struct coresight_device *csdev) { struct tpiu_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
- if (atomic_dec_return(csdev->refcnt))
return -EBUSY;
- tpiu_disable_hw(drvdata); dev_dbg(drvdata->dev, "TPIU disabled\n");
diff --git a/drivers/hwtracing/coresight/coresight.c b/drivers/hwtracing/coresight/coresight.c index 13eda4693f81..19ba121d7451 100644 --- a/drivers/hwtracing/coresight/coresight.c +++ b/drivers/hwtracing/coresight/coresight.c @@ -225,14 +225,13 @@ static int coresight_enable_sink(struct coresight_device *csdev, * We need to make sure the "new" session is compatible with the * existing "mode" of operation. */
- if (sink_ops(csdev)->enable) {
ret = sink_ops(csdev)->enable(csdev, mode, data);
if (ret)
return ret;
csdev->enable = true;
- }
- if (!sink_ops(csdev)->enable)
return -EINVAL;
- atomic_inc(csdev->refcnt);
- ret = sink_ops(csdev)->enable(csdev, mode, data);
- if (ret)
return ret;
- csdev->enable = true; return 0; }
@@ -241,14 +240,13 @@ static void coresight_disable_sink(struct coresight_device *csdev) { int ret;
- if (atomic_dec_return(csdev->refcnt) == 0) {
if (sink_ops(csdev)->disable) {
ret = sink_ops(csdev)->disable(csdev);
if (ret)
return;
csdev->enable = false;
}
- }
- if (!sink_ops(csdev)->disable)
return;
- ret = sink_ops(csdev)->disable(csdev);
- if (ret)
return;
- csdev->enable = false; } static int coresight_enable_link(struct coresight_device *csdev,
Otherwise looks fine to me.
Suzuki
When disabling a sink the reference counter ensures the operation goes through if nobody else is using it. As such if drvdata::mode is already set do CS_MODE_DISABLED, it is an error and should be reported as such.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- drivers/hwtracing/coresight/coresight-etb10.c | 9 ++++----- drivers/hwtracing/coresight/coresight-tmc-etf.c | 9 ++++----- drivers/hwtracing/coresight/coresight-tmc-etr.c | 9 ++++----- 3 files changed, 12 insertions(+), 15 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c index dcc550507f70..720190cc5904 100644 --- a/drivers/hwtracing/coresight/coresight-etb10.c +++ b/drivers/hwtracing/coresight/coresight-etb10.c @@ -344,11 +344,10 @@ static int etb_disable(struct coresight_device *csdev) return -EBUSY; }
- /* Disable the ETB only if it needs to */ - if (drvdata->mode != CS_MODE_DISABLED) { - etb_disable_hw(drvdata); - drvdata->mode = CS_MODE_DISABLED; - } + /* Complain if we (somehow) got out of sync */ + WARN_ON_ONCE(drvdata->mode == CS_MODE_DISABLED); + etb_disable_hw(drvdata); + drvdata->mode = CS_MODE_DISABLED; spin_unlock_irqrestore(&drvdata->spinlock, flags);
dev_dbg(drvdata->dev, "ETB disabled\n"); diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c index b6d0c010eec1..d3ba6b14791b 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etf.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c @@ -297,11 +297,10 @@ static int tmc_disable_etf_sink(struct coresight_device *csdev) return -EBUSY; }
- /* Disable the TMC only if it needs to */ - if (drvdata->mode != CS_MODE_DISABLED) { - tmc_etb_disable_hw(drvdata); - drvdata->mode = CS_MODE_DISABLED; - } + /* Complain if we (somehow) got out of sync */ + WARN_ON_ONCE(drvdata->mode == CS_MODE_DISABLED); + tmc_etb_disable_hw(drvdata); + drvdata->mode = CS_MODE_DISABLED;
spin_unlock_irqrestore(&drvdata->spinlock, flags);
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index cbb17b2c53b5..8d71ea110ff3 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -1416,11 +1416,10 @@ static int tmc_disable_etr_sink(struct coresight_device *csdev) return -EBUSY; }
- /* Disable the TMC only if it needs to */ - if (drvdata->mode != CS_MODE_DISABLED) { - tmc_etr_disable_hw(drvdata); - drvdata->mode = CS_MODE_DISABLED; - } + /* Complain if we (somehow) got out of sync */ + WARN_ON_ONCE(drvdata->mode == CS_MODE_DISABLED); + tmc_etr_disable_hw(drvdata); + drvdata->mode = CS_MODE_DISABLED;
spin_unlock_irqrestore(&drvdata->spinlock, flags);
On 06/03/2019 22:57, Mathieu Poirier wrote:
When disabling a sink the reference counter ensures the operation goes through if nobody else is using it. As such if drvdata::mode is already set do CS_MODE_DISABLED, it is an error and should be reported as such.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
nit: Subject is misleading. This is really a cleanup/fix than a refactoring.
With that:
Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com
On Wed, Mar 06, 2019 at 03:57:38PM -0700, Mathieu Poirier wrote:
When disabling a sink the reference counter ensures the operation goes through if nobody else is using it. As such if drvdata::mode is already set do CS_MODE_DISABLED, it is an error and should be reported as such.
I found the patch 0006 and 0007 for changing sink::disable() and for sink::update(), both of them don't touch tpiu driver. After look into the tpiu code, tpiu driver doesn't maintain the mode and without field 'drvdata->mode' in driver data structure.
Should we do the same thing (maintain the driver mode) for tpiu driver? I just bring up in case you miss it; if you think this question is not irrelative with cpu wide trace support, please ignore it.
[...]
Thanks, Leo Yan
On Sun, 17 Mar 2019 at 21:06, Leo Yan leo.yan@linaro.org wrote:
On Wed, Mar 06, 2019 at 03:57:38PM -0700, Mathieu Poirier wrote:
When disabling a sink the reference counter ensures the operation goes through if nobody else is using it. As such if drvdata::mode is already set do CS_MODE_DISABLED, it is an error and should be reported as such.
I found the patch 0006 and 0007 for changing sink::disable() and for sink::update(), both of them don't touch tpiu driver. After look into the tpiu code, tpiu driver doesn't maintain the mode and without field 'drvdata->mode' in driver data structure.
Should we do the same thing (maintain the driver mode) for tpiu driver? I just bring up in case you miss it; if you think this question is not irrelative with cpu wide trace support, please ignore it.
The TPIU driver does not support the perf API. Since CPU-wide scenarios are only valid within the context of perf sessions, there is no point implementing anything.
Thanks for your comments, Mathieu
[...]
Thanks, Leo Yan
When operating in CPU-wide trace scenarios and working with an N:1 source/sink HW topology, update() functions need to be made atomic in order to avoid racing with start and stop operations.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- drivers/hwtracing/coresight/coresight-etb10.c | 4 +++- drivers/hwtracing/coresight/coresight-tmc-etf.c | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c index 720190cc5904..d3dce9399349 100644 --- a/drivers/hwtracing/coresight/coresight-etb10.c +++ b/drivers/hwtracing/coresight/coresight-etb10.c @@ -416,7 +416,7 @@ static unsigned long etb_update_buffer(struct coresight_device *csdev, const u32 *barrier; u32 read_ptr, write_ptr, capacity; u32 status, read_data; - unsigned long offset, to_read; + unsigned long offset, to_read, flags; struct cs_buffers *buf = sink_config; struct etb_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
@@ -425,6 +425,7 @@ static unsigned long etb_update_buffer(struct coresight_device *csdev,
capacity = drvdata->buffer_depth * ETB_FRAME_SIZE_WORDS;
+ spin_lock_irqsave(&drvdata->spinlock, flags); __etb_disable_hw(drvdata); CS_UNLOCK(drvdata->base);
@@ -535,6 +536,7 @@ static unsigned long etb_update_buffer(struct coresight_device *csdev, } __etb_enable_hw(drvdata); CS_LOCK(drvdata->base); + spin_unlock_irqrestore(&drvdata->spinlock, flags);
return to_read; } diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c index d3ba6b14791b..9e449ef84883 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etf.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c @@ -413,7 +413,7 @@ static unsigned long tmc_update_etf_buffer(struct coresight_device *csdev, u32 *buf_ptr; u64 read_ptr, write_ptr; u32 status; - unsigned long offset, to_read; + unsigned long offset, to_read, flags; struct cs_buffers *buf = sink_config; struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
@@ -424,6 +424,7 @@ static unsigned long tmc_update_etf_buffer(struct coresight_device *csdev, if (WARN_ON_ONCE(drvdata->mode != CS_MODE_PERF)) return 0;
+ spin_lock_irqsave(&drvdata->spinlock, flags); CS_UNLOCK(drvdata->base);
tmc_flush_and_stop(drvdata); @@ -517,6 +518,7 @@ static unsigned long tmc_update_etf_buffer(struct coresight_device *csdev, to_read = buf->nr_pages << PAGE_SHIFT; } CS_LOCK(drvdata->base); + spin_unlock_irqrestore(&drvdata->spinlock, flags);
return to_read; }
Hi Mathieu,
On 06/03/2019 22:57, Mathieu Poirier wrote:
When operating in CPU-wide trace scenarios and working with an N:1 source/sink HW topology, update() functions need to be made atomic in order to avoid racing with start and stop operations.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
nit: Same as the previous patch, the subject is a bit misleading. We are infact fixing/cleaning/preparing the function.
Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com
There is no point in allocating sink memory for a trace session if there is not way to free it once it is no longer needed. As such make sure the sink API function to allocate and free memory have been implemented before moving ahead with the establishment of a trace session.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- drivers/hwtracing/coresight/coresight-etm-perf.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index bbfed70b3402..b8ca3800b56b 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -134,8 +134,7 @@ static void free_event_data(struct work_struct *work) if (event_data->snk_config && !WARN_ON(cpumask_empty(mask))) { cpu = cpumask_first(mask); sink = coresight_get_sink(etm_event_cpu_path(event_data, cpu)); - if (sink_ops(sink)->free_buffer) - sink_ops(sink)->free_buffer(event_data->snk_config); + sink_ops(sink)->free_buffer(event_data->snk_config); }
for_each_cpu(cpu, mask) { @@ -215,7 +214,7 @@ static void *etm_setup_aux(struct perf_event *event, void **pages, sink = coresight_get_enabled_sink(true); }
- if (!sink || !sink_ops(sink)->alloc_buffer) + if (!sink) goto err;
mask = &event_data->mask; @@ -261,6 +260,9 @@ static void *etm_setup_aux(struct perf_event *event, void **pages, if (cpu >= nr_cpu_ids) goto err;
+ if (!sink_ops(sink)->alloc_buffer || !sink_ops(sink)->free_buffer) + goto err; + /* Allocate the sink buffer for this session */ event_data->snk_config = sink_ops(sink)->alloc_buffer(sink, cpu, pages,
On 06/03/2019 22:57, Mathieu Poirier wrote:
There is no point in allocating sink memory for a trace session if there is not way to free it once it is no longer needed. As such make sure the sink API function to allocate and free memory have been implemented before moving ahead with the establishment of a trace session.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
nit: Subject could be "s/Refactor/Clean up/"
Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com
Function free_event_data() is already busy and is bound to become worse with the addition of CPU-wide trace scenarios. As such spin off a new function to strickly take care of the sink buffers.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- .../hwtracing/coresight/coresight-etm-perf.c | 24 ++++++++++++++----- 1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index b8ca3800b56b..806b3dd5872d 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -120,22 +120,34 @@ static int etm_event_init(struct perf_event *event) return ret; }
+static void free_sink_buffer(struct etm_event_data *event_data) +{ + int cpu; + cpumask_t *mask = &event_data->mask; + struct coresight_device *sink; + + if (WARN_ON(cpumask_empty(mask))) + return; + + if (!event_data->snk_config) + return; + + cpu = cpumask_first(mask); + sink = coresight_get_sink(etm_event_cpu_path(event_data, cpu)); + sink_ops(sink)->free_buffer(event_data->snk_config); +} + static void free_event_data(struct work_struct *work) { int cpu; cpumask_t *mask; struct etm_event_data *event_data; - struct coresight_device *sink;
event_data = container_of(work, struct etm_event_data, work); mask = &event_data->mask;
/* Free the sink buffers, if there are any */ - if (event_data->snk_config && !WARN_ON(cpumask_empty(mask))) { - cpu = cpumask_first(mask); - sink = coresight_get_sink(etm_event_cpu_path(event_data, cpu)); - sink_ops(sink)->free_buffer(event_data->snk_config); - } + free_sink_buffer(event_data);
for_each_cpu(cpu, mask) { struct list_head **ppath;
Hi Mathieu,
On 06/03/2019 22:57, Mathieu Poirier wrote:
Function free_event_data() is already busy and is bound to become worse with the addition of CPU-wide trace scenarios. As such spin off a new function to strickly take care of the sink buffers.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
.../hwtracing/coresight/coresight-etm-perf.c | 24 ++++++++++++++----- 1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index b8ca3800b56b..806b3dd5872d 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -120,22 +120,34 @@ static int etm_event_init(struct perf_event *event) return ret; } +static void free_sink_buffer(struct etm_event_data *event_data) +{
- int cpu;
- cpumask_t *mask = &event_data->mask;
- struct coresight_device *sink;
- if (WARN_ON(cpumask_empty(mask)))
return;
- if (!event_data->snk_config)
return;
I think it may be a good idea to keep the check as they were. i.e, we check for the snk_config, followed by empty cpumask. It is perfectly valid for an event_data to have an empty sink_config and and empty cpumask.
With that:
Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com
On Fri, Mar 15, 2019 at 05:19:30PM +0000, Suzuki K Poulose wrote:
Hi Mathieu,
On 06/03/2019 22:57, Mathieu Poirier wrote:
Function free_event_data() is already busy and is bound to become worse with the addition of CPU-wide trace scenarios. As such spin off a new function to strickly take care of the sink buffers.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
.../hwtracing/coresight/coresight-etm-perf.c | 24 ++++++++++++++----- 1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index b8ca3800b56b..806b3dd5872d 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -120,22 +120,34 @@ static int etm_event_init(struct perf_event *event) return ret; } +static void free_sink_buffer(struct etm_event_data *event_data) +{
- int cpu;
- cpumask_t *mask = &event_data->mask;
- struct coresight_device *sink;
- if (WARN_ON(cpumask_empty(mask)))
return;
- if (!event_data->snk_config)
return;
I think it may be a good idea to keep the check as they were. i.e, we check for the snk_config, followed by empty cpumask. It is perfectly valid for an event_data to have an empty sink_config and and empty cpumask.
I must admit you lost me here. Here in order to proceed we need a event_data->snk_config and a mask. In fact not having a ->snk_config should also be a WARN_ON() since etm_setup_aux() ensures one is valid. As for having an empty cpumask, that can't happen either since it is explicitly set in alloc_event_data().
With that:
Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com
For CPU-wide scenarios trace generated by multiple CPUs can be aggregated by a single sink. This patch paves the way to uses the pid of the process being monitored to allocate and free sink buffer memory along with regimenting access to what source a sink can collect data for.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- drivers/hwtracing/coresight/coresight-etb10.c | 3 ++- drivers/hwtracing/coresight/coresight-etm-perf.c | 4 +++- drivers/hwtracing/coresight/coresight-tmc-etf.c | 6 ++++-- drivers/hwtracing/coresight/coresight-tmc-etr.c | 5 +++-- include/linux/coresight.h | 4 +++- 5 files changed, 15 insertions(+), 7 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c index d3dce9399349..493d4f4143a1 100644 --- a/drivers/hwtracing/coresight/coresight-etb10.c +++ b/drivers/hwtracing/coresight/coresight-etb10.c @@ -355,7 +355,8 @@ static int etb_disable(struct coresight_device *csdev) }
static void *etb_alloc_buffer(struct coresight_device *csdev, int cpu, - void **pages, int nr_pages, bool overwrite) + pid_t pid, void **pages, int nr_pages, + bool overwrite) { int node; struct cs_buffers *buf; diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index 806b3dd5872d..76d3e8f1041c 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -13,6 +13,7 @@ #include <linux/init.h> #include <linux/perf_event.h> #include <linux/percpu-defs.h> +#include <linux/sched.h> /* for task_pid_nr() */ #include <linux/slab.h> #include <linux/stringhash.h> #include <linux/types.h> @@ -210,6 +211,7 @@ static void *etm_setup_aux(struct perf_event *event, void **pages, u32 id; int cpu = event->cpu; cpumask_t *mask; + pid_t pid = task_pid_nr(event->owner); struct coresight_device *sink; struct etm_event_data *event_data = NULL;
@@ -277,7 +279,7 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
/* Allocate the sink buffer for this session */ event_data->snk_config = - sink_ops(sink)->alloc_buffer(sink, cpu, pages, + sink_ops(sink)->alloc_buffer(sink, cpu, pid, pages, nr_pages, overwrite); if (!event_data->snk_config) goto err; diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c index 9e449ef84883..3bb407132a1a 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etf.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c @@ -9,6 +9,7 @@ #include <linux/coresight.h> #include <linux/perf_event.h> #include <linux/slab.h> +#include <linux/types.h> #include "coresight-priv.h" #include "coresight-tmc.h" #include "coresight-etm-perf.h" @@ -350,8 +351,9 @@ static void tmc_disable_etf_link(struct coresight_device *csdev, dev_dbg(drvdata->dev, "TMC-ETF disabled\n"); }
-static void *tmc_alloc_etf_buffer(struct coresight_device *csdev, int cpu, - void **pages, int nr_pages, bool overwrite) +static void *tmc_alloc_etf_buffer(struct coresight_device *csdev, + int cpu, pid_t pid, void **pages, + int nr_pages, bool overwrite) { int node; struct cs_buffers *buf; diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index 8d71ea110ff3..be50591e2dcc 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -9,6 +9,7 @@ #include <linux/dma-mapping.h> #include <linux/iommu.h> #include <linux/slab.h> +#include <linux/types.h> #include <linux/vmalloc.h> #include "coresight-catu.h" #include "coresight-etm-perf.h" @@ -1209,8 +1210,8 @@ tmc_etr_setup_perf_buf(struct tmc_drvdata *drvdata, int node, int nr_pages, }
-static void *tmc_alloc_etr_buffer(struct coresight_device *csdev, - int cpu, void **pages, int nr_pages, +static void *tmc_alloc_etr_buffer(struct coresight_device *csdev, int cpu, + pid_t pid, void **pages, int nr_pages, bool snapshot) { struct etr_perf_buffer *etr_perf; diff --git a/include/linux/coresight.h b/include/linux/coresight.h index 189cc6ddc92b..21c53cc21f4b 100644 --- a/include/linux/coresight.h +++ b/include/linux/coresight.h @@ -9,6 +9,7 @@ #include <linux/device.h> #include <linux/perf_event.h> #include <linux/sched.h> +#include <linux/types.h>
/* Peripheral id registers (0xFD0-0xFEC) */ #define CORESIGHT_PERIPHIDR4 0xfd0 @@ -194,7 +195,8 @@ struct coresight_ops_sink { int (*enable)(struct coresight_device *csdev, u32 mode, void *data); int (*disable)(struct coresight_device *csdev); void *(*alloc_buffer)(struct coresight_device *csdev, int cpu, - void **pages, int nr_pages, bool overwrite); + pid_t pid, void **pages, int nr_pages, + bool overwrite); void (*free_buffer)(void *config); unsigned long (*update_buffer)(struct coresight_device *csdev, struct perf_output_handle *handle,
Hi Mathieu,
On 06/03/2019 22:57, Mathieu Poirier wrote:
For CPU-wide scenarios trace generated by multiple CPUs can be aggregated by a single sink. This patch paves the way to uses the pid of the process being monitored to allocate and free sink buffer memory along with regimenting access to what source a sink can collect data for.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
I think it may be a good idea to pass the "event" instead of the pid and the cpu (both of which are deduced from the former), separately to the alloc_buffer. What do you think ?
Cheers Suzuki
On Fri, Mar 15, 2019 at 05:24:05PM +0000, Suzuki K Poulose wrote:
Hi Mathieu,
On 06/03/2019 22:57, Mathieu Poirier wrote:
For CPU-wide scenarios trace generated by multiple CPUs can be aggregated by a single sink. This patch paves the way to uses the pid of the process being monitored to allocate and free sink buffer memory along with regimenting access to what source a sink can collect data for.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
I think it may be a good idea to pass the "event" instead of the pid and the cpu (both of which are deduced from the former), separately to the alloc_buffer. What do you think ?
The problem with passing the event is that each sink needs to call task_pid_nr(), which I thought was redundant.
Thanks for taking the time to look at all this. Mathieu
Cheers Suzuki
On 15/03/2019 21:41, Mathieu Poirier wrote:
On Fri, Mar 15, 2019 at 05:24:05PM +0000, Suzuki K Poulose wrote:
Hi Mathieu,
On 06/03/2019 22:57, Mathieu Poirier wrote:
For CPU-wide scenarios trace generated by multiple CPUs can be aggregated by a single sink. This patch paves the way to uses the pid of the process being monitored to allocate and free sink buffer memory along with regimenting access to what source a sink can collect data for.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
I think it may be a good idea to pass the "event" instead of the pid and the cpu (both of which are deduced from the former), separately to the alloc_buffer. What do you think ?
The problem with passing the event is that each sink needs to call task_pid_nr(), which I thought was redundant.
The problem is we are bloating the arglist of alloc_buffer() already with node, and now with pid, all of which are derived from event. Hence the thought.
Cheers Suzuki
On Tue, 19 Mar 2019 at 04:07, Suzuki K Poulose suzuki.poulose@arm.com wrote:
On 15/03/2019 21:41, Mathieu Poirier wrote:
On Fri, Mar 15, 2019 at 05:24:05PM +0000, Suzuki K Poulose wrote:
Hi Mathieu,
On 06/03/2019 22:57, Mathieu Poirier wrote:
For CPU-wide scenarios trace generated by multiple CPUs can be aggregated by a single sink. This patch paves the way to uses the pid of the process being monitored to allocate and free sink buffer memory along with regimenting access to what source a sink can collect data for.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
I think it may be a good idea to pass the "event" instead of the pid and the cpu (both of which are deduced from the former), separately to the alloc_buffer. What do you think ?
The problem with passing the event is that each sink needs to call task_pid_nr(), which I thought was redundant.
The problem is we are bloating the arglist of alloc_buffer() already with node, and now with pid, all of which are derived from event. Hence the thought.
I agree... It is a matter of deciding which is the lesser evil. I was ambivalent when I added the cpu and pid in the argument list... I'll just do the change.
Cheers Suzuki
Refactoring function tmc_etr_setup_perf_buf() so that it only deals with the high level etr_perf_buffer, and leaving the allocation of the backend buffer (i.e etr_buf) to another function.
That way the backend buffer allocation function can decide if it wants to reuse an existing buffer (CPU-wide trace scenarios) or simply create a new one.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- .../hwtracing/coresight/coresight-tmc-etr.c | 46 +++++++++++++------ 1 file changed, 31 insertions(+), 15 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index be50591e2dcc..4ec78f9cdc74 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -1159,25 +1159,13 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev) return ret; }
-/* - * tmc_etr_setup_perf_buf: Allocate ETR buffer for use by perf. - * The size of the hardware buffer is dependent on the size configured - * via sysfs and the perf ring buffer size. We prefer to allocate the - * largest possible size, scaling down the size by half until it - * reaches a minimum limit (1M), beyond which we give up. - */ -static struct etr_perf_buffer * -tmc_etr_setup_perf_buf(struct tmc_drvdata *drvdata, int node, int nr_pages, - void **pages, bool snapshot) +static struct etr_buf * +tmc_etr_get_etr_buf(struct tmc_drvdata *drvdata, int node, + int nr_pages, void **pages) { struct etr_buf *etr_buf; - struct etr_perf_buffer *etr_perf; unsigned long size;
- etr_perf = kzalloc_node(sizeof(*etr_perf), GFP_KERNEL, node); - if (!etr_perf) - return ERR_PTR(-ENOMEM); - /* * Try to match the perf ring buffer size if it is larger * than the size requested via sysfs. @@ -1201,6 +1189,34 @@ tmc_etr_setup_perf_buf(struct tmc_drvdata *drvdata, int node, int nr_pages, size /= 2; } while (size >= TMC_ETR_PERF_MIN_BUF_SIZE);
+ return ERR_PTR(-ENOMEM); + +done: + return etr_buf; +} + +/* + * tmc_etr_setup_perf_buf: Allocate ETR buffer for use by perf. + * The size of the hardware buffer is dependent on the size configured + * via sysfs and the perf ring buffer size. We prefer to allocate the + * largest possible size, scaling down the size by half until it + * reaches a minimum limit (1M), beyond which we give up. + */ +static struct etr_perf_buffer * +tmc_etr_setup_perf_buf(struct tmc_drvdata *drvdata, int node, + int nr_pages, void **pages, bool snapshot) +{ + struct etr_buf *etr_buf; + struct etr_perf_buffer *etr_perf; + + etr_perf = kzalloc_node(sizeof(*etr_perf), GFP_KERNEL, node); + if (!etr_perf) + return ERR_PTR(-ENOMEM); + + etr_buf = tmc_etr_get_etr_buf(drvdata, node, nr_pages, pages); + if (!IS_ERR(etr_buf)) + goto done; + kfree(etr_perf); return ERR_PTR(-ENOMEM);
Mathieu,
On 06/03/2019 22:57, Mathieu Poirier wrote:
Refactoring function tmc_etr_setup_perf_buf() so that it only deals with the high level etr_perf_buffer, and leaving the allocation of the backend buffer (i.e etr_buf) to another function.
That way the backend buffer allocation function can decide if it wants to reuse an existing buffer (CPU-wide trace scenarios) or simply create a new one.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
This patch as such is fine. But see my comments in the next patch.
Cheers Suzuki
In preparation to support CPU-wide trace scenarios, add the notion of process ID to ETR devices so that memory buffers can be shared between events.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- .../hwtracing/coresight/coresight-tmc-etr.c | 26 +++++++++++++++---- drivers/hwtracing/coresight/coresight-tmc.h | 6 +++++ 2 files changed, 27 insertions(+), 5 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index 4ec78f9cdc74..3bd03cd890e8 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -1160,8 +1160,8 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev) }
static struct etr_buf * -tmc_etr_get_etr_buf(struct tmc_drvdata *drvdata, int node, - int nr_pages, void **pages) +alloc_etr_buf(struct tmc_drvdata *drvdata, int node, + int nr_pages, void **pages) { struct etr_buf *etr_buf; unsigned long size; @@ -1195,6 +1195,22 @@ tmc_etr_get_etr_buf(struct tmc_drvdata *drvdata, int node, return etr_buf; }
+static struct etr_buf * +tmc_etr_get_etr_buf(struct tmc_drvdata *drvdata, int node, pid_t pid, + int nr_pages, void **pages) +{ + struct etr_buf *etr_buf; + + etr_buf = alloc_etr_buf(drvdata, node, nr_pages, pages); + if (IS_ERR(etr_buf)) + return etr_buf; + + etr_buf->pid = pid; + refcount_set(&etr_buf->refcount, 1); + + return etr_buf; +} + /* * tmc_etr_setup_perf_buf: Allocate ETR buffer for use by perf. * The size of the hardware buffer is dependent on the size configured @@ -1203,7 +1219,7 @@ tmc_etr_get_etr_buf(struct tmc_drvdata *drvdata, int node, * reaches a minimum limit (1M), beyond which we give up. */ static struct etr_perf_buffer * -tmc_etr_setup_perf_buf(struct tmc_drvdata *drvdata, int node, +tmc_etr_setup_perf_buf(struct tmc_drvdata *drvdata, int node, pid_t pid, int nr_pages, void **pages, bool snapshot) { struct etr_buf *etr_buf; @@ -1213,7 +1229,7 @@ tmc_etr_setup_perf_buf(struct tmc_drvdata *drvdata, int node, if (!etr_perf) return ERR_PTR(-ENOMEM);
- etr_buf = tmc_etr_get_etr_buf(drvdata, node, nr_pages, pages); + etr_buf = tmc_etr_get_etr_buf(drvdata, node, pid, nr_pages, pages); if (!IS_ERR(etr_buf)) goto done;
@@ -1236,7 +1252,7 @@ static void *tmc_alloc_etr_buffer(struct coresight_device *csdev, int cpu, if (cpu == -1) cpu = smp_processor_id();
- etr_perf = tmc_etr_setup_perf_buf(drvdata, cpu_to_node(cpu), + etr_perf = tmc_etr_setup_perf_buf(drvdata, cpu_to_node(cpu), pid, nr_pages, pages, snapshot); if (IS_ERR(etr_perf)) { dev_dbg(drvdata->dev, "Unable to allocate ETR buffer\n"); diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h index 487c53701e9c..d3657acb6cbf 100644 --- a/drivers/hwtracing/coresight/coresight-tmc.h +++ b/drivers/hwtracing/coresight/coresight-tmc.h @@ -9,6 +9,8 @@
#include <linux/dma-mapping.h> #include <linux/miscdevice.h> +#include <linux/refcount.h> +#include <linux/types.h>
#define TMC_RSZ 0x004 #define TMC_STS 0x00c @@ -133,6 +135,8 @@ struct etr_buf_operations;
/** * struct etr_buf - Details of the buffer used by ETR + * @pid : The pid this etr_buf belongs to. + * @refcount : Number of sources currently using this etr_buf. * @mode : Mode of the ETR buffer, contiguous, Scatter Gather etc. * @full : Trace data overflow * @size : Size of the buffer. @@ -143,6 +147,8 @@ struct etr_buf_operations; * @private : Backend specific information for the buf */ struct etr_buf { + pid_t pid; + refcount_t refcount; enum etr_mode mode; bool full; ssize_t size;
Hi Mathieu,
On 06/03/2019 22:57, Mathieu Poirier wrote:
In preparation to support CPU-wide trace scenarios, add the notion of process ID to ETR devices so that memory buffers can be shared between events.
The PID only matters to the perf buffers. And etr_buf is really a representation of the ETR Hw buffer and I prefer to keep it that way. Please could we have all the "perf" related book keeping in the etr_perf_buf, which holds the perf session specific data ?
As for enforcing the FLAT mode in the later series, we don't necessarily need to force it for double buffering. Even our SG mode uses double buffering. Even otherwise, we may be able to force a mode, than passing down perf specific data to the generic ETR buffer.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
.../hwtracing/coresight/coresight-tmc-etr.c | 26 +++++++++++++++---- drivers/hwtracing/coresight/coresight-tmc.h | 6 +++++ 2 files changed, 27 insertions(+), 5 deletions(-)
...
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h index 487c53701e9c..d3657acb6cbf 100644 --- a/drivers/hwtracing/coresight/coresight-tmc.h +++ b/drivers/hwtracing/coresight/coresight-tmc.h @@ -9,6 +9,8 @@ #include <linux/dma-mapping.h> #include <linux/miscdevice.h> +#include <linux/refcount.h> +#include <linux/types.h> #define TMC_RSZ 0x004 #define TMC_STS 0x00c @@ -133,6 +135,8 @@ struct etr_buf_operations; /**
- struct etr_buf - Details of the buffer used by ETR
- @pid : The pid this etr_buf belongs to.
- @refcount : Number of sources currently using this etr_buf.
- @mode : Mode of the ETR buffer, contiguous, Scatter Gather etc.
- @full : Trace data overflow
- @size : Size of the buffer.
@@ -143,6 +147,8 @@ struct etr_buf_operations;
- @private : Backend specific information for the buf
*/ struct etr_buf {
- pid_t pid;
- refcount_t refcount;
As mentioned above, those are better suited for etr_perf_buf.
Thoughts ?
Suzuki
On Tue, 19 Mar 2019 at 04:00, Suzuki K Poulose suzuki.poulose@arm.com wrote:
Hi Mathieu,
On 06/03/2019 22:57, Mathieu Poirier wrote:
In preparation to support CPU-wide trace scenarios, add the notion of process ID to ETR devices so that memory buffers can be shared between events.
The PID only matters to the perf buffers. And etr_buf is really a representation of the ETR Hw buffer and I prefer to keep it that way. Please could we have all the "perf" related book keeping in the etr_perf_buf, which holds the perf session specific data ?
From a quick look at the code it might streamline the allocation/free
process. I'll let you know if I run into problems but for now I'm pretty sure it is possible.
As for enforcing the FLAT mode in the later series, we don't necessarily need to force it for double buffering. Even our SG mode uses double buffering. Even otherwise, we may be able to force a mode, than passing down perf specific data to the generic ETR buffer.
The patches to force the flat mode was to make sure things are covered when the ETR improves and we are able to use the perf ring buffer properly. I'm happy to drop them (I never really liked them).
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
.../hwtracing/coresight/coresight-tmc-etr.c | 26 +++++++++++++++---- drivers/hwtracing/coresight/coresight-tmc.h | 6 +++++ 2 files changed, 27 insertions(+), 5 deletions(-)
...
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h index 487c53701e9c..d3657acb6cbf 100644 --- a/drivers/hwtracing/coresight/coresight-tmc.h +++ b/drivers/hwtracing/coresight/coresight-tmc.h @@ -9,6 +9,8 @@
#include <linux/dma-mapping.h> #include <linux/miscdevice.h> +#include <linux/refcount.h> +#include <linux/types.h>
#define TMC_RSZ 0x004 #define TMC_STS 0x00c @@ -133,6 +135,8 @@ struct etr_buf_operations;
/**
- struct etr_buf - Details of the buffer used by ETR
- @pid : The pid this etr_buf belongs to.
- @refcount : Number of sources currently using this etr_buf.
- @mode : Mode of the ETR buffer, contiguous, Scatter Gather etc.
- @full : Trace data overflow
- @size : Size of the buffer.
@@ -143,6 +147,8 @@ struct etr_buf_operations;
- @private : Backend specific information for the buf
*/ struct etr_buf {
pid_t pid;
refcount_t refcount;
As mentioned above, those are better suited for etr_perf_buf.
Thoughts ?
Suzuki
On Tue, 19 Mar 2019 at 04:00, Suzuki K Poulose suzuki.poulose@arm.com wrote:
Hi Mathieu,
On 06/03/2019 22:57, Mathieu Poirier wrote:
In preparation to support CPU-wide trace scenarios, add the notion of process ID to ETR devices so that memory buffers can be shared between events.
The PID only matters to the perf buffers. And etr_buf is really a representation of the ETR Hw buffer and I prefer to keep it that way. Please could we have all the "perf" related book keeping in the etr_perf_buf, which holds the perf session specific data ?
As for enforcing the FLAT mode in the later series, we don't necessarily need to force it for double buffering. Even our SG mode uses double buffering. Even otherwise, we may be able to force a mode, than passing down perf specific data to the generic ETR buffer.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
.../hwtracing/coresight/coresight-tmc-etr.c | 26 +++++++++++++++---- drivers/hwtracing/coresight/coresight-tmc.h | 6 +++++ 2 files changed, 27 insertions(+), 5 deletions(-)
...
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h index 487c53701e9c..d3657acb6cbf 100644 --- a/drivers/hwtracing/coresight/coresight-tmc.h +++ b/drivers/hwtracing/coresight/coresight-tmc.h @@ -9,6 +9,8 @@
#include <linux/dma-mapping.h> #include <linux/miscdevice.h> +#include <linux/refcount.h> +#include <linux/types.h>
#define TMC_RSZ 0x004 #define TMC_STS 0x00c @@ -133,6 +135,8 @@ struct etr_buf_operations;
/**
- struct etr_buf - Details of the buffer used by ETR
- @pid : The pid this etr_buf belongs to.
- @refcount : Number of sources currently using this etr_buf.
- @mode : Mode of the ETR buffer, contiguous, Scatter Gather etc.
- @full : Trace data overflow
- @size : Size of the buffer.
@@ -143,6 +147,8 @@ struct etr_buf_operations;
- @private : Backend specific information for the buf
*/ struct etr_buf {
pid_t pid;
refcount_t refcount;
As mentioned above, those are better suited for etr_perf_buf.
Thoughts ?
Going back on this, I definitely agree with you that pid should be in the etr_perf_buf structure. On the flip side I think refcount should stay in etr_buf as it counts the number of users of this buffer.
Suzuki
On 22/03/2019 19:14, Mathieu Poirier wrote:
On Tue, 19 Mar 2019 at 04:00, Suzuki K Poulose suzuki.poulose@arm.com wrote:
Hi Mathieu,
On 06/03/2019 22:57, Mathieu Poirier wrote:
In preparation to support CPU-wide trace scenarios, add the notion of process ID to ETR devices so that memory buffers can be shared between events.
The PID only matters to the perf buffers. And etr_buf is really a representation of the ETR Hw buffer and I prefer to keep it that way. Please could we have all the "perf" related book keeping in the etr_perf_buf, which holds the perf session specific data ?
As for enforcing the FLAT mode in the later series, we don't necessarily need to force it for double buffering. Even our SG mode uses double buffering. Even otherwise, we may be able to force a mode, than passing down perf specific data to the generic ETR buffer.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
.../hwtracing/coresight/coresight-tmc-etr.c | 26 +++++++++++++++---- drivers/hwtracing/coresight/coresight-tmc.h | 6 +++++ 2 files changed, 27 insertions(+), 5 deletions(-)
...
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h index 487c53701e9c..d3657acb6cbf 100644 --- a/drivers/hwtracing/coresight/coresight-tmc.h +++ b/drivers/hwtracing/coresight/coresight-tmc.h @@ -9,6 +9,8 @@
#include <linux/dma-mapping.h> #include <linux/miscdevice.h> +#include <linux/refcount.h> +#include <linux/types.h>
#define TMC_RSZ 0x004 #define TMC_STS 0x00c @@ -133,6 +135,8 @@ struct etr_buf_operations;
/** * struct etr_buf - Details of the buffer used by ETR
- @pid : The pid this etr_buf belongs to.
- @refcount : Number of sources currently using this etr_buf.
- @mode : Mode of the ETR buffer, contiguous, Scatter Gather etc.
- @full : Trace data overflow
- @size : Size of the buffer.
@@ -143,6 +147,8 @@ struct etr_buf_operations; * @private : Backend specific information for the buf */ struct etr_buf {
pid_t pid;
refcount_t refcount;
As mentioned above, those are better suited for etr_perf_buf.
Thoughts ?
Going back on this, I definitely agree with you that pid should be in the etr_perf_buf structure. On the flip side I think refcount should stay in etr_buf as it counts the number of users of this buffer.
I see, you're right, since we could be creating multiple output handles and thus aux-pages per event ? Did I miss something ?
Cheers Suzuki
On Mon, 25 Mar 2019 at 05:49, Suzuki K Poulose suzuki.poulose@arm.com wrote:
On 22/03/2019 19:14, Mathieu Poirier wrote:
On Tue, 19 Mar 2019 at 04:00, Suzuki K Poulose suzuki.poulose@arm.com wrote:
Hi Mathieu,
On 06/03/2019 22:57, Mathieu Poirier wrote:
In preparation to support CPU-wide trace scenarios, add the notion of process ID to ETR devices so that memory buffers can be shared between events.
The PID only matters to the perf buffers. And etr_buf is really a representation of the ETR Hw buffer and I prefer to keep it that way. Please could we have all the "perf" related book keeping in the etr_perf_buf, which holds the perf session specific data ?
As for enforcing the FLAT mode in the later series, we don't necessarily need to force it for double buffering. Even our SG mode uses double buffering. Even otherwise, we may be able to force a mode, than passing down perf specific data to the generic ETR buffer.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
.../hwtracing/coresight/coresight-tmc-etr.c | 26 +++++++++++++++---- drivers/hwtracing/coresight/coresight-tmc.h | 6 +++++ 2 files changed, 27 insertions(+), 5 deletions(-)
...
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h index 487c53701e9c..d3657acb6cbf 100644 --- a/drivers/hwtracing/coresight/coresight-tmc.h +++ b/drivers/hwtracing/coresight/coresight-tmc.h @@ -9,6 +9,8 @@
#include <linux/dma-mapping.h> #include <linux/miscdevice.h> +#include <linux/refcount.h> +#include <linux/types.h>
#define TMC_RSZ 0x004 #define TMC_STS 0x00c @@ -133,6 +135,8 @@ struct etr_buf_operations;
/** * struct etr_buf - Details of the buffer used by ETR
- @pid : The pid this etr_buf belongs to.
- @refcount : Number of sources currently using this etr_buf.
- @mode : Mode of the ETR buffer, contiguous, Scatter Gather etc.
- @full : Trace data overflow
- @size : Size of the buffer.
@@ -143,6 +147,8 @@ struct etr_buf_operations; * @private : Backend specific information for the buf */ struct etr_buf {
pid_t pid;
refcount_t refcount;
As mentioned above, those are better suited for etr_perf_buf.
Thoughts ?
Going back on this, I definitely agree with you that pid should be in the etr_perf_buf structure. On the flip side I think refcount should stay in etr_buf as it counts the number of users of this buffer.
I see, you're right, since we could be creating multiple output handles and thus aux-pages per event ? Did I miss something ?
In CPU-wide mode a struct etr_perf_buffer is instantiated for each CPU that is part of the trace session. If the HW enacts an 1:1 topology each etr_perf_buffer has its own etr_buf, otherwise the etr_buf is shared between etr_perf_buffer, hence the need to keep the refcount at that level.
I'll spin another revision of this set today. After that I'm charging in your ACPI work.
Cheers Suzuki
On 25/03/2019 15:08, Mathieu Poirier wrote:
On Mon, 25 Mar 2019 at 05:49, Suzuki K Poulose suzuki.poulose@arm.com wrote:
On 22/03/2019 19:14, Mathieu Poirier wrote:
On Tue, 19 Mar 2019 at 04:00, Suzuki K Poulose suzuki.poulose@arm.com wrote:
Hi Mathieu,
On 06/03/2019 22:57, Mathieu Poirier wrote:
In preparation to support CPU-wide trace scenarios, add the notion of process ID to ETR devices so that memory buffers can be shared between events.
The PID only matters to the perf buffers. And etr_buf is really a representation of the ETR Hw buffer and I prefer to keep it that way. Please could we have all the "perf" related book keeping in the etr_perf_buf, which holds the perf session specific data ?
As for enforcing the FLAT mode in the later series, we don't necessarily need to force it for double buffering. Even our SG mode uses double buffering. Even otherwise, we may be able to force a mode, than passing down perf specific data to the generic ETR buffer.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
.../hwtracing/coresight/coresight-tmc-etr.c | 26 +++++++++++++++---- drivers/hwtracing/coresight/coresight-tmc.h | 6 +++++ 2 files changed, 27 insertions(+), 5 deletions(-)
...
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h index 487c53701e9c..d3657acb6cbf 100644 --- a/drivers/hwtracing/coresight/coresight-tmc.h +++ b/drivers/hwtracing/coresight/coresight-tmc.h @@ -9,6 +9,8 @@
#include <linux/dma-mapping.h> #include <linux/miscdevice.h>
+#include <linux/refcount.h> +#include <linux/types.h>
#define TMC_RSZ 0x004 #define TMC_STS 0x00c
@@ -133,6 +135,8 @@ struct etr_buf_operations;
/** * struct etr_buf - Details of the buffer used by ETR
- @pid : The pid this etr_buf belongs to.
- @refcount : Number of sources currently using this etr_buf.
- @mode : Mode of the ETR buffer, contiguous, Scatter Gather etc.
- @full : Trace data overflow
- @size : Size of the buffer.
@@ -143,6 +147,8 @@ struct etr_buf_operations; * @private : Backend specific information for the buf */ struct etr_buf {
pid_t pid;
refcount_t refcount;
As mentioned above, those are better suited for etr_perf_buf.
Thoughts ?
Going back on this, I definitely agree with you that pid should be in the etr_perf_buf structure. On the flip side I think refcount should stay in etr_buf as it counts the number of users of this buffer.
I see, you're right, since we could be creating multiple output handles and thus aux-pages per event ? Did I miss something ?
In CPU-wide mode a struct etr_perf_buffer is instantiated for each CPU that is part of the trace session. If the HW enacts an 1:1 topology each etr_perf_buffer has its own etr_buf, otherwise the etr_buf is shared between etr_perf_buffer, hence the need to keep the refcount at that level.
Oh ! Thats nice and makes perfect sense. So we are covered for the 1:1 topology.
I'll spin another revision of this set today. After that I'm charging in your ACPI work.
Thanks. Np, take your time.
Cheers Suzuki
This patch uses the pid of the process being traced to aggregate traces coming from different processors in the same sink, something that is required when collecting traces in CPU-wide mode when the CoreSight HW enacts a N:1 source/sink topology.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- .../hwtracing/coresight/coresight-tmc-etr.c | 62 ++++++++++++++++++- 1 file changed, 60 insertions(+), 2 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index 3bd03cd890e8..e19f63a0b3ac 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -7,6 +7,7 @@ #include <linux/atomic.h> #include <linux/coresight.h> #include <linux/dma-mapping.h> +#include <linux/idr.h> #include <linux/iommu.h> #include <linux/slab.h> #include <linux/types.h> @@ -39,6 +40,9 @@ struct etr_perf_buffer { void **pages; };
+static DEFINE_IDR(session_idr); +static DEFINE_MUTEX(session_idr_lock); + /* Convert the perf index to an offset within the ETR buffer */ #define PERF_IDX2OFF(idx, buf) ((idx) % ((buf)->nr_pages << PAGE_SHIFT))
@@ -1199,8 +1203,22 @@ static struct etr_buf * tmc_etr_get_etr_buf(struct tmc_drvdata *drvdata, int node, pid_t pid, int nr_pages, void **pages) { + int ret; struct etr_buf *etr_buf;
+retry: + /* See if a buffer has been allocated for this session */ + mutex_lock(&session_idr_lock); + etr_buf = idr_find(&session_idr, pid); + if (etr_buf) { + refcount_inc(&etr_buf->refcount); + mutex_unlock(&session_idr_lock); + return etr_buf; + } + + /* If we made it here no buffer has been allocated, do so now. */ + mutex_unlock(&session_idr_lock); + etr_buf = alloc_etr_buf(drvdata, node, nr_pages, pages); if (IS_ERR(etr_buf)) return etr_buf; @@ -1208,6 +1226,23 @@ tmc_etr_get_etr_buf(struct tmc_drvdata *drvdata, int node, pid_t pid, etr_buf->pid = pid; refcount_set(&etr_buf->refcount, 1);
+ /* Now that we have a buffer, add it to the IDR. */ + mutex_lock(&session_idr_lock); + ret = idr_alloc(&session_idr, etr_buf, pid, pid + 1, GFP_KERNEL); + mutex_unlock(&session_idr_lock); + + /* Another event whith this session ID has allocated this buffer. */ + if (ret == -ENOSPC) { + tmc_free_etr_buf(etr_buf); + goto retry; + } + + /* The IDR can't allocate room for a new session, abandon ship. */ + if (ret == -ENOMEM) { + tmc_free_etr_buf(etr_buf); + return ERR_PTR(ret); + } + return etr_buf; }
@@ -1269,9 +1304,32 @@ static void *tmc_alloc_etr_buffer(struct coresight_device *csdev, int cpu, static void tmc_free_etr_buffer(void *config) { struct etr_perf_buffer *etr_perf = config; + struct etr_buf *buf, *etr_buf = etr_perf->etr_buf; + + if (!etr_buf) + goto free_etr_perf_buffer; + + mutex_lock(&session_idr_lock); + /* If we are not the last one to use the buffer, don't touch it. */ + if (!refcount_dec_and_test(&etr_buf->refcount)) { + mutex_unlock(&session_idr_lock); + goto free_etr_perf_buffer; + } + + /* We are the last one, remove from the IDR and free the buffer. */ + buf = idr_remove(&session_idr, etr_buf->pid); + mutex_unlock(&session_idr_lock); + + /* + * Something went very wrong - the buffer associated with this ID + * is not the same in the IDR. Leak to avoid use after free. + */ + if (WARN_ON(buf != etr_buf)) + goto free_etr_perf_buffer; + + tmc_free_etr_buf(etr_perf->etr_buf);
- if (etr_perf->etr_buf) - tmc_free_etr_buf(etr_perf->etr_buf); +free_etr_perf_buffer: kfree(etr_perf); }
This patch adds support for CPU-wide trace scenarios by making sure that only the sources monitoring the same process have access to a common sink. Because the sink is shared between sources, the first source to use the sink switches it on while the last one does the cleanup. Any attempt to modify the HW is overlooked for as long as more than one source is using a sink.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- .../hwtracing/coresight/coresight-tmc-etr.c | 39 ++++++++++++++++--- drivers/hwtracing/coresight/coresight-tmc.c | 2 + drivers/hwtracing/coresight/coresight-tmc.h | 3 ++ 3 files changed, 39 insertions(+), 5 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index e19f63a0b3ac..dc74c95413dc 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -1404,6 +1404,13 @@ tmc_update_etr_buffer(struct coresight_device *csdev, struct etr_buf *etr_buf = etr_perf->etr_buf;
spin_lock_irqsave(&drvdata->spinlock, flags); + + /* Don't do anything if another tracer is using this sink */ + if (atomic_read(csdev->refcnt) != 1) { + spin_unlock_irqrestore(&drvdata->spinlock, flags); + goto out; + } + if (WARN_ON(drvdata->perf_data != etr_perf)) { lost = true; spin_unlock_irqrestore(&drvdata->spinlock, flags); @@ -1443,17 +1450,15 @@ tmc_update_etr_buffer(struct coresight_device *csdev, static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, void *data) { int rc = 0; + pid_t pid; unsigned long flags; struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); struct perf_output_handle *handle = data; struct etr_perf_buffer *etr_perf = etm_perf_sink_config(handle);
spin_lock_irqsave(&drvdata->spinlock, flags); - /* - * There can be only one writer per sink in perf mode. If the sink - * is already open in SYSFS mode, we can't use it. - */ - if (drvdata->mode != CS_MODE_DISABLED || WARN_ON(drvdata->perf_data)) { + /* Don't use this sink if it is already claimed by sysFS */ + if (drvdata->mode == CS_MODE_SYSFS) { rc = -EBUSY; goto unlock_out; } @@ -1463,10 +1468,32 @@ static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, void *data) goto unlock_out; }
+ /* Get a handle on the pid of the process to monitor */ + pid = etr_perf->etr_buf->pid; + + /* Do not proceed if this device is associated with another session */ + if (drvdata->pid != -1 && drvdata->pid != pid) { + rc = -EBUSY; + goto unlock_out; + } + etr_perf->head = PERF_IDX2OFF(handle->head, etr_perf); drvdata->perf_data = etr_perf; + + /* + * No HW configuration is needed if the sink is already in + * use for this session. + */ + if (drvdata->pid == pid) { + drvdata->mode = CS_MODE_PERF; + atomic_inc(csdev->refcnt); + goto unlock_out; + } + rc = tmc_etr_enable_hw(drvdata, etr_perf->etr_buf); if (!rc) { + /* Associate with monitored process. */ + drvdata->pid = pid; drvdata->mode = CS_MODE_PERF; atomic_inc(csdev->refcnt); } @@ -1510,6 +1537,8 @@ static int tmc_disable_etr_sink(struct coresight_device *csdev) /* Complain if we (somehow) got out of sync */ WARN_ON_ONCE(drvdata->mode == CS_MODE_DISABLED); tmc_etr_disable_hw(drvdata); + /* Dissociate from monitored process. */ + drvdata->pid = -1; drvdata->mode = CS_MODE_DISABLED;
spin_unlock_irqrestore(&drvdata->spinlock, flags); diff --git a/drivers/hwtracing/coresight/coresight-tmc.c b/drivers/hwtracing/coresight/coresight-tmc.c index 2a02da3d630f..b01e587f376d 100644 --- a/drivers/hwtracing/coresight/coresight-tmc.c +++ b/drivers/hwtracing/coresight/coresight-tmc.c @@ -415,6 +415,8 @@ static int tmc_probe(struct amba_device *adev, const struct amba_id *id) devid = readl_relaxed(drvdata->base + CORESIGHT_DEVID); drvdata->config_type = BMVAL(devid, 6, 7); drvdata->memwidth = tmc_get_memwidth(devid); + /* This device is not associated with a session */ + drvdata->pid = -1;
if (drvdata->config_type == TMC_CONFIG_TYPE_ETR) { if (np) diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h index d3657acb6cbf..1726d77e6d7f 100644 --- a/drivers/hwtracing/coresight/coresight-tmc.h +++ b/drivers/hwtracing/coresight/coresight-tmc.h @@ -166,6 +166,8 @@ struct etr_buf { * @csdev: component vitals needed by the framework. * @miscdev: specifics to handle "/dev/xyz.tmc" entry. * @spinlock: only one at a time pls. + * @pid: Process ID of the process being monitored by the session + * that is using this component. * @buf: Snapshot of the trace data for ETF/ETB. * @etr_buf: details of buffer used in TMC-ETR * @len: size of the available trace for ETF/ETB. @@ -185,6 +187,7 @@ struct tmc_drvdata { struct coresight_device *csdev; struct miscdevice miscdev; spinlock_t spinlock; + pid_t pid; bool reading; union { char *buf; /* TMC ETB */
Hi Mathieu,
On 06/03/2019 22:57, Mathieu Poirier wrote:
This patch adds support for CPU-wide trace scenarios by making sure that only the sources monitoring the same process have access to a common sink. Because the sink is shared between sources, the first source to use the sink switches it on while the last one does the cleanup. Any attempt to modify the HW is overlooked for as long as more than one source is using a sink.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
.../hwtracing/coresight/coresight-tmc-etr.c | 39 ++++++++++++++++--- drivers/hwtracing/coresight/coresight-tmc.c | 2 + drivers/hwtracing/coresight/coresight-tmc.h | 3 ++ 3 files changed, 39 insertions(+), 5 deletions(-)
@@ -1463,10 +1468,32 @@ static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, void *data) goto unlock_out; }
- /* Get a handle on the pid of the process to monitor */
- pid = etr_perf->etr_buf->pid;
- /* Do not proceed if this device is associated with another session */
- if (drvdata->pid != -1 && drvdata->pid != pid) {
rc = -EBUSY;
goto unlock_out;
- }
- etr_perf->head = PERF_IDX2OFF(handle->head, etr_perf); drvdata->perf_data = etr_perf;
- /*
* No HW configuration is needed if the sink is already in
* use for this session.
*/
- if (drvdata->pid == pid) {
drvdata->mode = CS_MODE_PERF;
minor nit: This is not needed ^^. Otherwise, looks good to me.
Suzuki
On Wed, 20 Mar 2019 at 08:16, Suzuki K Poulose suzuki.poulose@arm.com wrote:
Hi Mathieu,
On 06/03/2019 22:57, Mathieu Poirier wrote:
This patch adds support for CPU-wide trace scenarios by making sure that only the sources monitoring the same process have access to a common sink. Because the sink is shared between sources, the first source to use the sink switches it on while the last one does the cleanup. Any attempt to modify the HW is overlooked for as long as more than one source is using a sink.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
.../hwtracing/coresight/coresight-tmc-etr.c | 39 ++++++++++++++++--- drivers/hwtracing/coresight/coresight-tmc.c | 2 + drivers/hwtracing/coresight/coresight-tmc.h | 3 ++ 3 files changed, 39 insertions(+), 5 deletions(-)
@@ -1463,10 +1468,32 @@ static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, void *data) goto unlock_out; }
/* Get a handle on the pid of the process to monitor */
pid = etr_perf->etr_buf->pid;
/* Do not proceed if this device is associated with another session */
if (drvdata->pid != -1 && drvdata->pid != pid) {
rc = -EBUSY;
goto unlock_out;
}
etr_perf->head = PERF_IDX2OFF(handle->head, etr_perf); drvdata->perf_data = etr_perf;
/*
* No HW configuration is needed if the sink is already in
* use for this session.
*/
if (drvdata->pid == pid) {
drvdata->mode = CS_MODE_PERF;
minor nit: This is not needed ^^. Otherwise, looks good to me.
An astute observation - I will also fix it for ETF and ETB.
Mathieu
Suzuki
This patch adds support for CPU-wide trace scenarios by making sure that only the sources monitoring the same process have access to a common sink. Because the sink is shared between sources, the first source to use the sink switches it on while the last one does the cleanup. Any attempt to modify the HW is overlooked for as long as more than one source is using a sink.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- .../hwtracing/coresight/coresight-tmc-etf.c | 41 ++++++++++++++++--- 1 file changed, 36 insertions(+), 5 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c index 3bb407132a1a..33055e2c3017 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etf.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c @@ -224,6 +224,7 @@ static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev) static int tmc_enable_etf_sink_perf(struct coresight_device *csdev, void *data) { int ret = 0; + pid_t pid; unsigned long flags; struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); struct perf_output_handle *handle = data; @@ -234,18 +235,40 @@ static int tmc_enable_etf_sink_perf(struct coresight_device *csdev, void *data) if (drvdata->reading) break; /* - * In Perf mode there can be only one writer per sink. There - * is also no need to continue if the ETB/ETF is already - * operated from sysFS. + * No need to continue if the ETB/ETF is already operated + * from sysFS. */ - if (drvdata->mode != CS_MODE_DISABLED) + if (drvdata->mode == CS_MODE_SYSFS) { + ret = -EBUSY; break; + } + + /* Get a handle on the pid of the process to monitor */ + pid = task_pid_nr(handle->event->owner); + + if (drvdata->pid != -1 && drvdata->pid != pid) { + ret = -EBUSY; + break; + }
ret = tmc_set_etf_buffer(csdev, handle); if (ret) break; + + /* + * No HW configuration is needed if the sink is already in + * use for this session. + */ + if (drvdata->pid == pid) { + drvdata->mode = CS_MODE_PERF; + atomic_inc(csdev->refcnt); + break; + } + ret = tmc_etb_enable_hw(drvdata); if (!ret) { + /* Associate with monitored process. */ + drvdata->pid = pid; drvdata->mode = CS_MODE_PERF; atomic_inc(csdev->refcnt); } @@ -301,6 +324,8 @@ static int tmc_disable_etf_sink(struct coresight_device *csdev) /* Complain if we (somehow) got out of sync */ WARN_ON_ONCE(drvdata->mode == CS_MODE_DISABLED); tmc_etb_disable_hw(drvdata); + /* Dissociate from monitored process. */ + drvdata->pid = -1; drvdata->mode = CS_MODE_DISABLED;
spin_unlock_irqrestore(&drvdata->spinlock, flags); @@ -415,7 +440,7 @@ static unsigned long tmc_update_etf_buffer(struct coresight_device *csdev, u32 *buf_ptr; u64 read_ptr, write_ptr; u32 status; - unsigned long offset, to_read, flags; + unsigned long offset, to_read = 0, flags; struct cs_buffers *buf = sink_config; struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
@@ -427,6 +452,11 @@ static unsigned long tmc_update_etf_buffer(struct coresight_device *csdev, return 0;
spin_lock_irqsave(&drvdata->spinlock, flags); + + /* Don't do anything if another tracer is using this sink */ + if (atomic_read(csdev->refcnt) != 1) + goto out; + CS_UNLOCK(drvdata->base);
tmc_flush_and_stop(drvdata); @@ -520,6 +550,7 @@ static unsigned long tmc_update_etf_buffer(struct coresight_device *csdev, to_read = buf->nr_pages << PAGE_SHIFT; } CS_LOCK(drvdata->base); +out: spin_unlock_irqrestore(&drvdata->spinlock, flags);
return to_read;
This patch adds support for CPU-wide trace scenarios by making sure that only the sources monitoring the same process have access to a common sink. Because the sink is shared between sources, the first source to use the sink switches it on while the last one does the cleanup. Any attempt to modify the HW is overlooked for as long as more than one source is using a sink.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- drivers/hwtracing/coresight/coresight-etb10.c | 44 +++++++++++++++++-- 1 file changed, 40 insertions(+), 4 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c index 493d4f4143a1..cb16191ba3fd 100644 --- a/drivers/hwtracing/coresight/coresight-etb10.c +++ b/drivers/hwtracing/coresight/coresight-etb10.c @@ -72,6 +72,8 @@ * @miscdev: specifics to handle "/dev/xyz.etb" entry. * @spinlock: only one at a time pls. * @reading: synchronise user space access to etb buffer. + * @pid: Process ID of the process being monitored by the session + * that is using this component. * @buf: area of memory where ETB buffer content gets sent. * @mode: this ETB is being used. * @buffer_depth: size of @buf. @@ -85,6 +87,7 @@ struct etb_drvdata { struct miscdevice miscdev; spinlock_t spinlock; local_t reading; + pid_t pid; u8 *buf; u32 mode; u32 buffer_depth; @@ -180,28 +183,50 @@ static int etb_enable_sysfs(struct coresight_device *csdev) static int etb_enable_perf(struct coresight_device *csdev, void *data) { int ret = 0; + pid_t pid; unsigned long flags; struct etb_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); + struct perf_output_handle *handle = data;
spin_lock_irqsave(&drvdata->spinlock, flags);
- /* No need to continue if the component is already in use. */ - if (drvdata->mode != CS_MODE_DISABLED) { + /* No need to continue if the component is already in used by sysFS. */ + if (drvdata->mode == CS_MODE_SYSFS) { ret = -EBUSY; goto out; }
+ /* Get a handle on the pid of the process to monitor */ + pid = task_pid_nr(handle->event->owner); + + if (drvdata->pid != -1 && drvdata->pid != pid) { + ret = -EBUSY; + goto out; + } + + /* + * No HW configuration is needed if the sink is already in + * use for this session. + */ + if (drvdata->pid == pid) { + drvdata->mode = CS_MODE_PERF; + atomic_inc(csdev->refcnt); + goto out; + } + /* * We don't have an internal state to clean up if we fail to setup * the perf buffer. So we can perform the step before we turn the * ETB on and leave without cleaning up. */ - ret = etb_set_buffer(csdev, (struct perf_output_handle *)data); + ret = etb_set_buffer(csdev, handle); if (ret) goto out;
ret = etb_enable_hw(drvdata); if (!ret) { + /* Associate with monitored process. */ + drvdata->pid = pid; drvdata->mode = CS_MODE_PERF; atomic_inc(csdev->refcnt); } @@ -347,6 +372,8 @@ static int etb_disable(struct coresight_device *csdev) /* Complain if we (somehow) got out of sync */ WARN_ON_ONCE(drvdata->mode == CS_MODE_DISABLED); etb_disable_hw(drvdata); + /* Dissociate from monitored process. */ + drvdata->pid = -1; drvdata->mode = CS_MODE_DISABLED; spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -417,7 +444,7 @@ static unsigned long etb_update_buffer(struct coresight_device *csdev, const u32 *barrier; u32 read_ptr, write_ptr, capacity; u32 status, read_data; - unsigned long offset, to_read, flags; + unsigned long offset, to_read = 0, flags; struct cs_buffers *buf = sink_config; struct etb_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
@@ -427,6 +454,11 @@ static unsigned long etb_update_buffer(struct coresight_device *csdev, capacity = drvdata->buffer_depth * ETB_FRAME_SIZE_WORDS;
spin_lock_irqsave(&drvdata->spinlock, flags); + + /* Don't do anything if another tracer is using this sink */ + if (atomic_read(csdev->refcnt) != 1) + goto out; + __etb_disable_hw(drvdata); CS_UNLOCK(drvdata->base);
@@ -537,6 +569,7 @@ static unsigned long etb_update_buffer(struct coresight_device *csdev, } __etb_enable_hw(drvdata); CS_LOCK(drvdata->base); +out: spin_unlock_irqrestore(&drvdata->spinlock, flags);
return to_read; @@ -745,6 +778,9 @@ static int etb_probe(struct amba_device *adev, const struct amba_id *id) if (!drvdata->buf) return -ENOMEM;
+ /* This device is not associated with a session */ + drvdata->pid = -1; + desc.type = CORESIGHT_DEV_TYPE_SINK; desc.subtype.sink_subtype = CORESIGHT_DEV_SUBTYPE_SINK_BUFFER; desc.ops = &etb_cs_ops;
The ETR sink needs to know if it is shared between sources in order to adequately allocate memory for CPU-wide trace sessions. If operating in an N:1 source/sink topology and the sink is shared, double buffering mode needs to be used in order to manage concurrency.
Since function etm_setup_aux() already has a handle on the path for the session, simply make it available to the ETR so that it can decide how memory is allocated.
This patch is introducing no change in behavior.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- drivers/hwtracing/coresight/coresight-etb10.c | 2 +- drivers/hwtracing/coresight/coresight-etm-perf.c | 9 ++++++--- drivers/hwtracing/coresight/coresight-tmc-etf.c | 2 +- drivers/hwtracing/coresight/coresight-tmc-etr.c | 11 +++++++++-- include/linux/coresight.h | 2 +- 5 files changed, 18 insertions(+), 8 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c index cb16191ba3fd..660a146f2ea2 100644 --- a/drivers/hwtracing/coresight/coresight-etb10.c +++ b/drivers/hwtracing/coresight/coresight-etb10.c @@ -381,7 +381,7 @@ static int etb_disable(struct coresight_device *csdev) return 0; }
-static void *etb_alloc_buffer(struct coresight_device *csdev, int cpu, +static void *etb_alloc_buffer(struct list_head *path, int cpu, pid_t pid, void **pages, int nr_pages, bool overwrite) { diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index 76d3e8f1041c..8013b09597b9 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -214,6 +214,7 @@ static void *etm_setup_aux(struct perf_event *event, void **pages, pid_t pid = task_pid_nr(event->owner); struct coresight_device *sink; struct etm_event_data *event_data = NULL; + struct list_head *path = NULL;
event_data = alloc_event_data(cpu); if (!event_data) @@ -241,7 +242,6 @@ static void *etm_setup_aux(struct perf_event *event, void **pages, * CPUs, we can handle it and fail the session. */ for_each_cpu(cpu, mask) { - struct list_head *path; struct coresight_device *csdev;
csdev = per_cpu(csdev_src, cpu); @@ -277,9 +277,12 @@ static void *etm_setup_aux(struct perf_event *event, void **pages, if (!sink_ops(sink)->alloc_buffer || !sink_ops(sink)->free_buffer) goto err;
- /* Allocate the sink buffer for this session */ + /* + * Allocate the sink buffer for this session. In per-thread mode + * any path from source to sink will do. + */ event_data->snk_config = - sink_ops(sink)->alloc_buffer(sink, cpu, pid, pages, + sink_ops(sink)->alloc_buffer(path, cpu, pid, pages, nr_pages, overwrite); if (!event_data->snk_config) goto err; diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c index 33055e2c3017..c88073044da2 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etf.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c @@ -376,7 +376,7 @@ static void tmc_disable_etf_link(struct coresight_device *csdev, dev_dbg(drvdata->dev, "TMC-ETF disabled\n"); }
-static void *tmc_alloc_etf_buffer(struct coresight_device *csdev, +static void *tmc_alloc_etf_buffer(struct list_head *path, int cpu, pid_t pid, void **pages, int nr_pages, bool overwrite) { diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index dc74c95413dc..d8c71d22bdef 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -1277,12 +1277,19 @@ tmc_etr_setup_perf_buf(struct tmc_drvdata *drvdata, int node, pid_t pid, }
-static void *tmc_alloc_etr_buffer(struct coresight_device *csdev, int cpu, +static void *tmc_alloc_etr_buffer(struct list_head *path, int cpu, pid_t pid, void **pages, int nr_pages, bool snapshot) { struct etr_perf_buffer *etr_perf; - struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); + struct coresight_device *csdev; + struct tmc_drvdata *drvdata; + + csdev = coresight_get_sink(path); + if (!csdev) + return NULL; + + drvdata = dev_get_drvdata(csdev->dev.parent);
if (cpu == -1) cpu = smp_processor_id(); diff --git a/include/linux/coresight.h b/include/linux/coresight.h index 21c53cc21f4b..26e6cd042339 100644 --- a/include/linux/coresight.h +++ b/include/linux/coresight.h @@ -194,7 +194,7 @@ struct coresight_device { struct coresight_ops_sink { int (*enable)(struct coresight_device *csdev, u32 mode, void *data); int (*disable)(struct coresight_device *csdev); - void *(*alloc_buffer)(struct coresight_device *csdev, int cpu, + void *(*alloc_buffer)(struct list_head *path, int cpu, pid_t pid, void **pages, int nr_pages, bool overwrite); void (*free_buffer)(void *config);
When working with CPU-wide scenarios and the HW enacts an N:1 source/sink topology the sink is shared between source and needs to use double buffering. As such introduce a new function to easily determine if a sink is shared by multiple source.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- drivers/hwtracing/coresight/coresight-priv.h | 1 + drivers/hwtracing/coresight/coresight.c | 25 ++++++++++++++++++++ 2 files changed, 26 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-priv.h b/drivers/hwtracing/coresight/coresight-priv.h index e0684d06e9ee..5d7213561e94 100644 --- a/drivers/hwtracing/coresight/coresight-priv.h +++ b/drivers/hwtracing/coresight/coresight-priv.h @@ -146,6 +146,7 @@ static inline void coresight_write_reg_pair(void __iomem *addr, u64 val,
void coresight_disable_path(struct list_head *path); int coresight_enable_path(struct list_head *path, u32 mode, void *sink_data); +int coresight_sink_is_shared(struct list_head *path); struct coresight_device *coresight_get_sink(struct list_head *path); struct coresight_device *coresight_get_enabled_sink(bool reset); struct coresight_device *coresight_get_sink_by_id(u32 id); diff --git a/drivers/hwtracing/coresight/coresight.c b/drivers/hwtracing/coresight/coresight.c index 19ba121d7451..d74b2f621885 100644 --- a/drivers/hwtracing/coresight/coresight.c +++ b/drivers/hwtracing/coresight/coresight.c @@ -483,6 +483,31 @@ int coresight_enable_path(struct list_head *path, u32 mode, void *sink_data) goto out; }
+int coresight_sink_is_shared(struct list_head *path) +{ + enum coresight_dev_type type; + enum coresight_dev_subtype_link subtype; + struct coresight_node *nd; + struct coresight_device *csdev; + + if (!path) + return -EINVAL; + + /* Go through each of the entries in the path */ + list_for_each_entry(nd, path, link) { + csdev = nd->csdev; + type = csdev->type; + subtype = csdev->subtype.link_subtype; + + /* If we find a funnel the sink is shared */ + if (type == CORESIGHT_DEV_TYPE_LINK && + subtype == CORESIGHT_DEV_SUBTYPE_LINK_MERG) + return 1; + } + + return 0; +} + struct coresight_device *coresight_get_sink(struct list_head *path) { struct coresight_device *csdev;
Hi Mathieu,
On 06/03/2019 22:57, Mathieu Poirier wrote:
When working with CPU-wide scenarios and the HW enacts an N:1 source/sink topology the sink is shared between source and needs to use double buffering. As such introduce a new function to easily determine if a sink is shared by multiple source.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
drivers/hwtracing/coresight/coresight-priv.h | 1 + drivers/hwtracing/coresight/coresight.c | 25 ++++++++++++++++++++ 2 files changed, 26 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-priv.h b/drivers/hwtracing/coresight/coresight-priv.h index e0684d06e9ee..5d7213561e94 100644 --- a/drivers/hwtracing/coresight/coresight-priv.h +++ b/drivers/hwtracing/coresight/coresight-priv.h @@ -146,6 +146,7 @@ static inline void coresight_write_reg_pair(void __iomem *addr, u64 val, void coresight_disable_path(struct list_head *path); int coresight_enable_path(struct list_head *path, u32 mode, void *sink_data); +int coresight_sink_is_shared(struct list_head *path); struct coresight_device *coresight_get_sink(struct list_head *path); struct coresight_device *coresight_get_enabled_sink(bool reset); struct coresight_device *coresight_get_sink_by_id(u32 id); diff --git a/drivers/hwtracing/coresight/coresight.c b/drivers/hwtracing/coresight/coresight.c index 19ba121d7451..d74b2f621885 100644 --- a/drivers/hwtracing/coresight/coresight.c +++ b/drivers/hwtracing/coresight/coresight.c @@ -483,6 +483,31 @@ int coresight_enable_path(struct list_head *path, u32 mode, void *sink_data) goto out; } +int coresight_sink_is_shared(struct list_head *path) +{
- enum coresight_dev_type type;
- enum coresight_dev_subtype_link subtype;
- struct coresight_node *nd;
- struct coresight_device *csdev;
- if (!path)
return -EINVAL;
- /* Go through each of the entries in the path */
- list_for_each_entry(nd, path, link) {
csdev = nd->csdev;
type = csdev->type;
subtype = csdev->subtype.link_subtype;
/* If we find a funnel the sink is shared */
if (type == CORESIGHT_DEV_TYPE_LINK &&
subtype == CORESIGHT_DEV_SUBTYPE_LINK_MERG)
return 1;
One caveat with this logic is that, we have systems where a funnel appears in the path, whose input ports are not connected, implying they may not be shared after all (e.g, imx7s in arm32). It may be a good idea to add (csdev->nr_inport > 1) check to the list, if we figure out that it is a funnel.
Otherwise, looks good to me.
Suzuki
On Mon, 25 Mar 2019 at 05:44, Suzuki K Poulose suzuki.poulose@arm.com wrote:
Hi Mathieu,
On 06/03/2019 22:57, Mathieu Poirier wrote:
When working with CPU-wide scenarios and the HW enacts an N:1 source/sink topology the sink is shared between source and needs to use double buffering. As such introduce a new function to easily determine if a sink is shared by multiple source.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
drivers/hwtracing/coresight/coresight-priv.h | 1 + drivers/hwtracing/coresight/coresight.c | 25 ++++++++++++++++++++ 2 files changed, 26 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-priv.h b/drivers/hwtracing/coresight/coresight-priv.h index e0684d06e9ee..5d7213561e94 100644 --- a/drivers/hwtracing/coresight/coresight-priv.h +++ b/drivers/hwtracing/coresight/coresight-priv.h @@ -146,6 +146,7 @@ static inline void coresight_write_reg_pair(void __iomem *addr, u64 val,
void coresight_disable_path(struct list_head *path); int coresight_enable_path(struct list_head *path, u32 mode, void *sink_data); +int coresight_sink_is_shared(struct list_head *path); struct coresight_device *coresight_get_sink(struct list_head *path); struct coresight_device *coresight_get_enabled_sink(bool reset); struct coresight_device *coresight_get_sink_by_id(u32 id); diff --git a/drivers/hwtracing/coresight/coresight.c b/drivers/hwtracing/coresight/coresight.c index 19ba121d7451..d74b2f621885 100644 --- a/drivers/hwtracing/coresight/coresight.c +++ b/drivers/hwtracing/coresight/coresight.c @@ -483,6 +483,31 @@ int coresight_enable_path(struct list_head *path, u32 mode, void *sink_data) goto out; }
+int coresight_sink_is_shared(struct list_head *path) +{
enum coresight_dev_type type;
enum coresight_dev_subtype_link subtype;
struct coresight_node *nd;
struct coresight_device *csdev;
if (!path)
return -EINVAL;
/* Go through each of the entries in the path */
list_for_each_entry(nd, path, link) {
csdev = nd->csdev;
type = csdev->type;
subtype = csdev->subtype.link_subtype;
/* If we find a funnel the sink is shared */
if (type == CORESIGHT_DEV_TYPE_LINK &&
subtype == CORESIGHT_DEV_SUBTYPE_LINK_MERG)
return 1;
One caveat with this logic is that, we have systems where a funnel appears in the path, whose input ports are not connected, implying they may not be shared after all (e.g, imx7s in arm32). It may be a good idea to add (csdev->nr_inport > 1) check to the list, if we figure out that it is a funnel.
You are 100% right.
Otherwise, looks good to me.
Suzuki
This patch makes the ETR component aware of the topology the HW is enacting so that it can make choices about the memory mode to use when working in CPU-wide scenarios.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- .../hwtracing/coresight/coresight-tmc-etr.c | 19 +++++++++++++++++++ drivers/hwtracing/coresight/coresight-tmc.c | 2 ++ drivers/hwtracing/coresight/coresight-tmc.h | 2 ++ 3 files changed, 23 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index d8c71d22bdef..eace816ff042 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -1276,6 +1276,22 @@ tmc_etr_setup_perf_buf(struct tmc_drvdata *drvdata, int node, pid_t pid, return etr_perf; }
+static int tmc_etr_is_shared(struct tmc_drvdata *drvdata, + struct list_head *path) +{ + int shared; + + /* The HW topology is static, no need to check this twice */ + if (drvdata->shared >= 0) + return 0; + + shared = coresight_sink_is_shared(path); + if (shared == -EINVAL) + return shared; + + drvdata->shared = shared; + return 0; +}
static void *tmc_alloc_etr_buffer(struct list_head *path, int cpu, pid_t pid, void **pages, int nr_pages, @@ -1291,6 +1307,9 @@ static void *tmc_alloc_etr_buffer(struct list_head *path, int cpu,
drvdata = dev_get_drvdata(csdev->dev.parent);
+ if (tmc_etr_is_shared(drvdata, path)) + return NULL; + if (cpu == -1) cpu = smp_processor_id();
diff --git a/drivers/hwtracing/coresight/coresight-tmc.c b/drivers/hwtracing/coresight/coresight-tmc.c index b01e587f376d..13603249506c 100644 --- a/drivers/hwtracing/coresight/coresight-tmc.c +++ b/drivers/hwtracing/coresight/coresight-tmc.c @@ -417,6 +417,8 @@ static int tmc_probe(struct amba_device *adev, const struct amba_id *id) drvdata->memwidth = tmc_get_memwidth(devid); /* This device is not associated with a session */ drvdata->pid = -1; + /* No way to know what kind of topology we have */ + drvdata->shared = -1;
if (drvdata->config_type == TMC_CONFIG_TYPE_ETR) { if (np) diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h index 1726d77e6d7f..5373e5c7556e 100644 --- a/drivers/hwtracing/coresight/coresight-tmc.h +++ b/drivers/hwtracing/coresight/coresight-tmc.h @@ -178,6 +178,7 @@ struct etr_buf { * @trigger_cntr: amount of words to store after a trigger. * @etr_caps: Bitmask of capabilities of the TMC ETR, inferred from the * device configuration register (DEVID) + * @shared: This TMC is in an N:1 source/sink topology. * @perf_data: PERF buffer for ETR. * @sysfs_data: SYSFS buffer for ETR. */ @@ -200,6 +201,7 @@ struct tmc_drvdata { enum tmc_mem_intf_width memwidth; u32 trigger_cntr; u32 etr_caps; + int shared; struct etr_buf *sysfs_buf; void *perf_data; };
Mathieu,
On 06/03/2019 22:57, Mathieu Poirier wrote:
This patch makes the ETR component aware of the topology the HW is enacting so that it can make choices about the memory mode to use when working in CPU-wide scenarios.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
.../hwtracing/coresight/coresight-tmc-etr.c | 19 +++++++++++++++++++ drivers/hwtracing/coresight/coresight-tmc.c | 2 ++ drivers/hwtracing/coresight/coresight-tmc.h | 2 ++ 3 files changed, 23 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index d8c71d22bdef..eace816ff042 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -1276,6 +1276,22 @@ tmc_etr_setup_perf_buf(struct tmc_drvdata *drvdata, int node, pid_t pid, return etr_perf; } +static int tmc_etr_is_shared(struct tmc_drvdata *drvdata,
struct list_head *path)
+{
- int shared;
- /* The HW topology is static, no need to check this twice */
- if (drvdata->shared >= 0)
return 0;
- shared = coresight_sink_is_shared(path);
- if (shared == -EINVAL)
return shared;
- drvdata->shared = shared;
- return 0;
+} static void *tmc_alloc_etr_buffer(struct list_head *path, int cpu, pid_t pid, void **pages, int nr_pages, @@ -1291,6 +1307,9 @@ static void *tmc_alloc_etr_buffer(struct list_head *path, int cpu, drvdata = dev_get_drvdata(csdev->dev.parent);
- if (tmc_etr_is_shared(drvdata, path))
return NULL;
So, we don't support tracing if the ETR is shared ? I am failing to understand the rationale of this whole "shared" check. Please could you elaborate ? And may be even add a comment here ?
Cheers Suzuki
On Mon, 25 Mar 2019 at 09:01, Suzuki K Poulose suzuki.poulose@arm.com wrote:
Mathieu,
On 06/03/2019 22:57, Mathieu Poirier wrote:
This patch makes the ETR component aware of the topology the HW is enacting so that it can make choices about the memory mode to use when working in CPU-wide scenarios.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org
.../hwtracing/coresight/coresight-tmc-etr.c | 19 +++++++++++++++++++ drivers/hwtracing/coresight/coresight-tmc.c | 2 ++ drivers/hwtracing/coresight/coresight-tmc.h | 2 ++ 3 files changed, 23 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index d8c71d22bdef..eace816ff042 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -1276,6 +1276,22 @@ tmc_etr_setup_perf_buf(struct tmc_drvdata *drvdata, int node, pid_t pid, return etr_perf; }
+static int tmc_etr_is_shared(struct tmc_drvdata *drvdata,
struct list_head *path)
+{
int shared;
/* The HW topology is static, no need to check this twice */
if (drvdata->shared >= 0)
return 0;
shared = coresight_sink_is_shared(path);
if (shared == -EINVAL)
return shared;
drvdata->shared = shared;
return 0;
+}
static void *tmc_alloc_etr_buffer(struct list_head *path, int cpu, pid_t pid, void **pages, int nr_pages, @@ -1291,6 +1307,9 @@ static void *tmc_alloc_etr_buffer(struct list_head *path, int cpu,
drvdata = dev_get_drvdata(csdev->dev.parent);
if (tmc_etr_is_shared(drvdata, path))
return NULL;
So, we don't support tracing if the ETR is shared ? I am failing to understand the rationale of this whole "shared" check. Please could you elaborate ? And may be even add a comment here ?
That is very poor function naming on my side... I had to go back and look at my own code do understand what it does. Here tmc_etr_is_shared() reports the status of the process that determines if a sink is shared rather than the shared status of the sink. I would definitely fix that if we were to move ahead with this code but I thought we both agreed forcing double buffering is something that would be left for another day. As such 17 to 20 can be ignored.
Cheers Suzuki
In conjunction with topology information, the ETR device needs to know what kind of session it is part of in order to properly allocate memory. More specifically if it is part of an N:1 source/sink topology and used for a CPU-wide scenairo, a flat memory model is mandatory to avoid concurrency issues.
Since the only way to differentiate a per-thread from a CPU-wide session is by looking up the "event->cpu", push the information down to where memory is allocated so that proper decision with regards to allocation can be made.
Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org --- .../hwtracing/coresight/coresight-etm-perf.c | 3 +- .../hwtracing/coresight/coresight-tmc-etr.c | 48 ++++++++++++------- 2 files changed, 33 insertions(+), 18 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index 8013b09597b9..896cfa93d5fb 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -282,7 +282,8 @@ static void *etm_setup_aux(struct perf_event *event, void **pages, * any path from source to sink will do. */ event_data->snk_config = - sink_ops(sink)->alloc_buffer(path, cpu, pid, pages, + sink_ops(sink)->alloc_buffer(path, event->cpu, + pid, pages, nr_pages, overwrite); if (!event_data->snk_config) goto err; diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index eace816ff042..b99dc06f374d 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -813,16 +813,22 @@ static inline int tmc_etr_mode_alloc_buf(int mode, */ static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata, ssize_t size, int flags, - int node, void **pages) + int cpu, void **pages) { - int rc = -ENOMEM; + int rc = -ENOMEM, node; bool has_etr_sg, has_iommu; bool has_sg, has_catu; + bool cpu_wide, is_shared; struct etr_buf *etr_buf;
+ node = (cpu != -1) ? cpu_to_node(cpu) : + cpu_to_node(smp_processor_id()); + has_etr_sg = tmc_etr_has_cap(drvdata, TMC_ETR_SG); has_iommu = iommu_get_domain_for_dev(drvdata->dev); has_catu = !!tmc_etr_get_catu_device(drvdata); + cpu_wide = (cpu != -1); + is_shared = drvdata->shared;
has_sg = has_catu || has_etr_sg;
@@ -833,10 +839,14 @@ static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata, etr_buf->size = size;
/* - * If we have to use an existing list of pages, we cannot reliably - * use a contiguous DMA memory (even if we have an IOMMU). Otherwise, - * we use the contiguous DMA memory if at least one of the following - * conditions is true: + * If the ETR is shared and we have a CPU-wide trace scenario, we + * need to use double buffering (i.e flat mode) in order to control + * concurrency. + * + * Furthermore, if we have to use an existing list of pages, we cannot + * reliably use a contiguous DMA memory (even if we have an IOMMU). + * Otherwise, we use the contiguous DMA memory if at least one of the + * following conditions is true: * a) The ETR cannot use Scatter-Gather. * b) we have a backing IOMMU * c) The requested memory size is smaller (< 1M). @@ -844,6 +854,9 @@ static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata, * Fallback to available mechanisms. * */ + if (cpu_wide && is_shared) + rc = tmc_etr_mode_alloc_buf(ETR_MODE_FLAT, drvdata, + etr_buf, node, pages); if (!pages && (!has_sg || has_iommu || size < SZ_1M)) rc = tmc_etr_mode_alloc_buf(ETR_MODE_FLAT, drvdata, @@ -1164,7 +1177,7 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev) }
static struct etr_buf * -alloc_etr_buf(struct tmc_drvdata *drvdata, int node, +alloc_etr_buf(struct tmc_drvdata *drvdata, int cpu, int nr_pages, void **pages) { struct etr_buf *etr_buf; @@ -1176,7 +1189,7 @@ alloc_etr_buf(struct tmc_drvdata *drvdata, int node, */ if ((nr_pages << PAGE_SHIFT) > drvdata->size) { etr_buf = tmc_alloc_etr_buf(drvdata, (nr_pages << PAGE_SHIFT), - 0, node, NULL); + 0, cpu, NULL); if (!IS_ERR(etr_buf)) goto done; } @@ -1187,7 +1200,7 @@ alloc_etr_buf(struct tmc_drvdata *drvdata, int node, */ size = drvdata->size; do { - etr_buf = tmc_alloc_etr_buf(drvdata, size, 0, node, NULL); + etr_buf = tmc_alloc_etr_buf(drvdata, size, 0, cpu, NULL); if (!IS_ERR(etr_buf)) goto done; size /= 2; @@ -1200,7 +1213,7 @@ alloc_etr_buf(struct tmc_drvdata *drvdata, int node, }
static struct etr_buf * -tmc_etr_get_etr_buf(struct tmc_drvdata *drvdata, int node, pid_t pid, +tmc_etr_get_etr_buf(struct tmc_drvdata *drvdata, int cpu, pid_t pid, int nr_pages, void **pages) { int ret; @@ -1219,7 +1232,7 @@ tmc_etr_get_etr_buf(struct tmc_drvdata *drvdata, int node, pid_t pid, /* If we made it here no buffer has been allocated, do so now. */ mutex_unlock(&session_idr_lock);
- etr_buf = alloc_etr_buf(drvdata, node, nr_pages, pages); + etr_buf = alloc_etr_buf(drvdata, cpu, nr_pages, pages); if (IS_ERR(etr_buf)) return etr_buf;
@@ -1254,17 +1267,21 @@ tmc_etr_get_etr_buf(struct tmc_drvdata *drvdata, int node, pid_t pid, * reaches a minimum limit (1M), beyond which we give up. */ static struct etr_perf_buffer * -tmc_etr_setup_perf_buf(struct tmc_drvdata *drvdata, int node, pid_t pid, +tmc_etr_setup_perf_buf(struct tmc_drvdata *drvdata, int cpu, pid_t pid, int nr_pages, void **pages, bool snapshot) { + int node; struct etr_buf *etr_buf; struct etr_perf_buffer *etr_perf;
+ node = (cpu != -1) ? cpu_to_node(cpu) : + cpu_to_node(smp_processor_id()); + etr_perf = kzalloc_node(sizeof(*etr_perf), GFP_KERNEL, node); if (!etr_perf) return ERR_PTR(-ENOMEM);
- etr_buf = tmc_etr_get_etr_buf(drvdata, node, pid, nr_pages, pages); + etr_buf = tmc_etr_get_etr_buf(drvdata, cpu, pid, nr_pages, pages); if (!IS_ERR(etr_buf)) goto done;
@@ -1310,10 +1327,7 @@ static void *tmc_alloc_etr_buffer(struct list_head *path, int cpu, if (tmc_etr_is_shared(drvdata, path)) return NULL;
- if (cpu == -1) - cpu = smp_processor_id(); - - etr_perf = tmc_etr_setup_perf_buf(drvdata, cpu_to_node(cpu), pid, + etr_perf = tmc_etr_setup_perf_buf(drvdata, cpu, pid, nr_pages, pages, snapshot); if (IS_ERR(etr_perf)) { dev_dbg(drvdata->dev, "Unable to allocate ETR buffer\n");
On Wed, 6 Mar 2019 at 15:57, Mathieu Poirier mathieu.poirier@linaro.org wrote:
This patchset adds support for CPU-wide trace scenarios and as such, it is now possible to issue the following commands:
# perf record -e cs_etm/@20070000.etr/ -C 2,3 $COMMAND # perf record -e cs_etm/@20070000.etr/ -a $COMMAND
The above will trace all instructions executed by a given processor for as long as $COMMAND hasn't returned. The solution is designed to work for both 1:1 and N:1 source/sink topologies, though the former hasn't been tested for lack of access to HW.
Most of the changes revolve around allowing more than one event to use a sink when operated from perf. More specifically the first event to use a sink switches it on while the last one is tasked to aggregate traces and switching off the device.
This is the kernel part of the solution, with the user space portion to be released in a separate set. All the patches have been rebased on yesterday's linux next and hosted here[1]. Everything has been tested on Juno. I have not CC'ed the kernel mailing list because of the ongoing merge window.
I forgot to mention - in order for the feature to work kernel config CONFIG_PID_IN_CONTEXTIDR needs to be set. The config is accessible under "Kernel hacking" > "Write the current PID to the CONTEXTIDR register". Without it the tracers will not generate PE_CONTEXT packets, something that links the executing process with the perf thread mechanic and access to the memory map.
For the next revision I will fold the automatic selection of the define in the Kconfig.
Mathieu
Review and comments would be most appreciated.
Regards, Mathieu
[1]. https://git.linaro.org/people/mathieu.poirier/coresight.git/log/?h=next-2019...
Mathieu Poirier (20): coresight: pmu: Adding ITRACE property to cs_etm PMU coresight: etm4x: Add kernel configuration for CONTEXTID coresight: etm4x: Configure tracers to emit timestamps coresight: Adding return code to sink::disable() operation coresight: Move reference counting inside sink drivers coresight: Refactor sink::disable() functions coresight: Refactor sink::update() functions coresight: perf: Refactor function etm_setup_aux() coresight: perf: Refactor function free_event_data() coresight: Introduce the notion of process ID to the framework coresight: tmc-etr: Refactor function tmc_etr_setup_perf_buf() coresight: tmc-etr: Introduce the notion of process ID to ETR devices coresight: tmc-etr: Allow events to use the same ETR buffer coresight: tmc-etr: Add support for CPU-wide trace scenarios coresight: tmc-etf: Add support for CPU-wide trace scenarios coresight: etb10: Add support for CPU-wide trace scenarios coresight: Refactor sink::alloc_buffer() functions coresight: Add function coresight_sink_is_shared() coresight: tmc-etr: Make ETR aware of topology coresight: Use event->cpu to determine session type
drivers/hwtracing/coresight/coresight-etb10.c | 79 +++++- .../hwtracing/coresight/coresight-etm-perf.c | 47 +++- drivers/hwtracing/coresight/coresight-etm4x.c | 114 +++++++- drivers/hwtracing/coresight/coresight-priv.h | 1 + .../hwtracing/coresight/coresight-tmc-etf.c | 84 ++++-- .../hwtracing/coresight/coresight-tmc-etr.c | 265 +++++++++++++++--- drivers/hwtracing/coresight/coresight-tmc.c | 4 + drivers/hwtracing/coresight/coresight-tmc.h | 11 + drivers/hwtracing/coresight/coresight-tpiu.c | 9 +- drivers/hwtracing/coresight/coresight.c | 53 +++- include/linux/coresight-pmu.h | 2 + include/linux/coresight.h | 8 +- tools/include/linux/coresight-pmu.h | 2 + 13 files changed, 568 insertions(+), 111 deletions(-)
-- 2.17.1