On 22/09/2022 11:52, Catalin Marinas wrote:
> On Thu, Sep 22, 2022 at 10:34:45AM +0100, James Clark wrote:
>> On 21/09/2022 16:08, Catalin Marinas wrote:
>>> 2. Always on CONFIG_PID_IN_CONTEXTIDR (we might as well remove the
>>> Kconfig entry). This would write the root pid namespace value
>>> (task_pid_nr()).
>>
>> If we're not worried about the overhead after all, this would be the
>> easiest solution. And then SPE or Coresight already decide whether they
>> want to use the value or not, so no further changes are needed.
>>
>> From Leo's patch there is a table that shows a 1% overhead with it
>> enabled permanently, and I've heard a figure like that mentioned before.
>> So I could also resurrect that patch to use static keys? Although it's a
>> bit more complicated, that would be my preference. And then we can have
>> that mode always on.
>
> I don't think we should bother with static keys, just always enable it
> but try to reduce/group the ISBs from all the functions called on the
> __switch_to() path. We may actually get a speed-up.
>
Ok thanks I will take a look at this
On 21/09/2022 16:08, Catalin Marinas wrote:
> On Wed, Sep 21, 2022 at 03:05:34PM +0100, James Clark wrote:
>> As suggested by Catalin here's the change to add Coresight to defconfig.
>>
>> Unfortunately I don't think we should add CONFIG_CORESIGHT_SOURCE_ETM4X
>> which builds a few files until [1] is merged because of the overhead
>> of CONFIG_PID_IN_CONTEXTIDR.
>>
>> [1]: https://lore.kernel.org/lkml/20211021134530.206216-1-leo.yan@linaro.org/T/
>
> I thought the overhead wasn't the problem, it's mostly negligible. We
> can probably save a few more cycles on the __switch_to() path by
> replacing several isb()s in those functions with a single one just
> before cpu_switch_to().
>
> IIRC the issue is that unless a process runs in the root pid namespace,
> the actual pid written to contextidr is meaningless.
This is true, and Leo has recently disabled it in that scenario in
aab473867fed.
>
> Now that you reminded me of that thread, I see three options (sorry, not
> entirely related to the defconfig updates):
>
> 1. Remove CONFIG_PID_IN_CONTEXTIDR and corresponding code completely,
> find other events to correlate the task with the trace.
Unfortunately when tracing per core we would need kernel timestamps in
the trace to correlate to the switch records. At the moment Coresight is
using a different clock source so it's not possible and we're still
using the context ID to correlate samples.
With FEAT_TRF in v8.4 it will be possible to do this and we've started
working towards that here: 0f00b223ea22. But we'd still have to support
older hardware too, so CONFIG_PID_IN_CONTEXTIDR can't be removed completely.
For SPE it's not required because we already have the right timestamps
in the samples and we've added support for no context IDs in the decoder
here: 27d113cfe892
>
> 2. Always on CONFIG_PID_IN_CONTEXTIDR (we might as well remove the
> Kconfig entry). This would write the root pid namespace value
> (task_pid_nr()).
If we're not worried about the overhead after all, this would be the
easiest solution. And then SPE or Coresight already decide whether they
want to use the value or not, so no further changes are needed.
From Leo's patch there is a table that shows a 1% overhead with it
enabled permanently, and I've heard a figure like that mentioned before.
So I could also resurrect that patch to use static keys? Although it's a
bit more complicated, that would be my preference. And then we can have
that mode always on.
>
> 3. Similar to (2) but instead write task_pid_nr_ns(). An alternative
> here is to write -1 if the task is not in the root pid namespace.
>
> Strong preference for (1).
>
On 21/09/2022 17:46, Mathieu Poirier wrote:
> On Wed, Sep 21, 2022 at 04:26:59PM +0100, Mark Brown wrote:
>> On Wed, Sep 21, 2022 at 03:05:35PM +0100, James Clark wrote:
>>
>>> +CONFIG_CORESIGHT_CTI=m
>>> +CONFIG_CORESIGHT_CTI_INTEGRATION_REGS=y
>>
>
> I agree - integration registers should not be enabled by default.
>
>> Do we want this turned on by default? According to the
>> description it's a bit dangerous and it's exposed via sysfs
>> rather than debugfs.
>
>
Should I disable just CONFIG_CORESIGHT_CTI_INTEGRATION_REGS or
CONFIG_CORESIGHT_CTI as well? There are other writable registers exposed
via sysfs outside of these two options, so I wanted to check if it's
just the integration registers that are the issue.
As suggested by Catalin here's the change to add Coresight to defconfig.
Unfortunately I don't think we should add CONFIG_CORESIGHT_SOURCE_ETM4X
which builds a few files until [1] is merged because of the overhead
of CONFIG_PID_IN_CONTEXTIDR.
[1]: https://lore.kernel.org/lkml/20211021134530.206216-1-leo.yan@linaro.org/T/
James Clark (1):
arm64: defconfig: Add Coresight as module
arch/arm64/configs/defconfig | 9 +++++++++
1 file changed, 9 insertions(+)
--
2.28.0
在 9/13/2022 8:48 AM, Rob Herring 写道:
> On Thu, Sep 08, 2022 at 04:44:57PM +0800, Tao Zhang wrote:
>> Add property "qcom,dsb-elem-size" to support DSB element for TPDA.
>> Specifies the DSB element size supported by each monitor connected
>> to the aggregator on each port. Should be specified in pairs (port,
>> dsb element size).
> What is DSB?
The full name of DSB is "Discrete Single Bit".
The DSB element size supported by different DSB subunit TPDMs is
different, so TPDA needs to be informed through configuration in device
tree.
>> Signed-off-by: Tao Zhang <quic_taozha(a)quicinc.com>
>> ---
>> Documentation/devicetree/bindings/arm/qcom,coresight-tpda.yaml | 9 +++++++++
>> 1 file changed, 9 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/arm/qcom,coresight-tpda.yaml b/Documentation/devicetree/bindings/arm/qcom,coresight-tpda.yaml
>> index eb9bfc5..1bb3fdf 100644
>> --- a/Documentation/devicetree/bindings/arm/qcom,coresight-tpda.yaml
>> +++ b/Documentation/devicetree/bindings/arm/qcom,coresight-tpda.yaml
>> @@ -40,6 +40,13 @@ properties:
>> minItems: 1
>> maxItems: 2
>>
>> + qcom,dsb-elem-size:
>> + description: |
>> + Specifies the DSB element size supported by each monitor
>> + connected to the aggregator on each port. Should be specified
>> + in pairs (port, dsb element size).
>> + $ref: /schemas/types.yaml#/definitions/uint32-array
> The binding (not yet upstream) says there is just 1 port (port 0). So
> why do you need more than a single uint32?
>
> Rob
TPDA(Trace, Profiling and Diagnostics Aggregator) is to provide
packetization, funneling and timestamping of TPDM data.
Multiple monitors are connected to different input ports of TPDA.
- - - - - - - - - - - -
| TPDM 0| | TPDM 1 | | TPDM 2|
- - - - - - - - - - - -
| | |
|_ _ _ _ _ _ | _ _ _ _ |
| | |
| | |
------------------
| TPDA |
------------------
There may be multiple DSB subunit TPDMs connected to different input
ports of the same TPDA, so we need to use port here to define the
distinction in device tree.
Best regards,
Tao
在 9/8/2022 6:54 PM, Krzysztof Kozlowski 写道:
> On 08/09/2022 10:44, Tao Zhang wrote:
>> Add property "qcom,dsb-elem-size" to support DSB element for TPDA.
>> Specifies the DSB element size supported by each monitor connected
>> to the aggregator on each port. Should be specified in pairs (port,
>> dsb element size).
>>
>> Signed-off-by: Tao Zhang <quic_taozha(a)quicinc.com>
>> ---
>> Documentation/devicetree/bindings/arm/qcom,coresight-tpda.yaml | 9 +++++++++
>> 1 file changed, 9 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/arm/qcom,coresight-tpda.yaml b/Documentation/devicetree/bindings/arm/qcom,coresight-tpda.yaml
>> index eb9bfc5..1bb3fdf 100644
>> --- a/Documentation/devicetree/bindings/arm/qcom,coresight-tpda.yaml
>> +++ b/Documentation/devicetree/bindings/arm/qcom,coresight-tpda.yaml
>> @@ -40,6 +40,13 @@ properties:
>> minItems: 1
>> maxItems: 2
>>
>> + qcom,dsb-elem-size:
>> + description: |
>> + Specifies the DSB element size supported by each monitor
>> + connected to the aggregator on each port. Should be specified
>> + in pairs (port, dsb element size).
>> + $ref: /schemas/types.yaml#/definitions/uint32-array
> So it is rather uint32-matrix (need to describe the items subschema).
> What about maxItems?
>
> Best regards,
> Krzysztof
Yes, indeed it should be uint32-matrix here. I will update in the next
release.
The "maxItems" cannot be known explicitly because it depends on how many
DSB subunit TPDMs are connected to the TPDA input ports.
Usually the number of the items is 1 to several, but there is no limit
to its maximum value.
Best regards,
Tao