Re: Perf command line syntax

26 Apr 2017


      Hi Mathieu,
On 26 April 2017 at 15:49, Mathieu Poirier mathieu.poirier@linaro.org wrote:
...
On 25 April 2017 at 13:22, Mike Leach mike.leach@linaro.org wrote:
...
Hi Mathieu,
On Tue, 2017-04-25 at 12:03 -0600, Mathieu Poirier wrote:
...
Good day to all,
A patch sent by Suzuki a few weeks ago [1] unearthed a problem with
how we deal with the "enable_sink" flag in the CS core.  So far we
have been concentrating on system-wide trace scenarios [2] but
per-CPU [3] scenarios are also valid.  In system-wide mode a single event is
generated by the perf user space and communicated to the kernel.  In
per-CPU mode an event is generated for each CPU present in the system
or specified on the cmd line, and that is where our handling of the
"enable_sink" flag fails (get back to me if you want more details on
that).
My solution is to add the sink definition to the perf_event_attr
structure [4] that gets sent down to the kernel from user space.  That
way there is no confusion about what sink belongs to what event.  To
do that I will need to have a chat with the guys in the #perf IRC
channel, something I expect to be fairly tedious.
But before moving ahead we need to agree on the syntax we want to have
in the future.  That way what I do now with the perf folks doesn't
have to be undone in a few months.
For the following I will be using figure 2-9 on page 2-33 in this
document [5].
So far we have been using this syntax:
# perf record -e cs_etm/@20070000.etr/ --per-thread  $COMMAND
This will instruct perf to select the ETR as a sink.  Up to now not
specifying a sink is treated as an error condition since perf doesn't
know what sink to select.
My main reason for writing all this is to suggest that we revisit
that.
What I am proposing is that _if_ a sink is omitted on the perf command
line, the perf infrastructure will pick the _first_ sink it finds when
doing a walk through of the CS topology.  This is very advantageous
when thinking about the syntax required to support upcoming systems
where we have a one-to-one mapping between source and sink.
This seems like a good solution to me.
...
In such a system specifying sinks for each CPU on the perf command
line simply doesn't scale.  Even on a small system I don't see users
specifying a sink for each CPU.  Since the sink for each CPU will be
the first one found during the walk through, it is implicit that this
sink should be used and doesn't need to be specified explicitly.
It would also allow for the support of topologies like Juno-R1 [5]
where we have a couple of ETFs in the middle.  Those are perfectly
valid sinks but right now the current scheme doesn't allow us to use
them.  If we pick the first sink we find along the way we can
automatically support something like this.
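
A minimal sketch of the "first sink found during the walk" idea, written in Python purely for illustration. The real walk would live in the CoreSight core; the topology dictionary, the component names and the `first_sink` helper below are all hypothetical and only loosely modelled on the discussion above.

```python
from collections import deque

# Hypothetical topology: each component maps to the components it feeds.
# Names are illustrative only, not taken from a real device tree.
TOPOLOGY = {
    "etm0": ["funnel0"],
    "etm1": ["funnel0"],
    "funnel0": ["etf0"],
    "etf0": ["replicator"],
    "replicator": ["tpiu", "etr"],
    "tpiu": [],
    "etr": [],
}
SINKS = {"etf0", "tpiu", "etr"}


def first_sink(source, topology, sinks):
    """Breadth-first walk from `source`; return the first sink reached, or None."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node in sinks:
            return node
        for nxt in topology.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None
```

With this layout both tracers resolve to `etf0` without anything being named on the command line. In a one-to-one layout (say `etm0 -> etr0`, `etm1 -> etr1`) each tracer would resolve to its own sink in the same way.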
Not sure I understand what you mean here - we can use the ETF on Juno
by specifying it on the command line - I was doing the same yesterday.
Have I missed the point here?
With the current solution and using Juno r1/2, only one of the ETF can
be selected as a sink.  So if you select ETF0 in a system wide trace
scenario the initialisation should fail because processors located in
the "small" cluster won't have a sink.  If selecting ETF0 or ETF1 in
conjunction with the --per-thread option works for you, we need to
talk.
Of course, selecting the ETR will definitely work.
OK - this is the unreachable sink problem. Not terribly clear on the
diagram, but on Juno r1/r2 both clusters go to ETF0, STM, system
profiler and SCP trace go to ETF1.
So strictly speaking it would work on Juno r1/r2, but I take the point
that there are likely topologies where things fall over.
...
...
...
I have reflected quite extensively on this and I think it can work.
The only time it can fail is if at some point we get more than one
sink associated with each tracer.  But how likely is this?
Take care with replicators - in the Juno topology the TPIU and ETR are
effectively the same number of nodes away from the ETM. Without an
intervening ETF then the "first" becomes dependent on the ordering of
exploring the branches on the replicator. This could be handled as an
error if this case is detected - easily solved by actually specifying
the desired sink.
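
A rough sketch of how the equidistant case could be detected and reported as an error, again in Python for illustration only. The helpers `nearest_sinks` and `pick_default_sink`, the error wording and the example topology are assumptions, not existing CoreSight or perf code.

```python
from collections import deque


def nearest_sinks(source, topology, sinks):
    """Return every sink at the minimum hop distance from `source`."""
    dist = {source: 0}
    queue = deque([source])
    found = []
    best = None
    while queue:
        node = queue.popleft()
        # BFS visits nodes in order of distance, so stop once we are past
        # the distance at which the first sink was found.
        if best is not None and dist[node] > best:
            break
        if node in sinks:
            best = dist[node]
            found.append(node)
            continue
        for nxt in topology.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return sorted(found)


def pick_default_sink(source, topology, sinks):
    """Pick the implicit sink, refusing to guess when the choice is ambiguous."""
    candidates = nearest_sinks(source, topology, sinks)
    if not candidates:
        raise LookupError(f"no sink reachable from {source}")
    if len(candidates) > 1:
        raise ValueError(
            f"ambiguous default sink for {source}: {', '.join(candidates)}; "
            "specify the sink explicitly"
        )
    return candidates[0]
```

For a tracer feeding a replicator that fans out directly to a TPIU and an ETR, `pick_default_sink` raises instead of depending on branch ordering. With an intervening ETF the ETF is the unique nearest sink and is returned.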
Exactly - in a case like this where all processors have a path to more
than one sink, specifying the sink on the command line is the way to
go.  Where my solution fails is if there was a one-to-one mapping
between a tracer and a replicator, and that replicator was connecting
a TPIU and an ETR.  So for example on a 4 CPU system, we'd have 4
tracers connected to 4 different replicators, each replicator
connecting a set of TPIU and ETR.
From where I stand this would be completely crazy but architecturally
feasible.  And do we even need to care about this kind of eerie corner
case?  I'd like your opinion on that.
Highly unlikely to have multiple ETR/TPIU pairs in that way.
What is more likely is to have a replicator per ETM, with the 'left'
branch going to a per CPU ETR, and the 'right' branch going to a
funnel linked to a single system TPIU.
And again, there is also the possibility of a system ETR and TPIU
being the only sinks, equidistant from the source.
So where two equidistant routes occur, do we need a command line
option that says "choose the nearest on chip", or possibly a "choose
1st etr"?
Mike
...
...
Further - programmable replicators can route different trace IDs to
different sinks. This would be a specialised case - e.g. ETM trace to
ETR and main memory, STM trace off chip to an external debugger. This
probably doesn't affect the perf command line specification but may
need to program the links with appropriate trace IDs in future.
The use case where an STM is used in conjunction with the perf command
line is not something I ever tested for, but since a reference count
is kept on each CS component things would very likely work.  If it
doesn't, not much is needed to get it going.
...
A second implementation detail may be to handle the unreachable sink.
If the user specifies a sink that is not in the path then a suitable
error needs to be output. Not sure what happens now if the STM sink is
specified for Juno r1/r2?
Right now if a sink is unreachable the perf command line will simply
fail.  So it will be the same for the STM.  I also agree that a better
error message could be returned.
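
One possible shape for the unreachable-sink check, sketched in Python for illustration. The `reachable` and `check_sink` helpers, the message text and the simplified topology (tracer to ETF0, STM to ETF1, both feeding an ETR) are assumptions made for this example, not a description of the current kernel behaviour.

```python
def reachable(source, topology):
    """Return the set of components reachable from `source` (inclusive)."""
    seen, stack = set(), [source]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(topology.get(node, []))
    return seen


def check_sink(sources, sink, topology):
    """Return one error message per source that has no path to `sink`."""
    return [
        f"{src}: sink '{sink}' is not reachable from this source"
        for src in sources
        if sink not in reachable(src, topology)
    ]


# Hypothetical, simplified layout: the tracer feeds etf0, the STM feeds etf1.
EXAMPLE = {
    "etm0": ["etf0"],
    "stm": ["etf1"],
    "etf0": ["etr"],
    "etf1": ["etr"],
}
```

Here `check_sink(["etm0"], "etf1", EXAMPLE)` reports that `etm0` cannot reach `etf1`, while selecting `etr` yields no errors for either source.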
Mathieu
...
Regards
Mike
...
What we decide now will not be undone easily, if at all.  Please read
my email a couple of times and give it some consideration.  Comments
and ideas are welcome.
Best regards,
Mathieu
[1]. https://patchwork.kernel.org/patch/9657141/
[2]. perf record -e cs_etm/@20070000.etr/u --per-thread  $COMMAND
[3]. perf record -e cs_etm/@20070000.etr/u -C 0,2-3  $COMMAND
[4]. http://lxr.free-electrons.com/source/include/uapi/linux/perf_event.h#L283
[5]. http://infocenter.arm.com/help/topic/com.arm.doc.ddi0515d.b/DDI0515D_b_juno_arm_development_platform_soc_trm.pdf
_______________________________________________
CoreSight mailing list
CoreSight@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/coresight
-- 
Mike Leach
Principal Engineer, ARM Ltd.
Blackburn Design Centre. UK
