On Wed, 8 Aug 2018 at 01:59, Tomasz Nowicki tnowicki@caviumnetworks.com wrote:
Hi Mathieu,
It's been a while but I am back to Coresight.
Let me remind my setup and the issue I am struggling with now.
Kernel baseline: https://github.com/Linaro/perf-opencsd (perf-opencsd-v4.16) OpenCSD: https://github.com/Linaro/OpenCSD.git (master)
The simplest Coresight components path I used as a start point: ETMv4.1 -> TDR -> FUNNEL -> ETF
As I mentioned, the TDR is built by Cavium and was added to aggregate 128 inputs into one output rather than cascading funnels. The TDR has its own driver just to keep the path connected in the Linux Coresight framework.
Here is how I capture some trace data:

sudo perf record -C 0 -e cs_etm/@etf0/ --per-thread test_app
The above command line tells perf to trace everything that is happening on CPU0 for as long as "test_app" is executing. In this case the "--per-thread" option is ignored. This is called a CPU-wide trace scenario and is not yet supported for CS (I am currently working on it).
If you want to make sure "test_app" executes on CPU0 and that you trace just that you will need to use the "taskset" utility:
sudo perf record -e cs_etm/@etf0/ --per-thread taskset 0x1 test_app
An alternative to the above would be to CPU-hotplug out CPU128-255 while you are testing.
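For reference, a minimal sketch of the hotplug route, assuming the standard sysfs hotplug interface and the CPU numbering above (run as root on the target; the range is illustrative):

```shell
#!/bin/sh
# Offline CPUs 128-255 so that only CPUs whose ETMs can reach etf0 stay up.
# Writing 0 to a CPU's 'online' file hot-unplugs it; writing 1 brings it back.
first=128
last=255
count=0
for cpu in $(seq "$first" "$last"); do
    ctl="/sys/devices/system/cpu/cpu${cpu}/online"
    # Only attempt the write if the control file exists and is writable.
    [ -w "$ctl" ] && echo 0 > "$ctl"
    count=$((count + 1))
done
echo "requested offline for $count CPUs"
```

Re-running the loop with `echo 1` restores the CPUs once tracing is done.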
Let's start with that before going further.
Thanks, Mathieu
I need to use -C because my machine has 2 nodes with 32 cores (128 threads) each, and each node has a different ETF. So I have to specify which CPU is the source for the specified ETF sink (ETF0 can be a sink for CPU0-CPU127, ETF1 can be a sink for CPU128-CPU255). Otherwise Linux cannot find a path for the ETMs related to CPU128-CPU255 if I specify ETF0 as the sink.
Overall, I can see some data using:

# sudo perf report --stdio --dump
[...]
. ... CoreSight ETM Trace data: size 16384 bytes
Frame deformatter: Found 4 FSYNCS
ID:12 RESET operation on trace decode path
Idx:108; ID:12; I_NOT_SYNC : I Stream not synchronised
Idx:455; ID:12; I_ASYNC : Alignment Synchronisation.
Idx:468; ID:12; I_TRACE_INFO : Trace Info.; INFO=0x0
Idx:470; ID:12; I_TRACE_ON : Trace On.
Idx:471; ID:12; I_CTXT : Context Packet.; Ctxt: AArch64,EL0, NS;
Idx:473; ID:12; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000AAABE0B09584;
Idx:483; ID:12; I_ATOM_F1 : Atom format 1.; N
Idx:484; ID:12; I_TIMESTAMP : Timestamp.; Updated val = 0x1b6a5d937cc1
Idx:492; ID:12; I_ATOM_F3 : Atom format 3.; NNE
Idx:493; ID:12; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000AAABE0B0D210;
Idx:504; ID:12; I_ATOM_F3 : Atom format 3.; NEE
Idx:505; ID:12; I_ATOM_F3 : Atom format 3.; NEN
Idx:506; ID:12; I_ATOM_F6 : Atom format 6.; EEEN
Idx:507; ID:12; I_ATOM_F3 : Atom format 3.; NNE
Idx:508; ID:12; I_ATOM_F1 : Atom format 1.; N
Idx:509; ID:12; I_ATOM_F3 : Atom format 3.; NNN
Idx:510; ID:12; I_ATOM_F3 : Atom format 3.; EEN
Idx:512; ID:12; I_ATOM_F1 : Atom format 1.; E
[...]
However, I still see errors while using:

# sudo perf report --stdio
0x1e8 [0x60]: failed to process type: 1
Error: failed to process sample
# To display the perf.data header info, please use --header/--header-only options.
The reason is that cs_etm__process_event() is failing on:

    if (!etm->timeless_decoding)
            return -EINVAL;

and etm->timeless_decoding is set up in cs_etm__is_timeless_decoding(). For some events the time bit is set, and so far I have failed to figure out what is going on. Have you met a similar issue? Any pointers or hints are much appreciated.
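For what it's worth, the per-event attributes recorded in perf.data can be inspected with 'perf evlist -v', whose output includes a sample_type field; a TIME flag there is the bit in question. A sketch of picking that field apart (the example line is illustrative, not taken from this capture):

```shell
#!/bin/sh
# 'sudo perf evlist -v' prints one attribute dump per recorded event; the
# sample_type field lists flags such as IP|TID|TIME|CPU. Extract it and
# check for TIME, which makes the decoder treat the trace as non-timeless.
line='cs_etm/@etf0/: type: 8, sample_type: IP|TID|TIME|CPU|IDENTIFIER, disabled: 1'
sample_type=$(printf '%s\n' "$line" | sed -n 's/.*sample_type: \([A-Z|]*\).*/\1/p')
case "$sample_type" in
    *TIME*) verdict="TIME bit set (not timeless)";;
    *)      verdict="timeless";;
esac
echo "$verdict"
```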
One more comment below.
On 10.01.2018 21:10, Mathieu Poirier wrote:
On 10 January 2018 at 06:57, Tomasz Nowicki tnowicki@caviumnetworks.com wrote:
Hello Mathieu,
Thank you for your response. Please see comments below.
On 08.01.2018 17:53, Mathieu Poirier wrote:
Good day Tomasz,
On 5 January 2018 at 05:51, tn Tomasz.Nowicki@caviumnetworks.com wrote:
Hi Mathieu,
I am bringing up Coresight functionality on ThunderX2. While ramping up I came across your Connect session:
which I found very helpful.
Perfect - a few things have changed since then, see below.
During my research I had to create a new Coresight component driver for Linux; here is the story. For ThunderX2, we aggregate trace data from all 128 ETMs into one funnel input port using a so-called TDR (Trace Data Ring) component. This should be transparent to software and does not require any configuration. However, the Linux Coresight framework requires components to be connected to each other, so we cannot leave the funnel and ETMs disconnected in the DT. I decided to create a pure software component, i.e. the TDR, which is meant only to connect the chain and performs no register accesses.
Is this TDR an ARM IP or built in-house by Cavium?
This is Cavium specific component which I am going to upstream once I test the whole functionality.
And I suppose it
was added there to aggregate 128 inputs into one output rather than cascading funnels?
Correct.
Now I am able to enable the ETF sink and a path from ETM via TDR via FUNNEL up to the ETF, and gather some data. To be sure things work properly I want to decode the data using the Linaro OpenCSD library, following the instructions from here:
https://community.arm.com/tools/b/blog/posts/do-a-coresight-trace-on-linux-w...
Thanks for pointing this out, I didn't know about it.
but I still got an error while doing the 'perf report' step. Kernel perf tool support for OpenCSD is out of tree for now, so I may be missing some patches.
Can you get me a pastebin of the errors you're getting?
Sure, see: https://pastebin.com/6YDq8KfC As you can see, there is not much info about the error cause.
Here is my setup: https://github.com/Linaro/perf-opencsd/commits/upstream-v1 (+ ThunderX2 specific patches)
Oh boy... I wasn't expecting people to use that but I suppose it is the right thing to do. Keep going with that code.
This, in combination with the upstream-v1 branch should work properly. That's how I test things on my Juno and Dragon board.
# echo 1 > etf0/enable_sink
# perf record -C 0 -e cs_etm// sleep 2
Ok, that won't work as the -C option is currently not supported (I am working on it). I also suggest making sure you have the very latest TIP [1] on branch [2] and carefully reading the README.md. We recently updated the instructions to fit the newest development. Lastly, we have deprecated enabling the sink from the sysfs interface - it can still work, but no guarantees are provided. It is better to specify the sink as part of the perf record command line, as shown in the most recent HOWTO.md.
I am able to specify sink as part of the perf record command line only for Linux Perf master branch: https://github.com/Linaro/perf-opencsd/commits/master
For the upstream-v1 branch I am getting:

$ perf record -vvv -e cs_etm/@etf0/ --per-thread uname
Using CPUID 0x00000000420f5160
perf: util/evsel.c:783: apply_config_terms: Assertion `!(1)' failed.
Aborted (core dumped)
Ok, I've uploaded upstream-v2. With that branch everything works fine on my side, no changes needed. I added a fix for a regression in the perf tip tree and the code required to use the ETR from the perf interface.
One thing about the above: "@etf0". Is this really the name you gave to the device in the DT? Look under /sys/bus/coresight/devices/ for an etf entry. What is listed there should be the name of the ETF as it is known to the system.
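A quick way to check, sketched below: list the sysfs entries and use exactly what appears there after the '@' sign (device names vary per platform; '20010000.etf' is only an example of a DT-derived name):

```shell
#!/bin/sh
# The entries under /sys/bus/coresight/devices/ are the names the kernel
# registered; perf only accepts one of these after '@' in cs_etm/@<sink>/.
devs=$(ls /sys/bus/coresight/devices 2>/dev/null)
if [ -n "$devs" ]; then
    echo "$devs"
else
    echo "no coresight devices registered"
fi
# With the real name (e.g. '20010000.etf' on a DT-based system):
#   sudo perf record -e cs_etm/@20010000.etf/ --per-thread taskset 0x1 test_app
```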
Indeed, the name is different, but for clarity of the perf command I used a shortcut.
Thanks, Tomasz
On 13.08.2018 18:47, Mathieu Poirier wrote:
If you want to make sure "test_app" executes on CPU0 and that you trace just that you will need to use the "taskset" utility:
sudo perf record -e cs_etm/@etf0/ --per-thread taskset 0x1 test_app
My apologies, I did use the 'taskset' part in my command but forgot to mention it. Anyway, thank you for clarifying that the CPU-wide trace scenario is not supported for now.
An alternative to the above would be to CPU-hotplug out CPU128-255 while you are testing.
Yes, this helped to set up the path between source and sink. Finally I am able to decode data with:

# sudo perf record -e cs_etm/@etf0/ --per-thread test_app
# sudo perf report --stdio
Now I am trying to see the exact path a processor took through the code, following these instructions: https://github.com/Linaro/OpenCSD/blob/master/HOWTO.md#trace-decoding-with-p... However, on the perf-opencsd 'perf-opencsd-v4.16' branch there is no 'cs-trace-disasm.py' script. Are you planning to add this script to the mainline kernel, or was this just for development purposes?
Thanks, Tomasz
Good morning Tomasz,
On Tue, 14 Aug 2018 at 09:20, tn Tomasz.Nowicki@caviumnetworks.com wrote:
Yes, this helped to setup path between source and sink. Finally I am able to decode data with: # sudo perf record -e cs_etm/@etf0/ --per-thread test_app # sudo perf report --stdio
Perfect.
Now I am trying to see exactly the path a processor took through the code according to these instructions: https://github.com/Linaro/OpenCSD/blob/master/HOWTO.md#trace-decoding-with-p... However, on perf-opencsd 'perf-opencsd-v4.16' branch there is no 'cs-trace-disasm.py' script. Are you planning to add this script to mainline kernel or this was just for development purpose?
That script was provided to show what can be done with the results and was never intended to be used by a broad audience. At one point it became too difficult to maintain it and it was dropped from the patchset. You can still find it on tag "perf-opencsd-v4.15" but you'll have to make modifications so that it can work again.
Leo Yan (who is on this list) was working on upstreaming a better version of the script. That was put on hold while I am trying to finish the support of CPU-wide trace scenarios.
Regards, Mathieu
Hello Mathieu,
On 14.08.2018 18:08, Mathieu Poirier wrote:
Yes, this helped to setup path between source and sink. Finally I am able to decode data with: # sudo perf record -e cs_etm/@etf0/ --per-thread test_app # sudo perf report --stdio
Perfect.
The above command traces everything related to test_app, yet all I can see in 'perf report' are userspace symbols. Is it possible to trace the kernel side too? I have tried to narrow the data down using the '-e cs_etm/@etf0/k' filter but got nothing in 'perf report'.
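For reference, the u/k privilege modifiers attach after the closing '/' of the PMU term list, just as for any other perf event; a sketch that only builds the command string (sink name illustrative):

```shell
#!/bin/sh
# mode: u = user space only, k = kernel only, empty = both (subject to
# /proc/sys/kernel/perf_event_paranoid; resolving kernel symbols in the
# report may additionally need kptr_restrict set to 0).
sink="etf0"   # illustrative; use the name listed under /sys/bus/coresight/devices/
mode="k"
event="cs_etm/@${sink}/${mode}"
cmd="sudo perf record -e ${event} --per-thread taskset 0x1 test_app"
echo "$cmd"
```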
Now I am trying to see exactly the path a processor took through the code according to these instructions: https://github.com/Linaro/OpenCSD/blob/master/HOWTO.md#trace-decoding-with-p... However, on perf-opencsd 'perf-opencsd-v4.16' branch there is no 'cs-trace-disasm.py' script. Are you planning to add this script to mainline kernel or this was just for development purpose?
That script was provided to show what can be done with the results and was never intended to be used by a broad audience. At one point it became too difficult to maintain it and it was dropped from the patchset. You can still find it on tag "perf-opencsd-v4.15" but you'll have to make modifications so that it can work again.
OK, I will try to port it to v4.16.
Leo Yan (who is on this list) was working on upstreaming a better version of the script. That was put on hold while I am trying to finish the support of CPU-wide trace scenarios.
Hi Leo, regarding your script, do you have a working version? If so, and you are willing to share it, I can give it a try on my machine.
Thanks, Tomasz
On Thu, 16 Aug 2018 at 08:04, Tomasz Nowicki tnowicki@caviumnetworks.com wrote:
Above command traces everything related to test_app and all I can see in 'perf report' are userspace symbols. Is it possible to trace kernel side? I have tried to narrow down data using '-e cs_etm/@etf0/k' filter but got nothing in 'perf report'.
The ETF buffer is a pretty small one. As such it may be wrapping around, and all you get is the user space part. Do you have an ETR on the platform? If so, I suggest downloading the latest coresight next branch [1] and trying with a bigger buffer.
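A sketch of what that looks like with an ETR sink: 'perf record -m <data>,<aux>' sizes the two mmap areas, and the second (AUX) field is the one that holds the trace, so a larger value there reduces wrap-around loss. Names and sizes below are illustrative:

```shell
#!/bin/sh
# Build a record command that routes trace to the ETR and asks for a 32MB
# AUX area (the field after the comma in -m); the data mmap is left default.
sink="etr0"    # illustrative; use the ETR's sysfs name on your system
aux="32M"
cmd="sudo perf record -e cs_etm/@${sink}/ -m ,${aux} --per-thread test_app"
echo "$cmd"
```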
Now I am trying to see exactly the path a processor took through the code according to these instructions: https://github.com/Linaro/OpenCSD/blob/master/HOWTO.md#trace-decoding-with-p... However, on perf-opencsd 'perf-opencsd-v4.16' branch there is no 'cs-trace-disasm.py' script. Are you planning to add this script to mainline kernel or this was just for development purpose?
That script was provided to show what can be done with the results and was never intended to be used by a broad audience. At one point it became too difficult to maintain it and it was dropped from the patchset. You can still find it on tag "perf-opencsd-v4.15" but you'll have to make modifications so that it can work again.
OK, I will try to port it on v4.16.
Right now everything that was on perf-opencsd-v4.16 has been mainlined or is available in coresight next [1]. Please use that so that we can help you better.
Thanks, Mathieu
[1]. https://git.linaro.org/kernel/coresight.git/?h=next-v4.19-rc1
Hi Mathieu,
On 16.08.2018 21:17, Mathieu Poirier wrote:
On Thu, 16 Aug 2018 at 08:04, Tomasz Nowicki tnowicki@caviumnetworks.com wrote:
Hello Mathieu,
On 14.08.2018 18:08, Mathieu Poirier wrote:
Good morning Tomasz,
On Tue, 14 Aug 2018 at 09:20, tn Tomasz.Nowicki@caviumnetworks.com wrote:
On 13.08.2018 18:47, Mathieu Poirier wrote:
On Wed, 8 Aug 2018 at 01:59, Tomasz Nowicki tnowicki@caviumnetworks.com wrote:
Hi Mathieu,
It's been a while but I am back to Coresight.
Let me remind my setup and the issue I am struggling with now.
Kernel baseline: https://github.com/Linaro/perf-opencsd (perf-opencsd-v4.16) OpenCSD: https://github.com/Linaro/OpenCSD.git (master)
The simplest Coresight components path I used as a start point: ETMv4.1 -> TDR -> FUNNEL -> ETF
As I mentioned TDR is built by Cavium and it was added to aggregate 128 inputs into one output rather than cascading funnels. TDR has its own driver just to keep path connected in Linux Coresight framework.
Here is how I catch some trace data: sudo perf record -C 0 -e cs_etm/@etf0/ --per-thread test_app
The above command line tells perf to trace everything that is happening on CPU0 for as long as "test_app" is executing. In this case the "--per-thread" option is ignored. This is called a CPU-wide trace scenario and is currently not supported for CS (I am currently working on it).
If you want to make sure "test_app" executes on CPU0 and that you trace just that you will need to use the "taskset" utility:
sudo perf record -e cs_etm/@etf0/ --per-thread taskset 0x1 test_app
My apologies, I used the 'taskset' part in my command but forgot to mention it. Anyway, thank you for clarifying that the CPU-wide trace scenario is not supported for now.
An alternative to the above would be to CPU-hotplug out CPU128-255 while you are testing.
Yes, this helped to set up the path between source and sink. Finally I am able to decode data with: # sudo perf record -e cs_etm/@etf0/ --per-thread test_app # sudo perf report --stdio
Perfect.
The above command traces everything related to test_app, and all I can see in 'perf report' are userspace symbols. Is it possible to trace the kernel side? I have tried to narrow down the data using the '-e cs_etm/@etf0/k' filter but got nothing in 'perf report'.
The ETF buffer is a pretty small one. As such it may be wrapping around and all you get is the user space part. Do you have an ETR on the platform? If so I suggest to download the latest coresight next branch [1] and try with a bigger buffer.
Yes, I have tried the ETR with the same results. But it turns out the ETMv4 driver enables EL1_NS for kernel tracing, which is not true for a VHE-enabled system where the kernel runs at EL2_NS. The patch below fixes this for me:
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c
index f79b0ea85d76..dc22c1d6c7f3 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x.c
@@ -28,6 +28,7 @@
 #include <linux/pm_runtime.h>
 #include <asm/sections.h>
 #include <asm/local.h>
+#include <asm/virt.h>

 #include "coresight-etm4x.h"
 #include "coresight-etm-perf.h"
@@ -617,13 +618,11 @@ static u64 etm4_get_access_type(struct etmv4_config *config)
 	 * Bit[13] Exception level 1 - OS
 	 * Bit[14] Exception level 2 - Hypervisor
 	 * Bit[15] Never implemented
-	 *
-	 * Always stay away from hypervisor mode.
 	 */
-	access_type = ETM_EXLEVEL_NS_HYP;

 	if (config->mode & ETM_MODE_EXCL_KERN)
-		access_type |= ETM_EXLEVEL_NS_OS;
+		access_type |= is_kernel_in_hyp_mode ?
+			       ETM_EXLEVEL_NS_HYP : ETM_EXLEVEL_NS_OS;

 	if (config->mode & ETM_MODE_EXCL_USER)
 		access_type |= ETM_EXLEVEL_NS_APP;
@@ -881,7 +880,8 @@ void etm4_config_trace_mode(struct etmv4_config *config)
 	addr_acc = config->addr_acc[ETM_DEFAULT_ADDR_COMP];
 	/* clear default config */
-	addr_acc &= ~(ETM_EXLEVEL_NS_APP | ETM_EXLEVEL_NS_OS);
+	addr_acc &= ~(ETM_EXLEVEL_NS_APP | ETM_EXLEVEL_NS_OS |
+		      ETM_EXLEVEL_NS_HYP);

 	/*
 	 * EXLEVEL_NS, bits[15:12]
@@ -892,7 +892,8 @@ void etm4_config_trace_mode(struct etmv4_config *config)
 	 * Bit[15] Never implemented
 	 */
 	if (mode & ETM_MODE_EXCL_KERN)
-		addr_acc |= ETM_EXLEVEL_NS_OS;
+		addr_acc |= is_kernel_in_hyp_mode ?
+			       ETM_EXLEVEL_NS_HYP : ETM_EXLEVEL_NS_OS;
 	else
 		addr_acc |= ETM_EXLEVEL_NS_APP;
Thanks, Tomasz
Hi Tomasz,
[...]
The ETF buffer is a pretty small one. As such it may be wrapping around and all you get is the user space part. Do you have an ETR on the platform? If so I suggest to download the latest coresight next branch [1] and try with a bigger buffer.
Yes, I have tried the ETR with the same results. But it turns out the ETMv4 driver enables EL1_NS for kernel tracing, which is not true for a VHE-enabled system where the kernel runs at EL2_NS. The patch below fixes this for me:
[...]
+		access_type |= is_kernel_in_hyp_mode ?
Did you send me code that doesn't compile cleanly?
[...]
Thanks, Tomasz
On 21.08.2018 18:05, Mathieu Poirier wrote:
Hi Tomasz,
[...]
[...]
+		access_type |= is_kernel_in_hyp_mode ?
Did you send me code that doesn't compile cleanly?
My apologies, obviously this is completely wrong. is_kernel_in_hyp_mode() is what it should be.
Tomasz
On Tue, 21 Aug 2018 at 13:11, tn Tomasz.Nowicki@caviumnetworks.com wrote:
On 21.08.2018 18:05, Mathieu Poirier wrote:
Hi Tomasz,
[...]
[...]
+		access_type |= is_kernel_in_hyp_mode ?
Did you send me code that doesn't compile cleanly?
My apologies, obviously this is completely wrong. is_kernel_in_hyp_mode() is what it should be.
Please send your patch to the public mailing list. Script "get_maintainer.pl" is your friend.
Tomasz
Hi Mathieu,
I have been looking into the kernel code for the ETF/ETR sampling frequency. So far I have found two places where we dump the buffer's content:
- when the thread is scheduled out
- by adding the '-F <HZ>' perf option, every time the process gets a tick; however, the <HZ> value has no influence on how often the buffer is dumped
Does it make sense to control the sampling frequency in this case? If yes, how can we do it? Thanks in advance for your feedback.
Tomasz
On Thu, 6 Sep 2018 at 04:02, Tomasz Nowicki tnowicki@caviumnetworks.com wrote:
Hi Mathieu,
I have been looking into the kernel code for the ETF/ETR sampling frequency. So far I have found two places where we dump the buffer's content:
By "sampling frequency" I will deduce that you mean "copy the content of the sink buffer to the perf's ring buffer".
- when the thread is scheduled out
Correct.
- by adding the '-F <HZ>' perf option, every time the process gets a tick; however, the <HZ> value has no influence on how often the buffer is dumped
I wasn't aware of that one, but in my opinion it is not realistic for coresight as it would seriously impact system performance. What is needed is an interrupt when the trace buffer is full, something the components currently don't support.
Mathieu
Does it make sense to control sampling frequency in this case? If yes, then how can we do it? Thanks in advance for your feedback.
Tomasz
Hi Tomasz,
On Thu, Aug 16, 2018 at 04:04:46PM +0200, Tomasz Nowicki wrote:
[...]
Now I am trying to see exactly the path a processor took through the code, following these instructions: https://github.com/Linaro/OpenCSD/blob/master/HOWTO.md#trace-decoding-with-p... However, on the perf-opencsd 'perf-opencsd-v4.16' branch there is no 'cs-trace-disasm.py' script. Are you planning to add this script to the mainline kernel, or was it just for development purposes?
That script was provided to show what can be done with the results and was never intended to be used by a broad audience. At one point it became too difficult to maintain it and it was dropped from the patchset. You can still find it on tag "perf-opencsd-v4.15" but you'll have to make modifications so that it can work again.
OK, I will try to port it to v4.16.
Leo Yan (who is on this list) was working on upstreaming a better version of the script. That was put on hold while I try to finish support for CPU-wide trace scenarios.
Hi Leo, regarding your script, do you have any working version? If yes, and you are willing to share it, I can give it a try on my machine.
Yes, my pleasure. Please note that on the mainline kernel there is a dependent patch series for trace packet handling fixes [1]; you need to apply those patches first. Then you can refer to the enclosed three patches for the CoreSight trace disassembler with the Python script.
Thanks, Leo Yan