【questions on coresight integrated with perf】


liubowen (A)

28 Jul 2016

6:53 a.m.

Hi,

Thanks for your time!

I am Bob. I am interested in the CoreSight project, and I have learned a lot from the web page http://www.linaro.org/blog/core-dump/coresight-perf-and-the-opencsd-library/.

I work on ARM64, and there is a known problem with perf on ARM. The details are described at https://www.linaro.org/blog/core-dump/debugging-arm-kernels-using-nmifiq/.

For instance, when we run dd if=/dev/urandom of=/dev/null, over 90% of the CPU time is reported as spent unlocking interrupts, and the cryptographic operations that should dominate the use case are completely hidden. [screenshot: image003.jpg]

The author, Daniel Thompson from Linaro, proposes a preliminary solution, but he notes that it needs further work.

CoreSight traces program flow purely in hardware. If we combine CoreSight with perf and run “dd if=/dev/urandom of=/dev/null” under perf record, will the report be normal? If so, that would be amazing, and I am eager for any related information.

I have followed the documentation to enable CoreSight and perf, but I got stuck. I cannot figure out whether what I see is normal.

I greatly appreciate your help. Thanks again for your time!


Mathieu Poirier

28 Jul 2016

2:38 p.m.

On 28 July 2016 at 00:53, liubowen (A) liubowen2@huawei.com wrote:

> CoreSight traces program flow purely in hardware. If we combine CoreSight with perf and run “dd if=/dev/urandom of=/dev/null” under perf record, will the report be normal?

What do you expect to see in a "normal" report?

There is no restriction on the code CoreSight can trace, and with the soon-to-be released address filtering capabilities, knowing exactly what the HW is doing will become a lot easier. The only requirement (for now) is that CPUidle be disabled.

> I have followed the documentation to enable CoreSight and perf, but I got stuck. I cannot figure out whether what I see is normal.

That is unfortunately the downside to CoreSight. But as with every powerful technology, complexity is inherent.

> I greatly appreciate your help. Thanks again for your time!

I am not sure how I can help you here. Other than the one above (to which I have replied), I don't see any specific questions.

Regards, Mathieu

CoreSight mailing list CoreSight@lists.linaro.org https://lists.linaro.org/mailman/listinfo/coresight

liubowen (A)

29 Jul 2016

10:18 a.m.

New subject: //Re: 【questions on coresight integrated with perf】

Hi Mathieu,

Glad to receive your reply. I am sorry to trouble you; let me present my question again. As we know, perf records samples using interrupts generated by the PMU and uses them to show hot spots. Consider the following case:

spin_lock_irq();
A();
B();
spin_unlock_irq();

To avoid deadlock, spin_lock_irq() is used instead of spin_lock(). However, spin_lock_irq() disables local interrupts through local_irq_disable(), so the interrupts generated by the PMU cannot be handled until spin_unlock_irq() executes. At that point the current instruction pointer is inside spin_unlock_irq(). The time spent in A() and B() is therefore attributed to spin_unlock_irq(), and perf report shows no share for A() or B(). That report is abnormal; a normal report should contain A() and B().

I have not yet come up with a good solution. Today I read the article “CoreSight, Perf and the OpenCSD Library” once again and understood more. The trace data from the ETM is recorded in perf.data, and perf report or perf script can decode it to find hot spots and so on. I wonder whether the trace data covers the whole recording, from start to end. If the trace data is complete, the abnormal report would be solved perfectly. I do not yet fully understand the solution you have built, so it is a little hard for me to check. I hope you can understand what I am saying and give me some suggestions. Maybe it is an easy question; I beg your pardon. ^_^
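The attribution problem described above can be reproduced with a small simulation. This is only a sketch under stated assumptions: the timeline, the tick durations, the sampling period and the name other_work are all hypothetical, and the model simply holds one PMU interrupt pending while interrupts are masked and charges it to the first function that runs with interrupts enabled.

```python
from collections import Counter

# Hypothetical timeline of one loop iteration:
# (function name, duration in ticks, interrupts masked while it runs?)
TIMELINE = [
    ("spin_lock_irq", 1, False),
    ("A", 40, True),
    ("B", 50, True),
    ("spin_unlock_irq", 1, False),
    ("other_work", 8, False),
]


def sample_profile(timeline, period, iterations):
    """Model interrupt-driven sampling: one PMU interrupt every `period` ticks.

    An interrupt raised while interrupts are masked stays pending and is only
    serviced by the first function that runs with interrupts enabled.
    """
    hits = Counter()
    pending = False
    tick = 0
    for _iteration in range(iterations):
        for name, duration, masked in timeline:
            for _tick in range(duration):
                tick += 1
                fired = tick % period == 0
                if masked:
                    # The sample cannot be taken now; remember that it is due.
                    pending = pending or fired
                elif pending or fired:
                    # The sample is charged to whatever runs at this moment.
                    hits[name] += 1
                    pending = False
    return hits


if __name__ == "__main__":
    hits = sample_profile(TIMELINE, period=13, iterations=200)
    total = sum(hits.values())
    for name, _duration, _masked in TIMELINE:
        print(f"{name:16s} {100.0 * hits[name] / total:5.1f}% of samples")
```

In this model A() and B() account for 90 of every 99 ticks yet never receive a sample, while spin_unlock_irq() receives one on every iteration.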

Thank you very much for the time you have spent on my question. It is an honor to talk with you.

On another note, I found something wrong. For example, following the web page https://github.com/Linaro/OpenCSD/blob/opencsd-0v002/HOWTO.md#on-target-trac... I cloned the whole project with git clone but found no branch named perf-opencsd-4.7-rc1. [screenshot: image002.jpg]

So when I do as follows, I cannot get the rc1 branch. Maybe that is why I am stuck. [screenshot: image005.jpg]

Finally, thanks again for your time!

Regards, Bob


Mathieu Poirier

3:26 p.m.

New subject: //Re: 【questions on coresight integrated with perf】

On 29 July 2016 at 04:18, liubowen (A) liubowen2@huawei.com wrote:

> As we know, perf records samples using interrupts generated by the PMU and uses them to show hot spots.

Not all PMUs generate interrupts, and that is exactly the case for CoreSight. The CoreSight PMU simply starts trace collection when the process it is associated with is installed on a processor. The recording stops when the process is yanked out. As such, issues with spinlocks as you describe below aren't a problem. The ETMs will trace for as long as the process is executing, regardless of what that execution is.
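The behaviour described in this reply can be modelled in a few lines. This is a hedged sketch, not the driver's logic: the event tuples, the pid values and the function names are hypothetical, and the only point is that capture is gated by scheduling, never by the interrupt mask.

```python
def collect_trace(events, traced_pid):
    """Return the functions captured while `traced_pid` is on the CPU.

    `events` is a hypothetical stream of ("sched_in", pid),
    ("sched_out", pid) and ("exec", function_name, irqs_masked) tuples.
    """
    tracing = False
    captured = []
    for event in events:
        kind = event[0]
        if kind == "sched_in":
            # Trace collection starts when the traced process is installed.
            tracing = event[1] == traced_pid
        elif kind == "sched_out":
            # ... and stops when that process is scheduled out.
            if event[1] == traced_pid:
                tracing = False
        elif kind == "exec" and tracing:
            # The interrupt mask (event[2]) is deliberately ignored:
            # generating trace does not depend on taking an interrupt.
            captured.append(event[1])
    return captured


# Hypothetical schedule: pid 42 is traced, pid 7 is an unrelated process.
EVENTS = [
    ("sched_in", 42),
    ("exec", "spin_lock_irq", False),
    ("exec", "A", True),
    ("exec", "B", True),
    ("exec", "spin_unlock_irq", False),
    ("sched_out", 42),
    ("sched_in", 7),
    ("exec", "unrelated", False),
    ("sched_out", 7),
    ("sched_in", 42),
    ("exec", "A", True),
    ("sched_out", 42),
]

if __name__ == "__main__":
    print(collect_trace(EVENTS, traced_pid=42))
```

Code that runs with interrupts masked appears in the captured list exactly like any other code, which is the contrast with interrupt-driven sampling.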

> Consider the following case:
>
> spin_lock_irq();
> A();
> B();
> spin_unlock_irq();
>
> The time spent in A() and B() is attributed to spin_unlock_irq(), and perf report shows no share for A() or B().
>
> I wonder whether the trace data covers the whole recording, from start to end. If the trace data is complete, the abnormal report would be solved perfectly.

From your description it is not clear (at least to me) if you have collected trace data generated by the CoreSight PMU or not.

> On another note, I found something wrong. For example, the web page https://github.com/Linaro/OpenCSD/blob/opencsd-0v002/HOWTO.md#on-target-trac...

I suggest you use opencsd-0v003 - it has the latest code and updated documentation.

> I cloned the whole project with git clone but found no branch named perf-opencsd-4.7-rc1. So I cannot get the rc1 branch. Maybe that is why I am stuck.

Simply use branch "perf-opencsd-4.7" - it has the same features as "perf-opencsd-4.7-rc1".

Also keep an eye out for the address range filtering feature, allowing one to limit tracing to a very narrow range. I will publish the code in the coming weeks, as soon as I know it has made it to the maintainers' tree.


liubowen (A)

6 Aug 2016

11:23 a.m.

New subject: //Re: //Re: 【questions on coresight integrated with perf】

Hi Mathieu,

I am Bob. Thanks for your time. I am still stuck, as described below. I followed these two pages: https://github.com/Linaro/OpenCSD/blob/opencsd-0v003/HOWTO.md and https://github.com/Linaro/OpenCSD/blob/perf-opencsd-4.7/Documentation/trace/....

In my current CoreSight configuration, only core 0 is traced; the trace is generated by its ETMv4, and the ETB is used to store it.

First, I built a perf executable on the target board. I got this result. [screenshot: image008.jpg]

Second, I installed OpenCSD and built a new perf binary. I used it to run perf report on the perf.data file and got this result. [screenshot: image009.jpg]

I have not found the reason. Have you seen this exception before? Thanks for your help.

On the other hand, I downloaded the sample bundle, and it works well. [screenshots: image010.jpg, image011.jpg] I also have a question here. If we use “-e cs_etm/cycacc,timestamp/”, we should get the exact cycle count of each instruction, with timestamps inserted into the trace. From the sample bundle, how do we get the share of one symbol? Is it based on the number of instructions?

Recently I have seen from the coresight mailing list that many patches are being merged into the project. However, we cannot get the new code from GitHub. Can I try the new code?

I beg your pardon if I am troubling you.

Thanks for your time!


Chunyan Zhang

8 Aug 2016

6:01 a.m.

New subject: //Re: //Re: 【questions on coresight integrated with perf】

On Sat, Aug 6, 2016 at 7:23 PM, liubowen (A) liubowen2@huawei.com wrote:

> Second, I installed OpenCSD and built a new perf binary. I used it to run perf report on the perf.data file and got this result. I have not found the reason. Have you seen this exception before?

The reason was that 'perf record' didn't record the complete trace data, perhaps because the trace buffer (the ETB or whatever you were using) is not large enough?

Chunyan


Mathieu Poirier

2:36 p.m.

New subject: //Re: //Re: 【questions on coresight integrated with perf】

On 6 August 2016 at 05:23, liubowen (A) liubowen2@huawei.com wrote:

> I followed these two pages: https://github.com/Linaro/OpenCSD/blob/opencsd-0v003/HOWTO.md and https://github.com/Linaro/OpenCSD/blob/perf-opencsd-4.7/Documentation/trace/coresight.txt.
>
> In my current CoreSight configuration, only core 0 is traced; the trace is generated by its ETMv4, and the ETB is used to store it.
>
> First, I built a perf executable on the target board. I got this result.

This looks like the right amount of data - this is good. The data lost message is related to the ETB internal memory not being large enough. At some point the circular buffer wrapped around and some traces got lost. To avoid that you can use the 'u' and 'k' options to limit traces to either 'u'ser or 'k'ernel space. Address filtering will also be useful to you but I am still working on this one. It should be out during the 4.9 cycle.
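The wrap-around described above can be illustrated with a toy circular buffer. This is a sketch only: the buffer size, the packet tuples and the user/kernel mix are hypothetical and do not reflect real ETB capacity or the ETM packet format.

```python
from collections import deque


def capture(packets, buffer_size):
    """Push packets through a circular buffer of `buffer_size` entries.

    Returns (kept, lost): the packets still in the buffer at the end and the
    number of older packets overwritten when the buffer wrapped around.
    """
    ring = deque(maxlen=buffer_size)  # the oldest entry is dropped when full
    lost = 0
    for packet in packets:
        if len(ring) == buffer_size:
            lost += 1
        ring.append(packet)
    return list(ring), lost


# Hypothetical trace: every third packet comes from user space ("u"),
# the rest from the kernel ("k").
PACKETS = [("u" if i % 3 == 0 else "k", i) for i in range(12)]
USER_ONLY = [p for p in PACKETS if p[0] == "u"]

if __name__ == "__main__":
    for label, stream in (("all", PACKETS), ("user only", USER_ONLY)):
        kept, lost = capture(stream, buffer_size=8)
        print(f"{label}: kept {len(kept)} packets, lost {lost}")
```

Restricting the stream to one privilege level reduces the volume, so the same buffer can hold the whole run without wrapping.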

> Second, I installed OpenCSD and built a new perf binary. I used it to run perf report on the perf.data file and got this result. I have not found the reason. Have you seen this exception before?

If this error message is related to the event type, the problem would be with PERF_RECORD_MMAP2 events, something I find odd. Unfortunately there isn't much I can do to help - you will have to instrument the code. Below you have proven the working environment to be sane (since you've been able to decode a sample bundle). With a working example it should be relatively easy to find where the problem is.

> On the other hand, I downloaded the sample bundle, and it works well. I also have a question here. If we use “-e cs_etm/cycacc,timestamp/”, we should get the exact cycle count of each instruction, with timestamps inserted into the trace. From the sample bundle, how do we get the share of one symbol? Is it based on the number of instructions?

You will have to make your own python scripts (like the ones we provided as examples) to take advantage of the cycle accurate and timestamp information. Perf has no knowledge of this information (as it should be).
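As a rough idea of what such a script might compute, here is a minimal sketch. It assumes the decoded trace has already been reduced to hypothetical (symbol, instructions, cycles) records; that record layout and the function names are illustrative only and are not the output format of perf or OpenCSD.

```python
from collections import defaultdict


def ratio(part, whole):
    """Return part / whole, or 0.0 when there is nothing to divide by."""
    return part / whole if whole else 0.0


def symbol_shares(records):
    """Aggregate hypothetical (symbol, instructions, cycles) records.

    Returns {symbol: (share_of_instructions, share_of_cycles)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for symbol, instructions, cycles in records:
        totals[symbol][0] += instructions
        totals[symbol][1] += cycles
    all_instructions = sum(t[0] for t in totals.values())
    all_cycles = sum(t[1] for t in totals.values())
    return {
        symbol: (ratio(t[0], all_instructions), ratio(t[1], all_cycles))
        for symbol, t in totals.items()
    }


# Hypothetical decoded records for two symbols.
RECORDS = [("A", 10, 30), ("B", 30, 30), ("A", 10, 40)]

if __name__ == "__main__":
    for symbol, (by_instr, by_cycles) in sorted(symbol_shares(RECORDS).items()):
        print(f"{symbol}: {by_instr:.0%} of instructions, {by_cycles:.0%} of cycles")
```

The two shares can differ noticeably, which is why an instruction-count view and a cycle-accurate view of the same symbol may not agree.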

> Recently I have seen from the coresight mailing list that many patches are being merged into the project. However, we cannot get the new code from GitHub. Can I try the new code?

I realise this is a problem. On the flip side I can't keep updating the code on github every time I take a patch in my next tree or release a feature on the mailing lists for review - the maintenance cost would be way too high. As such I have taken the habit of updating the code with every kernel release. From time to time you might see an odd release candidate version (-rcX) but that is the exception.

Best regards, Mathieu
