Hi All,
Since my previous RFC, I've fixed --per-thread mode and solved most of the open questions. I've also changed --dump-raw-trace to use the same code path so it's also working now.
I think the only open questions are:

 * General approach
 * Whether aux records need to be saved, or whether they can be pulled from elsewhere.
I've also tested perf inject which is now working with troublesome files.
Thanks,
James
James Clark (7):
  perf cs-etm: Split up etm queue setup function
  perf cs-etm: Only search timestamp in current sample's queue.
  perf cs-etm: Save aux records in each etm queue
  perf cs-etm: don't process queues until cs_etm__flush_events
  perf cs-etm: split decode by aux records.
  perf cs-etm: Use existing decode code path for --dump-raw-trace
  perf cs-etm: Suppress printing when resetting decoder
 .../perf/util/cs-etm-decoder/cs-etm-decoder.c |  10 +-
 tools/perf/util/cs-etm.c                      | 300 ++++++++++--------
 2 files changed, 168 insertions(+), 142 deletions(-)
Refactor the function into separate allocation and timestamp search parts. Later the timestamp search will be done multiple times.
Signed-off-by: James Clark <james.clark@arm.com>
---
 tools/perf/util/cs-etm.c | 60 +++++++++++++++++++++-------------------
 1 file changed, 31 insertions(+), 29 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index a2a369e2fbb6..27894facae5e 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -765,33 +765,12 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm)
 	return NULL;
 }

-static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
-			       struct auxtrace_queue *queue,
-			       unsigned int queue_nr)
+static int cs_etm__search_first_timestamp(struct cs_etm_queue *etmq)
 {
 	int ret = 0;
+	u64 timestamp;
 	unsigned int cs_queue_nr;
 	u8 trace_chan_id;
-	u64 timestamp;
-	struct cs_etm_queue *etmq = queue->priv;
-
-	if (list_empty(&queue->head) || etmq)
-		goto out;
-
-	etmq = cs_etm__alloc_queue(etm);
-
-	if (!etmq) {
-		ret = -ENOMEM;
-		goto out;
-	}
-
-	queue->priv = etmq;
-	etmq->etm = etm;
-	etmq->queue_nr = queue_nr;
-	etmq->offset = 0;
-
-	if (etm->timeless_decoding)
-		goto out;

 	/*
 	 * We are under a CPU-wide trace scenario. As such we need to know
@@ -808,7 +787,7 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
 	 */
 	ret = cs_etm__get_data_block(etmq);
 	if (ret <= 0)
-		goto out;
+		return ret;

 	/*
 	 * Run decoder on the trace block. The decoder will stop when
@@ -817,7 +796,7 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
 	 */
 	ret = cs_etm__decode_data_block(etmq);
 	if (ret)
-		goto out;
+		return ret;

 	/*
 	 * Function cs_etm_decoder__do_{hard|soft}_timestamp() does all
@@ -849,10 +828,33 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
 	 * Note that packets decoded above are still in the traceID's packet
 	 * queue and will be processed in cs_etm__process_queues().
 	 */
-	cs_queue_nr = TO_CS_QUEUE_NR(queue_nr, trace_chan_id);
-	ret = auxtrace_heap__add(&etm->heap, cs_queue_nr, timestamp);
-out:
-	return ret;
+	cs_queue_nr = TO_CS_QUEUE_NR(etmq->queue_nr, trace_chan_id);
+	return auxtrace_heap__add(&etmq->etm->heap, cs_queue_nr, timestamp);
+}
+
+static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
+			       struct auxtrace_queue *queue,
+			       unsigned int queue_nr)
+{
+	struct cs_etm_queue *etmq = queue->priv;
+
+	if (list_empty(&queue->head) || etmq)
+		return 0;
+
+	etmq = cs_etm__alloc_queue(etm);
+
+	if (!etmq)
+		return -ENOMEM;
+
+	queue->priv = etmq;
+	etmq->etm = etm;
+	etmq->queue_nr = queue_nr;
+	etmq->offset = 0;
+
+	if (etm->timeless_decoding)
+		return 0;
+	else
+		return cs_etm__search_first_timestamp(etmq);
 }

 static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
Hi James,
On Fri, Feb 12, 2021 at 04:45:07PM +0200, James Clark wrote:
Refactor the function into separate allocation and timestamp search parts. Later the timestamp search will be done multiple times.
The newly introduced function cs_etm__search_first_timestamp() searches for a timestamp; if it cannot find any valid timestamp, it drops all packets for every queue with the logic:
  cs_etm__search_first_timestamp()
  {
      timestamp = cs_etm__etmq_get_timestamp();

      if (timestamp)
          return auxtrace_heap__add();   -> heapify timestamp

      cs_etm__clear_all_packet_queues(); -> clear all packets
      return 0;
  }
If the function cs_etm__search_first_timestamp() is invoked multiple times, is it possible to clear all packets in the middle of decoding?

From my understanding, it makes sense to drop the trace data very early in decoding so that we can get the timestamp for packets; afterwards the packets should always have a valid timestamp, right? If so, I don't think it's necessary for cs_etm__search_first_timestamp() to contain the call to cs_etm__clear_all_packet_queues().

I will go back to check whether this conclusion is correct after I have a better understanding of the whole patch set. Just noting it here.
Thanks, Leo
Change initial timestamp search to only operate on the queue related to the current event. In a later change the bounds of the aux record will also be used to reset the decoder and the record is only relevant to a single queue.
This change makes some files that had coresight data but didn't synthesise any events start working and generating events. I'm not sure of the reason for that; I'd expect this change to only affect the ordering of events.
Signed-off-by: James Clark <james.clark@arm.com>
---
 tools/perf/util/cs-etm.c | 30 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 16 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 27894facae5e..8f8b448632fb 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -97,7 +97,7 @@ struct cs_etm_queue {
 /* RB tree for quick conversion between traceID and metadata pointers */
 static struct intlist *traceid_list;

-static int cs_etm__update_queues(struct cs_etm_auxtrace *etm);
+static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu);
 static int cs_etm__process_queues(struct cs_etm_auxtrace *etm);
 static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
 					   pid_t tid);
@@ -524,7 +524,6 @@ static void cs_etm__dump_event(struct cs_etm_auxtrace *etm,
 static int cs_etm__flush_events(struct perf_session *session,
 				struct perf_tool *tool)
 {
-	int ret;
 	struct cs_etm_auxtrace *etm = container_of(session->auxtrace,
 						   struct cs_etm_auxtrace,
 						   auxtrace);
@@ -534,11 +533,6 @@ static int cs_etm__flush_events(struct perf_session *session,
 	if (!tool->ordered_events)
 		return -EINVAL;

-	ret = cs_etm__update_queues(etm);
-
-	if (ret < 0)
-		return ret;
-
 	if (etm->timeless_decoding)
 		return cs_etm__process_timeless_queues(etm, -1);

@@ -851,10 +845,7 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
 	etmq->queue_nr = queue_nr;
 	etmq->offset = 0;

-	if (etm->timeless_decoding)
-		return 0;
-	else
-		return cs_etm__search_first_timestamp(etmq);
+	return 0;
 }

 static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
@@ -874,14 +865,20 @@ static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
 	return 0;
 }

-static int cs_etm__update_queues(struct cs_etm_auxtrace *etm)
+static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu)
 {
+	int ret;
+
 	if (etm->queues.new_data) {
 		etm->queues.new_data = false;
-		return cs_etm__setup_queues(etm);
+		ret = cs_etm__setup_queues(etm);
+		if (ret)
+			return ret;
 	}

-	return 0;
+	if (!etm->timeless_decoding)
+		return cs_etm__search_first_timestamp(etm->queues.queue_array[cpu].priv);
+	else
+		return 0;
 }

 static inline
@@ -2358,8 +2355,9 @@ static int cs_etm__process_event(struct perf_session *session,
 	else
 		timestamp = 0;

-	if (timestamp || etm->timeless_decoding) {
-		err = cs_etm__update_queues(etm);
+	if ((timestamp || etm->timeless_decoding)
+	    && event->header.type == PERF_RECORD_AUX) {
+		err = cs_etm__update_queues(etm, sample->cpu);
 		if (err)
 			return err;
 	}
On Fri, Feb 12, 2021 at 04:45:08PM +0200, James Clark wrote:
Change initial timestamp search to only operate on the queue related to the current event. In a later change the bounds of the aux record will also be used to reset the decoder and the record is only relevant to a single queue.
I roughly understand that this patch tries to establish the mechanism for timestamp search per CPU, but I am struggling to understand what issue you are trying to address.

So could you give more description for "what is the current issue?" and "why is this approach needed to fix it?". This would be much appreciated.
This change makes some files that had coresight data but didn't synthesise any events start working and generating events. I'm not sure of the reason for that; I'd expect this change to only affect the ordering of events.
It seems to me that this patch introduces a regression.
Signed-off-by: James Clark james.clark@arm.com
 tools/perf/util/cs-etm.c | 30 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 16 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 27894facae5e..8f8b448632fb 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -97,7 +97,7 @@ struct cs_etm_queue {
 /* RB tree for quick conversion between traceID and metadata pointers */
 static struct intlist *traceid_list;

-static int cs_etm__update_queues(struct cs_etm_auxtrace *etm);
+static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu);
 static int cs_etm__process_queues(struct cs_etm_auxtrace *etm);
 static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
 					   pid_t tid);
@@ -524,7 +524,6 @@ static void cs_etm__dump_event(struct cs_etm_auxtrace *etm,
 static int cs_etm__flush_events(struct perf_session *session,
 				struct perf_tool *tool)
 {
-	int ret;
 	struct cs_etm_auxtrace *etm = container_of(session->auxtrace,
 						   struct cs_etm_auxtrace,
 						   auxtrace);
@@ -534,11 +533,6 @@ static int cs_etm__flush_events(struct perf_session *session,
 	if (!tool->ordered_events)
 		return -EINVAL;

-	ret = cs_etm__update_queues(etm);
-
-	if (ret < 0)
-		return ret;
-
When flushing events, it means the trace data is discontinuous or we are at the end of the trace data. If the trace data is discontinuous, we need to use cs_etm__update_queues() to create new queues. So if we remove the call to cs_etm__update_queues(), I suspect it cannot handle discontinuous trace data anymore.
 	if (etm->timeless_decoding)
 		return cs_etm__process_timeless_queues(etm, -1);

@@ -851,10 +845,7 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
 	etmq->queue_nr = queue_nr;
 	etmq->offset = 0;

-	if (etm->timeless_decoding)
-		return 0;
-	else
-		return cs_etm__search_first_timestamp(etmq);
+	return 0;
 }

 static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
@@ -874,14 +865,20 @@ static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
 	return 0;
 }

-static int cs_etm__update_queues(struct cs_etm_auxtrace *etm)
+static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu)
 {
+	int ret;
+
 	if (etm->queues.new_data) {
 		etm->queues.new_data = false;
-		return cs_etm__setup_queues(etm);
+		ret = cs_etm__setup_queues(etm);
Just a reminder: the new parameter "cpu" is introduced in this function, so in theory cs_etm__update_queues() should work on only the specified CPU. But it always sets up the queues for all CPUs by calling cs_etm__setup_queues().
+		if (ret)
+			return ret;
 	}
-	return 0;
+	if (!etm->timeless_decoding)
+		return cs_etm__search_first_timestamp(etm->queues.queue_array[cpu].priv);
+	else
+		return 0;
In the original code of cs_etm__update_queues(), if there is no new data (or the queues have already been set up), it does nothing and directly bails out.

After applying the above change, it will always search the first timestamp for the given "cpu".
 }

 static inline
@@ -2358,8 +2355,9 @@ static int cs_etm__process_event(struct perf_session *session,
 	else
 		timestamp = 0;

-	if (timestamp || etm->timeless_decoding) {
-		err = cs_etm__update_queues(etm);
+	if ((timestamp || etm->timeless_decoding)
+	    && event->header.type == PERF_RECORD_AUX) {
+		err = cs_etm__update_queues(etm, sample->cpu);
This might cause a potential issue. Let's look at an example:

If CoreSight uses the N:1 model for tracers and sink (e.g. multiple ETMs output trace to the same ETR), then when "sample->cpu" is 0 it will only initialize the timestamp for CPU 0. The same AUX trace data can contain trace for CPU 1, CPU 2, etc.; in this case the flow doesn't prepare the timestamp for CPUs 1/2, so it will fail to generate samples for CPUs 1/2, right?
Thanks, Leo
 		if (err)
 			return err;
 	}

--
2.28.0
On 20/02/2021 13:50, Leo Yan wrote:
On Fri, Feb 12, 2021 at 04:45:08PM +0200, James Clark wrote:
Change initial timestamp search to only operate on the queue related to the current event. In a later change the bounds of the aux record will also be used to reset the decoder and the record is only relevant to a single queue.
I roughly understand that this patch tries to establish the mechanism for timestamp search per CPU, but I am struggling to understand what issue you are trying to address.

So could you give more description for "what is the current issue?" and "why is this approach needed to fix it?". This would be much appreciated.
Hi Leo,
The issue is that the aux records used to reset the decoder are associated with a specific CPU/aux queue. Currently when any new data is received, all queues are searched for a timestamp. We can't do it that way any more because the aux records aren't available yet.
The reason to fix it this way is that now we can only decode when an aux record is received. This will happen multiple times, and will also be cpu/queue specific.
This change makes some files that had coresight data but didn't synthesise any events start working and generating events. I'm not sure of the reason for that; I'd expect this change to only affect the ordering of events.
It seems to me that this patch introduces a regression.
I'm wondering if it is a regression, or if it's accidentally fixing a bug. It doesn't seem possible to go from not generating any samples to generating lots without accidentally fixing an existing issue. If there is valid data there, what would stop it from generating any samples?
I do need to look into this more closely though to find the real reason for it, which will probably shed more light on it.
Signed-off-by: James Clark james.clark@arm.com
tools/perf/util/cs-etm.c | 30 ++++++++++++++---------------- 1 file changed, 14 insertions(+), 16 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 27894facae5e..8f8b448632fb 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -97,7 +97,7 @@ struct cs_etm_queue {
 /* RB tree for quick conversion between traceID and metadata pointers */
 static struct intlist *traceid_list;

-static int cs_etm__update_queues(struct cs_etm_auxtrace *etm);
+static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu);
 static int cs_etm__process_queues(struct cs_etm_auxtrace *etm);
 static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
 					   pid_t tid);
@@ -524,7 +524,6 @@ static void cs_etm__dump_event(struct cs_etm_auxtrace *etm,
 static int cs_etm__flush_events(struct perf_session *session,
 				struct perf_tool *tool)
 {
-	int ret;
 	struct cs_etm_auxtrace *etm = container_of(session->auxtrace,
 						   struct cs_etm_auxtrace,
 						   auxtrace);
@@ -534,11 +533,6 @@ static int cs_etm__flush_events(struct perf_session *session,
 	if (!tool->ordered_events)
 		return -EINVAL;

-	ret = cs_etm__update_queues(etm);
-
-	if (ret < 0)
-		return ret;
-
When flushing events, it means the trace data is discontinuous or we are at the end of the trace data. If the trace data is discontinuous, we need to use cs_etm__update_queues() to create new queues. So if we remove the call to cs_etm__update_queues(), I suspect it cannot handle discontinuous trace data anymore.
Do you know how to force perf to record data like this? In my experience, etm->queues.new_data is only set once, when the file is first opened.
 	if (etm->timeless_decoding)
 		return cs_etm__process_timeless_queues(etm, -1);

@@ -851,10 +845,7 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
 	etmq->queue_nr = queue_nr;
 	etmq->offset = 0;

-	if (etm->timeless_decoding)
-		return 0;
-	else
-		return cs_etm__search_first_timestamp(etmq);
+	return 0;
 }

 static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
@@ -874,14 +865,20 @@ static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
 	return 0;
 }

-static int cs_etm__update_queues(struct cs_etm_auxtrace *etm)
+static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu)
 {
+	int ret;
+
 	if (etm->queues.new_data) {
 		etm->queues.new_data = false;
-		return cs_etm__setup_queues(etm);
+		ret = cs_etm__setup_queues(etm);
Just a reminder: the new parameter "cpu" is introduced in this function, so in theory cs_etm__update_queues() should work on only the specified CPU. But it always sets up the queues for all CPUs by calling cs_etm__setup_queues().
+		if (ret)
+			return ret;
 	}
-	return 0;
+	if (!etm->timeless_decoding)
+		return cs_etm__search_first_timestamp(etm->queues.queue_array[cpu].priv);
+	else
+		return 0;
In the original code of cs_etm__update_queues(), if there is no new data (or the queues have already been set up), it does nothing and directly bails out.

After applying the above change, it will always search the first timestamp for the given "cpu".
Maybe I need to pull my later change in here, which makes it skip the search after the timestamp is found for the first time. What do you think?
 }

 static inline
@@ -2358,8 +2355,9 @@ static int cs_etm__process_event(struct perf_session *session,
 	else
 		timestamp = 0;

-	if (timestamp || etm->timeless_decoding) {
-		err = cs_etm__update_queues(etm);
+	if ((timestamp || etm->timeless_decoding)
+	    && event->header.type == PERF_RECORD_AUX) {
+		err = cs_etm__update_queues(etm, sample->cpu);
This might cause a potential issue. Let's look at an example:

If CoreSight uses the N:1 model for tracers and sink (e.g. multiple ETMs output trace to the same ETR), then when "sample->cpu" is 0 it will only initialize the timestamp for CPU 0. The same AUX trace data can contain trace for CPU 1, CPU 2, etc.; in this case the flow doesn't prepare the timestamp for CPUs 1/2, so it will fail to generate samples for CPUs 1/2, right?
If there is only one sink then wouldn't there also be only one aux queue, and aux records generated for one CPU? I thought each channel in the data corresponded to a cs_etm_traceid_queue, and that is how multiple tracers worked? But the channel corresponds to a thread rather than a CPU, because the struct looks like this:
  struct cs_etm_traceid_queue {
  	u8 trace_chan_id;
  	pid_t pid, tid;
  	...
There is currently an issue with per-thread mode where AUX events don't have the CPU ID set and I've assumed CPU 0 which is incorrect. Maybe that is related to your potential issue as well.
Thanks, James
Thanks, Leo
 		if (err)
 			return err;
 	}

--
2.28.0
On Mon, Mar 01, 2021 at 05:28:57PM +0200, James Clark wrote:
On 20/02/2021 13:50, Leo Yan wrote:
On Fri, Feb 12, 2021 at 04:45:08PM +0200, James Clark wrote:
Change initial timestamp search to only operate on the queue related to the current event. In a later change the bounds of the aux record will also be used to reset the decoder and the record is only relevant to a single queue.
I roughly understand that this patch tries to establish the mechanism for timestamp search per CPU, but I am struggling to understand what issue you are trying to address.

So could you give more description for "what is the current issue?" and "why is this approach needed to fix it?". This would be very appreciated.
Hi Leo,
The issue is that the aux records used to reset the decoder are associated with a specific CPU/aux queue. Currently when any new data is received, all queues are searched for a timestamp. We can't do it that way any more because the aux records aren't available yet.
The reason to fix it this way is because now we can only do decode when an aux record is received. This will happen multiple times, and will also be cpu/queue specific.
Okay, I think you are trying to establish the logic as below:

- Step 1: Wait for a PERF_RECORD_AUX event;
- Step 2: Receive the PERF_RECORD_AUX event, which tells us the aux buffer offset and size;
- Step 3: Update the auxtrace queue based on the info from the PERF_RECORD_AUX event;
- Step 4: Find the first trace timestamp, dropping any trace data prior to the timestamp;
- Step 5: Continue to decode the trace data contained in the current PERF_RECORD_AUX event;
- Step 6: Finish the decoding and go to step 1.
This change makes some files that had coresight data but didn't synthesise any events start working and generating events. I'm not sure of the reason for that; I'd expect this change to only affect the ordering of events.
It seems to me that this patch introduces a regression.
I'm wondering if it is a regression, or if it's accidentally fixing a bug. It doesn't seem possible to go from not generating any samples to generating lots without accidentally fixing an existing issue. If there is valid data there, what would stop it from generating any samples?
I think we need to clarify the current code logic:

- Step 1: at the beginning, only prepare the auxtrace queues. For the N:1 model (all ETMs use the same sink), there is only one queue; for the 1:1 model (each ETM has its own sink), there are multiple queues.

  The function cs_etm__update_queues() is for this step; it doesn't generate any samples and only prepares the first timestamp, so that later the samples coming from multiple CPUs can be compared with each other.

- Step 2: it starts to decode the trace data and synthesize samples; this is done by the function cs_etm__process_queues().

It seems to me that we now need to fix the issue in step 2: it decodes the whole AUX buffer, so the main target is to limit the decoding length, which should be based on the info provided by the PERF_RECORD_AUX event.

IIUC, you are now trying to fix the issue in step 1, but step 1 is not the root cause of the issue.
I do need to look into this more closely though to find the real reason for it, which will probably shed more light on it.
Signed-off-by: James Clark james.clark@arm.com
tools/perf/util/cs-etm.c | 30 ++++++++++++++---------------- 1 file changed, 14 insertions(+), 16 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 27894facae5e..8f8b448632fb 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -97,7 +97,7 @@ struct cs_etm_queue {
 /* RB tree for quick conversion between traceID and metadata pointers */
 static struct intlist *traceid_list;

-static int cs_etm__update_queues(struct cs_etm_auxtrace *etm);
+static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu);
 static int cs_etm__process_queues(struct cs_etm_auxtrace *etm);
 static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
 					   pid_t tid);
@@ -524,7 +524,6 @@ static void cs_etm__dump_event(struct cs_etm_auxtrace *etm,
 static int cs_etm__flush_events(struct perf_session *session,
 				struct perf_tool *tool)
 {
-	int ret;
 	struct cs_etm_auxtrace *etm = container_of(session->auxtrace,
 						   struct cs_etm_auxtrace,
 						   auxtrace);
@@ -534,11 +533,6 @@ static int cs_etm__flush_events(struct perf_session *session,
 	if (!tool->ordered_events)
 		return -EINVAL;

-	ret = cs_etm__update_queues(etm);
-
-	if (ret < 0)
-		return ret;
-
When flushing events, it means the trace data is discontinuous or we are at the end of the trace data. If the trace data is discontinuous, we need to use cs_etm__update_queues() to create new queues. So if we remove the call to cs_etm__update_queues(), I suspect it cannot handle discontinuous trace data anymore.
Do you know how to force perf to record data like this? From my experience etm->queues.new_data is only set once when the file is first opened.
Maybe you can try the snapshot mode, with which you can capture the snapshot trace data multiple times based on signals.
Please check the test case [1] for how to use snapshot mode.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
 	if (etm->timeless_decoding)
 		return cs_etm__process_timeless_queues(etm, -1);

@@ -851,10 +845,7 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
 	etmq->queue_nr = queue_nr;
 	etmq->offset = 0;

-	if (etm->timeless_decoding)
-		return 0;
-	else
-		return cs_etm__search_first_timestamp(etmq);
+	return 0;
 }

 static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
@@ -874,14 +865,20 @@ static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
 	return 0;
 }

-static int cs_etm__update_queues(struct cs_etm_auxtrace *etm)
+static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu)
 {
+	int ret;
+
 	if (etm->queues.new_data) {
 		etm->queues.new_data = false;
-		return cs_etm__setup_queues(etm);
+		ret = cs_etm__setup_queues(etm);
Just a reminder: the new parameter "cpu" is introduced in this function, so in theory cs_etm__update_queues() should work on only the specified CPU. But it always sets up the queues for all CPUs by calling cs_etm__setup_queues().
+		if (ret)
+			return ret;
 	}
-	return 0;
+	if (!etm->timeless_decoding)
+		return cs_etm__search_first_timestamp(etm->queues.queue_array[cpu].priv);
+	else
+		return 0;
In the original code of cs_etm__update_queues(), if there is no new data (or the queues have already been set up), it does nothing and directly bails out.

After applying the above change, it will always search the first timestamp for the given "cpu".
Maybe I need to pull my later change into here that makes it skip the search after the timestamp is found for the first time. What do you think?
The ordering of the patches is important for understanding the changes, but let's first sync on the general direction of the fix.
 }

 static inline
@@ -2358,8 +2355,9 @@ static int cs_etm__process_event(struct perf_session *session,
 	else
 		timestamp = 0;

-	if (timestamp || etm->timeless_decoding) {
-		err = cs_etm__update_queues(etm);
+	if ((timestamp || etm->timeless_decoding)
+	    && event->header.type == PERF_RECORD_AUX) {
+		err = cs_etm__update_queues(etm, sample->cpu);
This might cause a potential issue. Let's look at an example:

If CoreSight uses the N:1 model for tracers and sink (e.g. multiple ETMs output trace to the same ETR), then when "sample->cpu" is 0 it will only initialize the timestamp for CPU 0. The same AUX trace data can contain trace for CPU 1, CPU 2, etc.; in this case the flow doesn't prepare the timestamp for CPUs 1/2, so it will fail to generate samples for CPUs 1/2, right?
If there is only one sink then wouldn't there also be only one aux queue, and aux records generated for one CPU? I thought each channel in the data corresponded to a cs_etm_traceid_queue, and that is how multiple tracers worked?
If there is only one sink, the trace data from multiple CPUs will finally be recorded into a single AUX buffer with only one queue.
But the channel corresponds to a thread rather than a CPU because the struct looks like this:
  struct cs_etm_traceid_queue {
  	u8 trace_chan_id;
  	pid_t pid, tid;
  	...
Please note, IIUC, struct cs_etm_traceid_queue is used to track per-CPU info; when the decoder decodes the trace data, it produces the "trace_chan_id" rather than the CPU ID, and it needs to find the corresponding aux trace queue based on "trace_chan_id". This is why "trace_chan_id" is recorded in the structure cs_etm_traceid_queue.

@Mathieu knows this best; please correct me if I've made any mistake here.
There is currently an issue with per-thread mode where AUX events don't have the CPU ID set and I've assumed CPU 0 which is incorrect. Maybe that is related to your potential issue as well.
Please refer to the function cs_etm__get_cpu(); it converts "trace_chan_id" to the CPU number.
Thanks, Leo
On 12/02/2021 16:45, James Clark wrote:
Change initial timestamp search to only operate on the queue related to the current event. In a later change the bounds of the aux record will also be used to reset the decoder and the record is only relevant to a single queue.
This change makes some files that had coresight data but didn't synthesise any events start working and generating events. I'm not sure of the reason for that; I'd expect this change to only affect the ordering of events.
I've gotten to the bottom of this mystery of why decoding starts working because of this change. Currently:
* _All_ decoding happens on the first AUX record
* Decoding depends on binary data (so also depends on MMAP records)
* Ordering of AUX records and MMAP records is timing sensitive

So there are two scenarios:

 1) The perf.data file contains MMAPs followed by AUX records. Everything works.
 2) The perf.data file contains an AUX record, followed by MMAPs, then further AUX records. Decoding never worked.
Per-thread mode (timeless) always worked because we wait for EXIT rather than AUX to start the decode, which comes after the MMAPs. Per-cpu mode was always at the mercy of the ordering of events. So it's not a regression that this patchset changes the behaviour here; it's doing more of 'the right thing' now.
As a separate change I will add a warning to cs_etm__mem_access() when it fails to find the right binary as this is a current sore point.
James
The aux records will be used to set the bounds of decoding in a later commit. In the future we may also want to use the flags of each record to control decoding.
Do these need to be saved in their entirety, or can pointers to each record safely be saved instead for later access?
Signed-off-by: James Clark james.clark@arm.com
---
 tools/perf/util/cs-etm.c | 32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 8f8b448632fb..88b541b2a804 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -92,12 +92,16 @@ struct cs_etm_queue {
 	/* Conversion between traceID and index in traceid_queues array */
 	struct intlist *traceid_queues_list;
 	struct cs_etm_traceid_queue **traceid_queues;
+	int aux_record_list_len;
+	int aux_record_list_idx;
+	struct perf_record_aux *aux_record_list;
 };
 
 /* RB tree for quick conversion between traceID and metadata pointers */
 static struct intlist *traceid_list;
 
-static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu);
+static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu,
+				 struct perf_record_aux *aux_record);
 static int cs_etm__process_queues(struct cs_etm_auxtrace *etm);
 static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
 					   pid_t tid);
@@ -585,6 +589,7 @@ static void cs_etm__free_queue(void *priv)
 
 	cs_etm_decoder__free(etmq->decoder);
 	cs_etm__free_traceid_queues(etmq);
+	free(etmq->aux_record_list);
 	free(etmq);
 }
 
@@ -759,6 +764,19 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm)
 	return NULL;
 }
 
+static int cs_etm__save_aux_record(struct cs_etm_queue *etmq,
+				   struct perf_record_aux *aux_record)
+{
+	etmq->aux_record_list = reallocarray(etmq->aux_record_list,
+					     etmq->aux_record_list_len+1,
+					     sizeof(*etmq->aux_record_list));
+	if (!etmq->aux_record_list)
+		return -ENOMEM;
+
+	etmq->aux_record_list[etmq->aux_record_list_len++] = *aux_record;
+	return 0;
+}
+
 static int cs_etm__search_first_timestamp(struct cs_etm_queue *etmq)
 {
 	int ret = 0;
@@ -865,7 +883,7 @@ static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
 	return 0;
 }
 
-static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu)
+static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu, struct perf_record_aux *aux)
 {
 	int ret;
@@ -875,6 +893,14 @@ static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu)
 		return ret;
 	}
 
+	/* In timeless mode, cpu is set to -1, and a single aux buffer is filled */
+	if (cpu < 0)
+		cpu = 0;
+
+	ret = cs_etm__save_aux_record(etm->queues.queue_array[cpu].priv, aux);
+	if (ret)
+		return ret;
+
 	if (!etm->timeless_decoding)
 		return cs_etm__search_first_timestamp(etm->queues.queue_array[cpu].priv);
 	else
@@ -2357,7 +2383,7 @@ static int cs_etm__process_event(struct perf_session *session,
 
 	if ((timestamp || etm->timeless_decoding)
 	    && event->header.type == PERF_RECORD_AUX) {
-		err = cs_etm__update_queues(etm, sample->cpu);
+		err = cs_etm__update_queues(etm, sample->cpu, &event->aux);
 		if (err)
 			return err;
 	}
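A side note on the allocation pattern: cs_etm__save_aux_record() grows the array with reallocarray() and overwrites the pointer in place, which leaks the old array if the allocation fails. The grow-by-one idea can be sketched standalone with that failure case handled (toy struct names, not the actual perf code):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Toy stand-ins for the perf structs; names are illustrative only. */
struct toy_aux_record {
	unsigned long long aux_offset;
	unsigned long long aux_size;
};

struct toy_rec_queue {
	int record_list_len;
	struct toy_aux_record *record_list;
};

/* Append a copy of *rec to a dynamically grown array, the same
 * grow-by-one pattern as cs_etm__save_aux_record(). */
static int save_record(struct toy_rec_queue *q,
		       const struct toy_aux_record *rec)
{
	struct toy_aux_record *tmp;

	/* Grow via a temporary so the old array isn't leaked when the
	 * allocation fails (overwriting the pointer directly loses the
	 * previous allocation on a NULL return). */
	tmp = reallocarray(q->record_list, q->record_list_len + 1,
			   sizeof(*tmp));
	if (!tmp)
		return -ENOMEM;

	q->record_list = tmp;
	q->record_list[q->record_list_len++] = *rec;
	return 0;
}
```

reallocarray() also checks the count * size multiplication for overflow, which is why the patch uses it rather than plain realloc().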
On Fri, Feb 12, 2021 at 04:45:09PM +0200, James Clark wrote:
The aux records will be used to set the bounds of decoding in a later commit. In the future we may also want to use the flags of each record to control decoding.
Do these need to be saved in their entirety, or can pointers to each record safely be saved instead for later access?
Rather than introducing the perf record list, I wonder if we can use an easier method to fix this problem. Below is the rough idea (though I haven't really verified it):

The essential information we need is the valid buffer length that can be used for decoding. cs_etm_queue::buf_len tracks a buffer length, but it is the length of the whole AUX buffer, which belongs to multiple "PERF_RECORD_AUX" events. So we cannot decode the whole trace data in the AUX trace buffer at once; on the other hand, each incoming "PERF_RECORD_AUX" event can tell the CoreSight decoder how much of the buffer it should decode. In the end, the trace data can be decoded step by step, driven by the incoming "PERF_RECORD_AUX" events.

I'd like to propose adding a new field, "cs_etm_queue::buf_rec_len", which stands for the record length based on the RECORD_AUX event. In theory, this value should always be less than "cs_etm_queue::buf_len".

Every time a "PERF_RECORD_AUX" event comes in, we find the corresponding queue (so this can be applied to both "1:1" and "N:1" models for sources and sinks) and accumulate "perf_record_aux::aux_size" into "cs_etm_queue::buf_rec_len".

On the decoder side, "etmq->buf_rec_len" is decremented until it reaches zero for the current round of decoding (see cs_etm__decode_data_block()). Since all "PERF_RECORD_AUX" events are processed before the "PERF_RECORD_EXIT" event, we don't need to worry about the tail of the trace data being ignored.

The main reason for this suggestion is that it doesn't require significant changes to the logic of the current code. I will try to experiment with this idea and share the result.

James, if you think I've missed anything, please correct me as needed. Thanks!
Leo
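Leo's buf_rec_len scheme can be sketched as a standalone toy model (hypothetical names, not the actual perf code): each RECORD_AUX event accumulates a decodable "budget" for the queue, and the decode loop consumes that budget block by block until it reaches zero, leaving any tail bytes for a later RECORD_AUX.

```c
#include <assert.h>
#include <stddef.h>

/* One decodable "budget" per queue: RECORD_AUX events add to it,
 * decoding consumes it. buf_len models the whole AUX buffer. */
struct budget_queue {
	size_t buf_len;     /* bytes present in the AUX buffer */
	size_t buf_rec_len; /* bytes announced by RECORD_AUX so far */
};

/* Called for each incoming PERF_RECORD_AUX event. */
static void on_record_aux(struct budget_queue *q, size_t aux_size)
{
	q->buf_rec_len += aux_size;
}

/* Decode up to block_size bytes, but never past the announced budget.
 * Returns the number of bytes consumed. */
static size_t decode_block(struct budget_queue *q, size_t block_size)
{
	size_t n = block_size < q->buf_rec_len ? block_size : q->buf_rec_len;

	q->buf_rec_len -= n;
	q->buf_len -= n;
	return n;
}
```

The key property is that decode_block() stops at the budget boundary even when more bytes are physically present in the buffer.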
On 27/02/2021 09:10, Leo Yan wrote:
On Fri, Feb 12, 2021 at 04:45:09PM +0200, James Clark wrote:
The aux records will be used to set the bounds of decoding in a later commit. In the future we may also want to use the flags of each record to control decoding.
Do these need to be saved in their entirety, or can pointers to each record safely be saved instead for later access?
Rather than introducing the perf record list, I wonder if we can use an easier method to fix this problem. Below is the rough idea (though I haven't really verified it):

The essential information we need is the valid buffer length that can be used for decoding. cs_etm_queue::buf_len tracks a buffer length, but it is the length of the whole AUX buffer, which belongs to multiple "PERF_RECORD_AUX" events. So we cannot decode the whole trace data in the AUX trace buffer at once; on the other hand, each incoming "PERF_RECORD_AUX" event can tell the CoreSight decoder how much of the buffer it should decode. In the end, the trace data can be decoded step by step, driven by the incoming "PERF_RECORD_AUX" events.

I'd like to propose adding a new field, "cs_etm_queue::buf_rec_len", which stands for the record length based on the RECORD_AUX event. In theory, this value should always be less than "cs_etm_queue::buf_len".

Every time a "PERF_RECORD_AUX" event comes in, we find the corresponding queue (so this can be applied to both "1:1" and "N:1" models for sources and sinks) and accumulate "perf_record_aux::aux_size" into "cs_etm_queue::buf_rec_len".

On the decoder side, "etmq->buf_rec_len" is decremented until it reaches zero for the current round of decoding (see cs_etm__decode_data_block()). Since all "PERF_RECORD_AUX" events are processed before the "PERF_RECORD_EXIT" event, we don't need to worry about the tail of the trace data being ignored.

The main reason for this suggestion is that it doesn't require significant changes to the logic of the current code. I will try to experiment with this idea and share the result.

James, if you think I've missed anything, please correct me as needed. Thanks!
This is an interesting idea, I think we could push decoded packets into the min heap as the aux records are received, and not do anything with them until the end of the data is reached. That way instead of saving aux records, we'd save the result of the decode for each aux record.
Currently each cs_etm_queue has a cs_etm_traceid_queue/cs_etm_packet_queue for each stream, but that would have to be changed to have multiple ones because multiple packets could be decoded to get through the whole aux record.
It would be a similarly sized change, and could also have a bigger impact on memory. So I'm not sure if it would help to reduce the changes, but it is possible.
James
On Mon, Mar 01, 2021 at 05:43:43PM +0200, James Clark wrote:
[...]
I'd like to propose adding a new field, "cs_etm_queue::buf_rec_len", which stands for the record length based on the RECORD_AUX event. In theory, this value should always be less than "cs_etm_queue::buf_len".

Every time a "PERF_RECORD_AUX" event comes in, we find the corresponding queue (so this can be applied to both "1:1" and "N:1" models for sources and sinks) and accumulate "perf_record_aux::aux_size" into "cs_etm_queue::buf_rec_len".

On the decoder side, "etmq->buf_rec_len" is decremented until it reaches zero for the current round of decoding (see cs_etm__decode_data_block()). Since all "PERF_RECORD_AUX" events are processed before the "PERF_RECORD_EXIT" event, we don't need to worry about the tail of the trace data being ignored.

The main reason for this suggestion is that it doesn't require significant changes to the logic of the current code. I will try to experiment with this idea and share the result.

James, if you think I've missed anything, please correct me as needed. Thanks!
This is an interesting idea, I think we could push decoded packets into the min heap as the aux records are received, and not do anything with them until the end of the data is reached. That way instead of saving aux records, we'd save the result of the decode for each aux record.
Currently each cs_etm_queue has a cs_etm_traceid_queue/cs_etm_packet_queue for each stream, but that would have to be changed to have multiple ones because multiple packets could be decoded to get through the whole aux record.
It would be a similarly sized change, and could also have a bigger impact on memory. So I'm not sure if it would help to reduce the changes, but it is possible.
The change below is still very coarse and I've only done very basic testing of it, so it doesn't cover all cases; please treat it simply as a demonstration of the basic idea.

Before a PERF_RECORD_AUX event arrives, we don't decode any trace data. After a PERF_RECORD_AUX event comes in, its aux buffer size is accumulated into the queue, and the trace data for the queue is decoded based on the accumulated buffer length.
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index b9c1d329a7f1..3bd5609b6de4 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -89,7 +89,7 @@ struct cs_etm_queue {
 	u8 pending_timestamp;
 	u64 offset;
 	const unsigned char *buf;
-	size_t buf_len, buf_used;
+	size_t aux_buf_len, buf_len, buf_used;
 	/* Conversion between traceID and index in traceid_queues array */
 	struct intlist *traceid_queues_list;
 	struct cs_etm_traceid_queue **traceid_queues;
@@ -1085,6 +1085,7 @@ cs_etm__get_trace(struct cs_etm_queue *etmq)
 	if (old_buffer)
 		auxtrace_buffer__drop_data(old_buffer);
 	etmq->buf_len = 0;
+	etmq->aux_buf_len = 0;
 	return 0;
 }
 
@@ -2052,6 +2053,7 @@ static int cs_etm__decode_data_block(struct cs_etm_queue *etmq)
 	etmq->offset += processed;
 	etmq->buf_used += processed;
 	etmq->buf_len -= processed;
+	etmq->aux_buf_len -= processed;
 
 out:
 	return ret;
@@ -2177,7 +2179,7 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
 		 */
 		err = cs_etm__process_traceid_queue(etmq, tidq);
 
-	} while (etmq->buf_len);
+	} while (etmq->aux_buf_len > 0);
 
 	if (err == 0)
 		/* Flush any remaining branch stack entries */
@@ -2216,6 +2218,27 @@ static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
 	return 0;
 }
 
+static void cs_etm__update_aux_buf_len(struct cs_etm_auxtrace *etm,
+				       struct perf_record_aux *aux)
+{
+	unsigned int cs_queue_nr, queue_nr;
+	struct auxtrace_queue *queue;
+	struct cs_etm_queue *etmq;
+
+	if (!etm->heap.heap_cnt)
+		return;
+
+	/* Take the entry at the top of the min heap */
+	cs_queue_nr = etm->heap.heap_array[0].queue_nr;
+	queue_nr = TO_QUEUE_NR(cs_queue_nr);
+	queue = &etm->queues.queue_array[queue_nr];
+	etmq = queue->priv;
+
+	etmq->aux_buf_len += aux->aux_size;
+	fprintf(stderr, "%s: aux_buf_len=%ld\n", __func__, etmq->aux_buf_len);
+	return;
+}
+
 static int cs_etm__process_queues(struct cs_etm_auxtrace *etm)
 {
 	int ret = 0;
@@ -2272,6 +2295,9 @@ static int cs_etm__process_queues(struct cs_etm_auxtrace *etm)
 		if (ret < 0)
 			goto out;
 
+		if (etmq->aux_buf_len <= 0)
+			goto out;
+
 		/*
 		 * No more auxtrace_buffers to process in this etmq, simply
 		 * move on to another entry in the auxtrace_heap.
@@ -2414,9 +2440,15 @@ static int cs_etm__process_event(struct perf_session *session,
 	else if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE)
 		return cs_etm__process_switch_cpu_wide(etm, event);
 
+	fprintf(stderr, "%s: event->header.type=%d\n", __func__, event->header.type);
+
 	if (!etm->timeless_decoding &&
-	    event->header.type == PERF_RECORD_AUX)
+	    event->header.type == PERF_RECORD_AUX) {
+
+		fprintf(stderr, "%s: aux_size=%lld\n", __func__, event->aux.aux_size);
+		cs_etm__update_aux_buf_len(etm, &event->aux);
 		return cs_etm__process_queues(etm);
+	}
 
 	return 0;
 }
On 02/03/2021 14:03, Leo Yan wrote:
On Mon, Mar 01, 2021 at 05:43:43PM +0200, James Clark wrote:
[...]
I'd like to propose adding a new field, "cs_etm_queue::buf_rec_len", which stands for the record length based on the RECORD_AUX event. In theory, this value should always be less than "cs_etm_queue::buf_len".

Every time a "PERF_RECORD_AUX" event comes in, we find the corresponding queue (so this can be applied to both "1:1" and "N:1" models for sources and sinks) and accumulate "perf_record_aux::aux_size" into "cs_etm_queue::buf_rec_len".

On the decoder side, "etmq->buf_rec_len" is decremented until it reaches zero for the current round of decoding (see cs_etm__decode_data_block()). Since all "PERF_RECORD_AUX" events are processed before the "PERF_RECORD_EXIT" event, we don't need to worry about the tail of the trace data being ignored.

The main reason for this suggestion is that it doesn't require significant changes to the logic of the current code. I will try to experiment with this idea and share the result.

James, if you think I've missed anything, please correct me as needed. Thanks!
This is an interesting idea, I think we could push decoded packets into the min heap as the aux records are received, and not do anything with them until the end of the data is reached. That way instead of saving aux records, we'd save the result of the decode for each aux record.
Currently each cs_etm_queue has a cs_etm_traceid_queue/cs_etm_packet_queue for each stream, but that would have to be changed to have multiple ones because multiple packets could be decoded to get through the whole aux record.
It would be a similarly sized change, and could also have a bigger impact on memory. So I'm not sure if it would help to reduce the changes, but it is possible.
The change below is still very coarse and I've only done very basic testing of it, so it doesn't cover all cases; please treat it simply as a demonstration of the basic idea.

Before a PERF_RECORD_AUX event arrives, we don't decode any trace data. After a PERF_RECORD_AUX event comes in, its aux buffer size is accumulated into the queue, and the trace data for the queue is decoded based on the accumulated buffer length.
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index b9c1d329a7f1..3bd5609b6de4 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -89,7 +89,7 @@ struct cs_etm_queue {
 	u8 pending_timestamp;
 	u64 offset;
 	const unsigned char *buf;
-	size_t buf_len, buf_used;
+	size_t aux_buf_len, buf_len, buf_used;
 	/* Conversion between traceID and index in traceid_queues array */
 	struct intlist *traceid_queues_list;
 	struct cs_etm_traceid_queue **traceid_queues;
@@ -1085,6 +1085,7 @@ cs_etm__get_trace(struct cs_etm_queue *etmq)
 	if (old_buffer)
 		auxtrace_buffer__drop_data(old_buffer);
 	etmq->buf_len = 0;
+	etmq->aux_buf_len = 0;
 	return 0;
 }
 
@@ -2052,6 +2053,7 @@ static int cs_etm__decode_data_block(struct cs_etm_queue *etmq)
 	etmq->offset += processed;
 	etmq->buf_used += processed;
 	etmq->buf_len -= processed;
+	etmq->aux_buf_len -= processed;
 
 out:
 	return ret;
@@ -2177,7 +2179,7 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
 		 */
 		err = cs_etm__process_traceid_queue(etmq, tidq);
 
-	} while (etmq->buf_len);
+	} while (etmq->aux_buf_len > 0);
 
 	if (err == 0)
 		/* Flush any remaining branch stack entries */
@@ -2216,6 +2218,27 @@ static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
 	return 0;
 }
 
+static void cs_etm__update_aux_buf_len(struct cs_etm_auxtrace *etm,
+				       struct perf_record_aux *aux)
+{
+	unsigned int cs_queue_nr, queue_nr;
+	struct auxtrace_queue *queue;
+	struct cs_etm_queue *etmq;
+
+	if (!etm->heap.heap_cnt)
+		return;
+
+	/* Take the entry at the top of the min heap */
+	cs_queue_nr = etm->heap.heap_array[0].queue_nr;
+	queue_nr = TO_QUEUE_NR(cs_queue_nr);
+	queue = &etm->queues.queue_array[queue_nr];
+	etmq = queue->priv;
+
+	etmq->aux_buf_len += aux->aux_size;
Hi Leo,
If the aux_sizes are accumulated, then information about the boundaries between each one is lost. We need to reset the decoder at the boundary between each aux record.
I've attached a file that has this issue. I tried your patch and I still see "Invalid Sequence in packet" and other decode errors. With my patch you can see that a separate decode is done for each aux record, with a newline printed after each record, and the decode errors are fixed.
James
+	fprintf(stderr, "%s: aux_buf_len=%ld\n", __func__, etmq->aux_buf_len);
+	return;
+}
+
 static int cs_etm__process_queues(struct cs_etm_auxtrace *etm)
 {
 	int ret = 0;
@@ -2272,6 +2295,9 @@ static int cs_etm__process_queues(struct cs_etm_auxtrace *etm)
 		if (ret < 0)
 			goto out;
 
+		if (etmq->aux_buf_len <= 0)
+			goto out;
+
 		/*
 		 * No more auxtrace_buffers to process in this etmq, simply
 		 * move on to another entry in the auxtrace_heap.
@@ -2414,9 +2440,15 @@ static int cs_etm__process_event(struct perf_session *session,
 	else if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE)
 		return cs_etm__process_switch_cpu_wide(etm, event);
 
+	fprintf(stderr, "%s: event->header.type=%d\n", __func__, event->header.type);
+
 	if (!etm->timeless_decoding &&
-	    event->header.type == PERF_RECORD_AUX)
+	    event->header.type == PERF_RECORD_AUX) {
+
+		fprintf(stderr, "%s: aux_size=%lld\n", __func__, event->aux.aux_size);
+		cs_etm__update_aux_buf_len(etm, &event->aux);
 		return cs_etm__process_queues(etm);
+	}
 
 	return 0;
 }
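The boundary problem James describes — an accumulated length says how much to decode, but not where one record ends and the next begins — can be illustrated with a toy decoder (hypothetical names, not the perf code). A decoder that carries partial-packet state across a record boundary emits a bogus packet spanning the gap; resetting at each boundary avoids that:

```c
#include <assert.h>
#include <stddef.h>

/* A decoder that carries unfinished-packet state across calls. */
struct pkt_decoder {
	size_t pending; /* bytes of an incomplete packet carried over */
};

static void decoder_reset(struct pkt_decoder *d)
{
	d->pending = 0; /* discard any packet spanning a record boundary */
}

/* Decode len bytes of fixed 4-byte packets; leftovers stay pending. */
static void decode_record(struct pkt_decoder *d, size_t len, size_t *packets)
{
	size_t total = d->pending + len;

	*packets += total / 4;
	d->pending = total % 4;
}
```

With two 6-byte records, decoding them back to back yields three packets, one of which straddles the (discontinuous) boundary; resetting between records yields only the two packets that lie wholly inside a record. In the real decoder the straddling packet shows up as errors such as "Invalid Sequence in packet".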
To make sure processing happens in the correct order, queue processing shouldn't start until every aux queue has had its first timestamp found.
Now that we're only searching for timestamps within each aux record, we need to wait until all aux records are delivered before starting the processing.
Signed-off-by: James Clark james.clark@arm.com
---
 tools/perf/util/cs-etm.c | 4 ----
 1 file changed, 4 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 88b541b2a804..5ab037c2dabe 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -2398,10 +2398,6 @@ static int cs_etm__process_event(struct perf_session *session,
 	else if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE)
 		return cs_etm__process_switch_cpu_wide(etm, event);
 
-	if (!etm->timeless_decoding &&
-	    event->header.type == PERF_RECORD_AUX)
-		return cs_etm__process_queues(etm);
-
 	return 0;
 }
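The ordering requirement behind this patch can be sketched with a toy version of the timestamp min heap (hypothetical names; perf's real implementation is the binary heap in auxtrace.c, via auxtrace_heap__add()): no queue is processed until every queue has contributed its first timestamp, after which queues are popped in chronological order.

```c
#include <assert.h>
#include <stddef.h>

struct heap_entry {
	unsigned int queue_nr;
	unsigned long long timestamp;
};

/* Fixed-size toy "heap"; perf uses a real binary heap in auxtrace.c. */
struct toy_heap {
	struct heap_entry entries[8];
	size_t cnt;
};

static void toy_heap_add(struct toy_heap *h, unsigned int queue_nr,
			 unsigned long long timestamp)
{
	h->entries[h->cnt].queue_nr = queue_nr;
	h->entries[h->cnt].timestamp = timestamp;
	h->cnt++;
}

/* Pop the queue with the smallest first timestamp (linear scan for
 * brevity instead of a real heap). */
static unsigned int toy_heap_pop(struct toy_heap *h)
{
	size_t i, min = 0;
	unsigned int queue_nr;

	for (i = 1; i < h->cnt; i++)
		if (h->entries[i].timestamp < h->entries[min].timestamp)
			min = i;
	queue_nr = h->entries[min].queue_nr;
	h->entries[min] = h->entries[--h->cnt];
	return queue_nr;
}
```

If processing started before a late queue had added its (early) first timestamp, its trace would be rendered out of chronological order — which is why the AUX-triggered call to cs_etm__process_queues() is removed here and deferred to cs_etm__flush_events().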
The trace data between aux records is not continuous, so the decoder must be reset between each record to ensure that parsing happens correctly and without any early exits.
Signed-off-by: James Clark james.clark@arm.com
---
 tools/perf/util/cs-etm.c | 109 +++++++++++++++++++++++----------------
 1 file changed, 64 insertions(+), 45 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 5ab037c2dabe..3026fcf50b5d 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -95,6 +95,7 @@ struct cs_etm_queue {
 	int aux_record_list_len;
 	int aux_record_list_idx;
 	struct perf_record_aux *aux_record_list;
+	bool timestamp_found;
 };
 
 /* RB tree for quick conversion between traceID and metadata pointers */
@@ -784,6 +785,9 @@ static int cs_etm__search_first_timestamp(struct cs_etm_queue *etmq)
 	unsigned int cs_queue_nr;
 	u8 trace_chan_id;
 
+	if (etmq->timestamp_found)
+		return 0;
+
 	/*
 	 * We are under a CPU-wide trace scenario. As such we need to know
 	 * when the code that generated the traces started to execute so that
@@ -792,56 +796,54 @@ static int cs_etm__search_first_timestamp(struct cs_etm_queue *etmq)
 	 * timestamp. The timestamp is then added to the auxtrace min heap
 	 * in order to know what nibble (of all the etmqs) to decode first.
 	 */
-	while (1) {
-		/*
-		 * Fetch an aux_buffer from this etmq. Bail if no more
-		 * blocks or an error has been encountered.
-		 */
-		ret = cs_etm__get_data_block(etmq);
-		if (ret <= 0)
-			return ret;
-
-		/*
-		 * Run decoder on the trace block. The decoder will stop when
-		 * encountering a timestamp, a full packet queue or the end of
-		 * trace for that block.
-		 */
-		ret = cs_etm__decode_data_block(etmq);
-		if (ret)
-			return ret;
+	/*
+	 * Fetch an aux_buffer from this etmq. Bail if no more
+	 * blocks or an error has been encountered.
+	 */
+	ret = cs_etm__get_data_block(etmq);
+	if (ret <= 0)
+		return ret;
 
-		/*
-		 * Function cs_etm_decoder__do_{hard|soft}_timestamp() does all
-		 * the timestamp calculation for us.
-		 */
-		timestamp = cs_etm__etmq_get_timestamp(etmq, &trace_chan_id);
+	/*
+	 * Run decoder on the trace block. The decoder will stop when
+	 * encountering a timestamp, a full packet queue or the end of
+	 * trace for that block.
+	 */
+	ret = cs_etm__decode_data_block(etmq);
+	if (ret)
+		return ret;
 
-		/* We found a timestamp, no need to continue. */
-		if (timestamp)
-			break;
+	/*
+	 * Function cs_etm_decoder__do_{hard|soft}_timestamp() does all
+	 * the timestamp calculation for us.
+	 */
+	timestamp = cs_etm__etmq_get_timestamp(etmq, &trace_chan_id);
 
+	/* We found a timestamp, no need to continue. */
+	if (timestamp) {
 		/*
-		 * We didn't find a timestamp so empty all the traceid packet
-		 * queues before looking for another timestamp packet, either
-		 * in the current data block or a new one. Packets that were
-		 * just decoded are useless since no timestamp has been
-		 * associated with them. As such simply discard them.
+		 * We have a timestamp. Add it to the min heap to reflect when
+		 * instructions conveyed by the range packets of this traceID queue
+		 * started to execute. Once the same has been done for all the traceID
+		 * queues of each etmq, redenring and decoding can start in
+		 * chronological order.
+		 *
+		 * Note that packets decoded above are still in the traceID's packet
+		 * queue and will be processed in cs_etm__process_queues().
 		 */
-		cs_etm__clear_all_packet_queues(etmq);
+		etmq->timestamp_found = true;
+		cs_queue_nr = TO_CS_QUEUE_NR(etmq->queue_nr, trace_chan_id);
+		return auxtrace_heap__add(&etmq->etm->heap, cs_queue_nr, timestamp);
 	}
-	/*
-	 * We have a timestamp. Add it to the min heap to reflect when
-	 * instructions conveyed by the range packets of this traceID queue
-	 * started to execute. Once the same has been done for all the traceID
-	 * queues of each etmq, redenring and decoding can start in
-	 * chronological order.
-	 *
-	 * Note that packets decoded above are still in the traceID's packet
-	 * queue and will be processed in cs_etm__process_queues().
+	/*
+	 * We didn't find a timestamp so empty all the traceid packet
+	 * queues before looking for another timestamp packet, either
+	 * in the current data block or a new one. Packets that were
+	 * just decoded are useless since no timestamp has been
+	 * associated with them. As such simply discard them.
 	 */
-	cs_queue_nr = TO_CS_QUEUE_NR(etmq->queue_nr, trace_chan_id);
-	return auxtrace_heap__add(&etmq->etm->heap, cs_queue_nr, timestamp);
+	cs_etm__clear_all_packet_queues(etmq);
+	return 0;
 }
static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm, @@ -1645,6 +1647,13 @@ static int cs_etm__get_data_block(struct cs_etm_queue *etmq) { int ret;
+ if (etmq->aux_record_list[etmq->aux_record_list_idx].aux_size <= 0) { + etmq->aux_record_list_idx++; + ret = cs_etm_decoder__reset(etmq->decoder); + if (ret) + return ret; + } + if (!etmq->buf_len) { ret = cs_etm__get_trace(etmq); if (ret <= 0) @@ -2016,6 +2025,15 @@ static int cs_etm__decode_data_block(struct cs_etm_queue *etmq) { int ret = 0; size_t processed = 0; + u64 decode_size; + + if (etmq->aux_record_list_idx >= etmq->aux_record_list_len || + etmq->aux_record_list[etmq->aux_record_list_idx].aux_size > etmq->buf_len) { + // Assume that aux records always equally divide up the aux buffer + // so aux_size should never exceed the remaining buffer to decode. + ret = -1; + goto out; + }
/* * Packets are decoded and added to the decoder's packet queue @@ -2024,10 +2042,11 @@ static int cs_etm__decode_data_block(struct cs_etm_queue *etmq) * operations that stop processing are a timestamp packet or a full * decoder buffer queue. */ + decode_size = etmq->aux_record_list[etmq->aux_record_list_idx].aux_size; ret = cs_etm_decoder__process_data_block(etmq->decoder, etmq->offset, &etmq->buf[etmq->buf_used], - etmq->buf_len, + decode_size, &processed); if (ret) goto out; @@ -2035,7 +2054,7 @@ static int cs_etm__decode_data_block(struct cs_etm_queue *etmq) etmq->offset += processed; etmq->buf_used += processed; etmq->buf_len -= processed; - + etmq->aux_record_list[etmq->aux_record_list_idx].aux_size -= processed; out: return ret; } @@ -2160,7 +2179,7 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq) */ err = cs_etm__process_traceid_queue(etmq, tidq);
-	} while (etmq->buf_len);
+	} while (etmq->aux_record_list[etmq->aux_record_list_idx].aux_size > 0);
 	if (err == 0)
 		/* Flush any remaining branch stack entries */
Previously the dump mode created a new decoder for each buffer and had a decode loop separate from the main code path.
This change uses more of the existing code path which tracks aux records and resets the decoder between each one. Unfortunately the decoder is quite noisy when being reset, so printing has to be suppressed around each call to reset.
Signed-off-by: James Clark <james.clark@arm.com>
---
 tools/perf/util/cs-etm.c | 91 ++++++++++++++++------------------------
 1 file changed, 36 insertions(+), 55 deletions(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 3026fcf50b5d..dc4885794859 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -474,56 +474,21 @@ static int cs_etm__init_decoder_params(struct cs_etm_decoder_params *d_params,
 	return ret;
 }
-static void cs_etm__dump_event(struct cs_etm_auxtrace *etm,
-			       struct auxtrace_buffer *buffer)
+static void cs_etm__dump_event(struct cs_etm_queue *etmq)
 {
-	int ret;
 	const char *color = PERF_COLOR_BLUE;
-	struct cs_etm_decoder_params d_params;
-	struct cs_etm_trace_params *t_params;
-	struct cs_etm_decoder *decoder;
-	size_t buffer_used = 0;
+
+	if (etmq->aux_record_list_idx >= etmq->aux_record_list_len)
+		return;
 	fprintf(stdout, "\n");
 	color_fprintf(stdout, color,
 		     ". ... CoreSight ETM Trace data: size %zu bytes\n",
-		     buffer->size);
-
-	/* Use metadata to fill in trace parameters for trace decoder */
-	t_params = zalloc(sizeof(*t_params) * etm->num_cpu);
-
-	if (!t_params)
-		return;
-
-	if (cs_etm__init_trace_params(t_params, etm))
-		goto out_free;
-
-	/* Set decoder parameters to simply print the trace packets */
-	if (cs_etm__init_decoder_params(&d_params, NULL,
-					CS_ETM_OPERATION_PRINT))
-		goto out_free;
-
-	decoder = cs_etm_decoder__new(etm->num_cpu, &d_params, t_params);
+		     etmq->aux_record_list[etmq->aux_record_list_idx].aux_size);
-	if (!decoder)
-		goto out_free;
-	do {
-		size_t consumed;
-
-		ret = cs_etm_decoder__process_data_block(
-				decoder, buffer->offset,
-				&((u8 *)buffer->data)[buffer_used],
-				buffer->size - buffer_used, &consumed);
-		if (ret)
+	while (1)
+		if (cs_etm__decode_data_block(etmq) <= 0)
 			break;
-
-		buffer_used += consumed;
-	} while (buffer_used < buffer->size);
-
-	cs_etm_decoder__free(decoder);
-
-out_free:
-	zfree(&t_params);
 }
 static int cs_etm__flush_events(struct perf_session *session,
@@ -735,9 +700,15 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm)
 		goto out_free;
 	/* Set decoder parameters to decode trace packets */
-	if (cs_etm__init_decoder_params(&d_params, etmq,
-					CS_ETM_OPERATION_DECODE))
-		goto out_free;
+	if (dump_trace) {
+		if (cs_etm__init_decoder_params(&d_params, NULL,
+						CS_ETM_OPERATION_PRINT))
+			goto out_free;
+	} else {
+		if (cs_etm__init_decoder_params(&d_params, etmq,
+						CS_ETM_OPERATION_DECODE))
+			goto out_free;
+	}
etmq->decoder = cs_etm_decoder__new(etm->num_cpu, &d_params, t_params);
@@ -903,7 +874,7 @@ static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu, struct pe
 	if (ret)
 		return ret;
-	if (!etm->timeless_decoding)
+	if (!etm->timeless_decoding && !dump_trace)
 		return cs_etm__search_first_timestamp(etm->queues.queue_array[cpu].priv);
 	else
 		return 0;
@@ -1647,6 +1618,9 @@ static int cs_etm__get_data_block(struct cs_etm_queue *etmq)
 {
 	int ret;
+	if (etmq->aux_record_list_idx >= etmq->aux_record_list_len)
+		return 0;
+
 	if (etmq->aux_record_list[etmq->aux_record_list_idx].aux_size <= 0) {
 		etmq->aux_record_list_idx++;
 		ret = cs_etm_decoder__reset(etmq->decoder);
@@ -2387,10 +2361,7 @@ static int cs_etm__process_event(struct perf_session *session,
 					 struct cs_etm_auxtrace,
 					 auxtrace);
-	if (dump_trace)
-		return 0;
-
-	if (!tool->ordered_events) {
+	if (!tool->ordered_events && !dump_trace) {
 		pr_err("CoreSight ETM Trace requires ordered events\n");
 		return -EINVAL;
 	}
@@ -2407,6 +2378,9 @@ static int cs_etm__process_event(struct perf_session *session,
 		return err;
 	}
+	if (dump_trace)
+		return 0;
+
 	if (etm->timeless_decoding &&
 	    event->header.type == PERF_RECORD_EXIT)
 		return cs_etm__process_timeless_queues(etm,
@@ -2424,6 +2398,9 @@ static int cs_etm__process_auxtrace_event(struct perf_session *session,
 					  union perf_event *event,
 					  struct perf_tool *tool __maybe_unused)
 {
+	unsigned int i;
+	struct auxtrace_queues *queues;
+	struct cs_etm_queue *etmq;
 	struct cs_etm_auxtrace *etm = container_of(session->auxtrace,
 						   struct cs_etm_auxtrace,
 						   auxtrace);
@@ -2446,12 +2423,17 @@ static int cs_etm__process_auxtrace_event(struct perf_session *session,
 					  event, data_offset, &buffer);
 		if (err)
 			return err;
+	}
-	if (dump_trace)
-		if (auxtrace_buffer__get_data(buffer, fd)) {
-			cs_etm__dump_event(etm, buffer);
-			auxtrace_buffer__put_data(buffer);
+	if (dump_trace) {
+		queues = &etm->queues;
+		for (i = 0; i < queues->nr_queues; i++) {
+			etmq = queues->queue_array[i].priv;
+			if (etmq) {
+				while (cs_etm__get_data_block(etmq) > 0)
+					cs_etm__dump_event(etmq);
 			}
+		}
 	}
 	return 0;
@@ -2705,7 +2687,6 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
 	if (dump_trace) {
 		cs_etm__print_auxtrace_info(auxtrace_info->priv, num_cpu);
-		return 0;
 	}
if (session->itrace_synth_opts->set) {
The decoder is quite noisy when being reset. Now that dump-raw-trace uses a code path that resets the decoder rather than creating a new one, printing has to be suppressed to not flood the output.
Signed-off-by: James Clark <james.clark@arm.com>
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 3f4bc4050477..e0d530d94e1e 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -32,6 +32,7 @@ struct cs_etm_decoder {
 	void *data;
 	void (*packet_printer)(const char *msg);
+	bool suppress_printing;
 	dcd_tree_handle_t dcd_tree;
 	cs_etm_mem_cb_type mem_access;
 	ocsd_datapath_resp_t prev_return;
@@ -71,9 +72,10 @@ int cs_etm_decoder__reset(struct cs_etm_decoder *decoder)
 	ocsd_datapath_resp_t dp_ret;
 	decoder->prev_return = OCSD_RESP_CONT;
-
+	decoder->suppress_printing = true;
 	dp_ret = ocsd_dt_process_data(decoder->dcd_tree, OCSD_OP_RESET,
 				      0, 0, NULL, NULL);
+	decoder->suppress_printing = false;
 	if (OCSD_DATA_RESP_IS_FATAL(dp_ret))
 		return -1;
@@ -143,8 +145,10 @@ static void cs_etm_decoder__print_str_cb(const void *p_context,
 					 const char *msg,
 					 const int str_len)
 {
-	if (p_context && str_len)
-		((struct cs_etm_decoder *)p_context)->packet_printer(msg);
+	const struct cs_etm_decoder *decoder = p_context;
+
+	if (p_context && str_len && !decoder->suppress_printing)
+		decoder->packet_printer(msg);
 }
static int
Good day James,
I have received your patchset and added it to my queue. On the flip side it will be 3 to 4 weeks (from today) before I get a chance to look at it. As such I suggest you don't wait on me before addressing the issues found by Leo.
Thanks, Mathieu
On Fri, Feb 12, 2021 at 04:45:06PM +0200, James Clark wrote:
On 24/02/2021 18:13, Mathieu Poirier wrote:
Good day James,
I have received your patchset and added it to my queue. On the flip side it will be 3 to 4 weeks (from today) before I get a chance to look at it. As such I suggest you don't wait on me before addressing the issues found by Leo.
Ok, thanks Mathieu. I found that it's only working in --per-thread mode by coincidence of my input file. So I would suggest to not look too thoroughly until I have submitted v2. It should also probably still be an RFC rather than PATCH.
Thanks James
Hi James,
On Fri, Feb 12, 2021 at 04:45:06PM +0200, James Clark wrote:
This patchset is now at the top of the coresight review list. I suggest you rebase this on top of your two timestamp related patches and send another revision.