Good day Leo,
On Sat, 30 Mar 2019 at 09:43, Leo Yan leo.yan@linaro.org wrote:
Hi Mathieu,
On Mon, Mar 25, 2019 at 03:56:29PM -0600, Mathieu Poirier wrote:
This patch uses the pid of the process being traced to aggregate traces coming from different processors in the same sink, something that is required when collecting traces in CPU-wide mode when the CoreSight HW enacts an N:1 source/sink topology.
I tried to use kprobes to verify the flow and observed something unexpected (maybe I misunderstand it); please check the detailed info below.
The testing steps are as below:
# perf record -e cs_etm/@f6404000.etr/ -a -- sleep 10 &
# ls
# ls
# ls
So I expect CoreSight to do global tracing for 10 seconds; during that window I simply execute the 'ls' command 3 times and expect these three processes to be traced into the perf data.
Right.
I used kprobe events to add dynamic tracepoints for the functions tmc_alloc_etr_buffer() and tmc_etr_get_etr_buf():
echo 'p:my_probe tmc_alloc_etr_buffer' > /sys/kernel/debug/tracing/kprobe_events
echo 'p:my_probe2 0xffff000010ba63c4 pid=%x1:u32' >> kprobe_events
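The probes then need to be enabled before the test run; something along these lines works, assuming the default tracefs mount under /sys/kernel/debug/tracing:

  # Enable both kprobe events and clear the ring buffer before the test
  # (adjust the path if tracefs is mounted elsewhere):
  echo 1 > /sys/kernel/debug/tracing/events/kprobes/my_probe/enable
  echo 1 > /sys/kernel/debug/tracing/events/kprobes/my_probe2/enable
  echo > /sys/kernel/debug/tracing/trace

  # After the 'perf record' + 'ls' steps above have run:
  cat /sys/kernel/debug/tracing/trace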
When I start the 'perf record' command it creates an etr_perf and etr_buf for every CPU, but afterwards I cannot see any ftrace log for the three 'ls' processes. So finally I captured the trace log below: it only creates buffers for every CPU (one etr_perf and one etr_buf) at the start, and no buffer is created for the 'ls' processes.
That is the correct behavior. More specifically it creates an etr_perf for every event but the etr_buf is shared between those events because there is only one ETR device on your system.
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
            perf-2502  [000] d...  2003.550571: my_probe: (tmc_alloc_etr_buffer+0x0/0x280)
            perf-2502  [000] d...  2003.550595: my_probe2: (tmc_alloc_etr_buffer+0xbc/0x280) pid=2502
            perf-2502  [000] d...  2003.556306: my_probe: (tmc_alloc_etr_buffer+0x0/0x280)
            perf-2502  [000] d...  2003.556329: my_probe2: (tmc_alloc_etr_buffer+0xbc/0x280) pid=2502
            perf-2502  [000] d...  2003.557694: my_probe: (tmc_alloc_etr_buffer+0x0/0x280)
            perf-2502  [000] d...  2003.557708: my_probe2: (tmc_alloc_etr_buffer+0xbc/0x280) pid=2502
            perf-2502  [000] d...  2003.559079: my_probe: (tmc_alloc_etr_buffer+0x0/0x280)
            perf-2502  [000] d...  2003.559095: my_probe2: (tmc_alloc_etr_buffer+0xbc/0x280) pid=2502
            perf-2502  [000] d...  2003.560567: my_probe: (tmc_alloc_etr_buffer+0x0/0x280)
            perf-2502  [000] d...  2003.560581: my_probe2: (tmc_alloc_etr_buffer+0xbc/0x280) pid=2502
            perf-2502  [000] d...  2003.561954: my_probe: (tmc_alloc_etr_buffer+0x0/0x280)
            perf-2502  [000] d...  2003.561965: my_probe2: (tmc_alloc_etr_buffer+0xbc/0x280) pid=2502
            perf-2502  [000] d...  2003.563338: my_probe: (tmc_alloc_etr_buffer+0x0/0x280)
            perf-2502  [000] d...  2003.563352: my_probe2: (tmc_alloc_etr_buffer+0xbc/0x280) pid=2502
            perf-2502  [000] d...  2003.564782: my_probe: (tmc_alloc_etr_buffer+0x0/0x280)
            perf-2502  [000] d...  2003.564796: my_probe2: (tmc_alloc_etr_buffer+0xbc/0x280) pid=2502
Question: every time the 'ls' program is executed, should we expect the tmc-etr driver to create an etr_buf for that process?
Buffers are created at the beginning of the trace session for every CPU (as you reported) but after that everything that happens on those CPUs uses the same buffers. As such you won't see buffers created for every program you execute.
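If you want to double check that from your kprobe log, the number of tmc_alloc_etr_buffer() hits should track the number of CPUs in the session rather than the number of programs executed. For example, restricting the session to two CPUs should show only two allocations (the CPU list below is an arbitrary choice; the sink name is the one from your command line):

  # Limit the CPU-wide session to CPUs 0 and 1; tmc_alloc_etr_buffer()
  # should fire only twice, no matter how many programs run meanwhile.
  perf record -e cs_etm/@f6404000.etr/ -C 0,1 -- sleep 10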
If I use the command 'perf report --sort pid --stdio' to output the result, I also can only see one process and there are no samples for the newly created 'ls' processes:
# Samples: 168  of event 'branches'
# Event count (approx.): 168
#
# Children      Self  Pid:Command
# ........  ........  ...............
#
   100.00%   100.00%  2502:perf
When working with an N:1 source/sink topology and doing a CPU-wide session, the first event to use the sink will switch it on and the last one will collect trace data. With a 10 second trace session it is very likely traces associated with the 'ls' processes have been overwritten. This is a problem inherent to the HW topology and there is nothing we can currently do to alleviate it.
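If the goal is simply to confirm that the 'ls' traces make it to the sink, you can reduce the chance of them being overwritten by keeping the CPU-wide window short and running the workload right away, or by tracing the command directly in per-thread mode. A rough sketch (the 1 second window is an arbitrary choice; the sink name is the one from your command line):

  # Short CPU-wide window - run 'ls' immediately so its trace is less
  # likely to be overwritten before collection.
  perf record -e cs_etm/@f6404000.etr/ -a -- sleep 1 &
  ls

  # Per-thread mode - only 'ls' is traced, so its data is what gets
  # collected.
  perf record -e cs_etm/@f6404000.etr/ -- ls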
Let me know if you need more information.

Mathieu
Thanks,
Leo Yan