Good day,
On Thu, 19 Mar 2020 at 08:28, Wojciech Żmuda <wzmuda@n7space.com> wrote:
Hello,
I'm exploring the possibility of tracing two concurrent programs, pinned to two different CPU cores, that share the same sink. I tried spawning two concurrent perf sessions, but the results are not satisfactory, so I wonder whether this is possible at all.
# taskset -c 1 ./progA
# taskset -c 2 ./progB
# perf record -e cs_etm/timestamp,@tmc_etr0/u --filter "filter symbolA @./progA" --per-thread --pid $progA_pid
# perf record -e cs_etm/timestamp,@tmc_etr0/u --filter "filter symbolA @./progA" --per-thread --pid $progB_pid
If progA and progB mostly sleep, I get trace data for both, which is fine. However, if at least one of the programs becomes more CPU-intensive (loops with arithmetic computations inside, no explicit sleeping, waiting or I/O), I get trace data only for the more intensive one. If both are intensive, which one gets traced seems random.
These observations suggest an ETR buffer overflow. However, if the CPU-intensive versions of progA and progB are scheduled on the same CPU, this method seems to work. I would expect the ETR buffer to be insufficient in this scenario as well.
All of the above is consistent with the reality of N:1 topologies and general kernel scheduling. First off, in per-thread mode a sink can only be used by a single process at a time. If two processes use the same sink, as in your example, trace results will be intermittent and depend on which process gets scheduled. For example, when process B is scheduled to run, the common sink collects traces from that process alone. For as long as B is executing on a processor, A will not be allowed to use the sink, and enabling A's trace session when its event is installed on another CPU will simply fail. Process A will be allowed to use the common sink only when B is scheduled out. Failing to enable an event when it is installed on a processor doesn't generate an error message because (1) we are in interrupt context, (2) doing so would introduce latency and (3) the amount of output would be too large. This is inherent to the perf core and completely separate from CoreSight.
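The contention described above can be illustrated with a toy Python model. It is a sketch under stated assumptions, not kernel code: a task may claim the sink only when it is switched in, keeps it until it is switched out, and a failed claim is silent. The function `simulate` and the tick-based timeline are hypothetical. In the model, ties are broken by CPU order, standing in for whichever task happens to be scheduled first.

```python
"""Toy model (hypothetical, not kernel code) of two per-thread perf
sessions contending for one shared sink."""
from collections import Counter


def simulate(timeline):
    """Count the ticks during which each task is actually traced.

    timeline: list of ticks; each tick maps a CPU number to the task
    running on it, or None when that CPU is idle.
    Assumption: a task may claim the sink only when it is switched in,
    keeps it until it is switched out, and a failed claim is silent.
    """
    owner = None       # task whose event currently holds the sink
    traced = Counter()
    prev = {}          # CPU -> task that ran there on the previous tick
    for tick in timeline:
        # Switch-out: the owner releases the sink when it leaves its CPU.
        for cpu, task in prev.items():
            if task is not None and task == owner and tick.get(cpu) != task:
                owner = None
        # Switch-in: only newly scheduled tasks try to enable their event.
        for cpu in sorted(tick):
            task = tick[cpu]
            if task is not None and prev.get(cpu) != task and owner is None:
                owner = task  # enable succeeded, this task now owns the sink
            # Otherwise the enable fails (or is not retried): no trace, no error.
        if owner is not None and owner in tick.values():
            traced[owner] += 1
        prev = dict(tick)
    return traced


if __name__ == "__main__":
    busy_two_cpus = [{1: "A", 2: "B"}] * 10              # both CPU-bound
    same_cpu = [{1: "A"}, {1: "B"}] * 5                  # time-sliced on CPU 1
    sleepy = [{1: "A", 2: None}, {1: None, 2: "B"}] * 5  # mostly sleeping
    for name, tl in [("busy_two_cpus", busy_two_cpus),
                     ("same_cpu", same_cpu), ("sleepy", sleepy)]:
        print(name, dict(simulate(tl)))
```

Under these assumptions, the two CPU-bound tasks on different CPUs leave B untraced for all 10 ticks, while the same-CPU and mostly-sleeping cases trace A and B for 5 ticks each, because the tasks keep switching out and releasing the sink.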
To enlarge the ETR buffer I experimented with the -m flag, but I'm unable to use more than -m,16M. Independent RSZ polling shows that this value gets programmed, but anything above 16 MB makes the TMC-ETR driver complain that cma_alloc failed to get that amount of memory. In any case, enlarging the buffer to 16 MB doesn't seem to affect my issue: with the bigger buffer my observations are identical.
Not sure about the 16M limit; nothing on the CS front prevents using a bigger buffer. Regardless, this is a contention issue and allocating more memory won't help.
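For reference when polling RSZ: assuming, as in the TMC programmer's model, that RSZ holds the trace RAM size in 32-bit words rather than bytes, the value read back for a given ETR buffer is its size in bytes divided by four. The helper `expected_rsz` below is a hypothetical illustration of that arithmetic, not part of any driver or tool.

```python
"""Hypothetical helper: expected TMC RSZ value for an ETR buffer size.

Assumption: RSZ holds the buffer size in 32-bit words, not in bytes.
"""
MIB = 1024 * 1024


def expected_rsz(buffer_bytes: int) -> int:
    """Return the RSZ value (in 32-bit words) for a buffer of buffer_bytes."""
    if buffer_bytes <= 0 or buffer_bytes % 4:
        raise ValueError("buffer size must be a positive multiple of 4 bytes")
    return buffer_bytes // 4


if __name__ == "__main__":
    for size_mib in (4, 16, 32):
        print(f"{size_mib:>2} MiB -> RSZ = {expected_rsz(size_mib * MIB):#x}")
```

Under that assumption, a 16 MiB buffer should read back as RSZ = 0x400000.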
These observations make me suspect that another technical obstacle might exist besides a possible buffer overflow.
I also tried CPU-wide mode and it seems to work:
You are correct. Whether operating in per-thread or CPU-wide mode, a sink will accept traces coming from a single sessionID. The sessionID is the PID of the process that _launched_ the trace command. As such, when doing something like:
# perf record -e cs_etm/timestamp,@tmc_etr0/u --filter "filter symbolA @./progA" --per-thread --pid $progA_pid
# perf record -e cs_etm/timestamp,@tmc_etr0/u --filter "filter symbolA @./progA" --per-thread --pid $progB_pid
That will create two distinct events, one per $progX_pid. The two events will carry different sessionIDs because they were created by two different processes on the command line (perf record -e ...). When operating in CPU-wide scenarios, things are different:
# perf record -e cs_etm/timestamp,@tmc_etr0/u -C 2,3
Here two events are created, one for each CPU. Because the originating process is the same (perf record -e ...), the sessionID carried by the events will be the same.
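The sessionID rule can be sketched as a small Python class. This is an illustrative model under the assumptions stated in this reply (a sink is either idle or in use by one sessionID, and several events of that session may share it), not the driver code; the class name `Sink`, its reference count and the PIDs used are hypothetical.

```python
"""Illustrative model (hypothetical names) of sink arbitration by session ID."""
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sink:
    session: Optional[int] = None  # session ID (PID of the perf process) using the sink
    refcnt: int = 0                # enabled events belonging to that session

    def enable(self, session_id: int) -> bool:
        """Try to enable one event; False means the sink is busy."""
        if self.session is not None and self.session != session_id:
            return False           # another session owns the sink: refused silently
        self.session = session_id
        self.refcnt += 1
        return True

    def disable(self) -> None:
        """Disable one event; the sink becomes idle when the last one goes."""
        if self.refcnt > 0:
            self.refcnt -= 1
            if self.refcnt == 0:
                self.session = None


if __name__ == "__main__":
    # Two per-thread sessions started by two perf processes (PIDs 100 and 200).
    etr = Sink()
    print("per-thread:", etr.enable(100), etr.enable(200))
    # One CPU-wide session (perf PID 300): one event on CPU 2, one on CPU 3.
    etr = Sink()
    print("CPU-wide:", etr.enable(300), etr.enable(300))
```

With two perf processes the second event is refused while the first holds the sink; with a single CPU-wide perf invocation both per-CPU events carry the same session ID and are accepted together.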
I hope this helps,
Mathieu
# taskset -c 1 ./progA
# taskset -c 2 ./progB
# perf record -e cs_etm/timestamp,@tmc_etr0/u -C 2,3
but this approach is quite limited, as filters don't work in CPU-wide mode and perf itself is also traced (which is weird, as I also tried setting the CPU affinity of perf record with taskset, but it didn't help).
To wrap-up:
- Is it possible to trace two programs with two perf-record sessions at the same time, sharing a sink?
- Is it possible to enlarge the TMC-ETR buffer above 16 MB? I guess SG mode might be an option here, but I can't really modify my kernel and DT right now. Perhaps there's a way to make the kernel allocator work past the 16 MB boundary?
Thank you and best regards,
Wojciech
PS Sorry I haven't made progress on the Coresight@Zynq MPSoC support I started some time ago. My access to the board has been limited recently and it's hard to do kernel development remotely. I hope to get back to it soon.

_______________________________________________
CoreSight mailing list
CoreSight@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/coresight