Hello
On 03/19/2020 02:28 PM, Wojciech Żmuda wrote:
Hello,
I'm exploring the possibility of tracing two concurrent programs pinned to two CPU cores, sharing the same sink. I tried to spawn two concurrent perf sessions, but my results are not satisfying, so I wonder if such a possibility exists at all.
# taskset -c 1 ./progA
# taskset -c 2 ./progB
# perf record -e cs_etm/timestamp,@tmc_etr0/u --filter "filter symbolA @./progA" --per-thread --pid $progA_pid
# perf record -e cs_etm/timestamp,@tmc_etr0/u --filter "filter symbolA @./progA" --per-thread --pid $progB_pid
If progA and progB mostly sleep - I get trace data for both, which is fine. However, if at least one of the programs gets more CPU-intensive (loops with arithmetic computations inside, no explicit sleeping, waiting, or I/O), I get trace only for the more intensive one. If both are intensive, it seems random which one gets traced.
These observations suggest an ETR buffer overflow. However, if the CPU-intensive versions of progA and progB are scheduled on the same CPU - this method seems to work. I would expect the ETR buffer to be insufficient in this scenario as well.
To enlarge the ETR buffer I experimented with the -m flag, but I'm unable to use more than -m,16M. Independent RSZ polling shows that this value gets programmed, but anything above 16 MB makes the TMC-ETR driver complain that cma_alloc failed to get that amount of memory. Anyway, enlarging the buffer to 16MB doesn't seem to affect my issue. With the bigger buffer my observations are identical.
Those observations make me suspect that another technical obstacle might exist, besides a possible buffer overflow.
This is due to missing scatter-gather support. Adding "arm,scatter-gather" to the tmc-etr DT node will make the driver use SG mode. Please be aware that it is dangerous for some systems where the ETR is not integrated properly.
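To see whether the running device tree already carries that property, something along these lines may help. It is only a sketch: it assumes the live tree is exposed under /sys/firmware/devicetree/base, and dt_nodes_with_prop is a made-up helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: list device-tree nodes that carry a given property.
# Each DT property shows up as a file inside its node's directory.
# $1 = property name, $2 = DT root (assumed default: the live tree in sysfs).
dt_nodes_with_prop() {
    prop=$1
    root=${2:-/sys/firmware/devicetree/base}
    find "$root" -type f -name "$prop" 2>/dev/null |
        while read -r f; do dirname "$f"; done
}

# Prints the node path(s) with SG mode requested; prints nothing if none.
dt_nodes_with_prop 'arm,scatter-gather'
```

An empty result on a system with a TMC-ETR would be consistent with the driver falling back to a contiguous (CMA) buffer.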
I also tried with CPU-wide mode and it seems to work:
taskset -c 1 ./progA
taskset -c 2 ./progB
perf record -e cs_etm/timestamp,@tmc_etr0/u -C 2,3
but this approach is quite limited as filters don't work in CPU-wide mode and perf itself is also traced (which is weird, as I tried setting CPU affinity of perf-record with taskset as well - didn't help).
To wrap up:
- Is it possible to trace two programs with two perf-record sessions at the same time, sharing a sink?
Normally, no. I don't know if group-scheduling can help here.
- Is it possible to enlarge the TMC-ETR buffer above 16MB? I guess SG mode might be an option here, but I can't really modify my kernel and DT right now. Perhaps there's a possibility to make the kernel allocator work past the 16MB boundary?
Does it help to increase the CMA size by passing cma=64M or larger on the kernel command line?
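A quick way to check what the kernel currently has reserved, as a rough sketch: it assumes the kernel is built with CMA so that /proc/meminfo has a CmaTotal line, and cma_total_mb is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: print CmaTotal from a meminfo-style file, in MiB.
# $1 = file to read (assumed default: /proc/meminfo, values are in kB).
cma_total_mb() {
    awk '/^CmaTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}" 2>/dev/null
}

# Show the cma= boot parameter, if one was passed.
grep -o 'cma=[^ ]*' /proc/cmdline 2>/dev/null ||
    echo "no cma= parameter on the kernel command line"

# Report the size of the CMA pool; empty if the kernel has no CMA support.
echo "CmaTotal (MiB): $(cma_total_mb)"
```

If CmaTotal is close to 16 MiB, that would match the allocation failures seen above that size.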
Suzuki