On Fri, 15 Mar 2019 at 12:19, Wojciech Żmuda wzmuda@n7space.com wrote:
Hello Mathieu,
I prepared a test program that's supposed to generate a deterministic trace. I created a function that, depending on its argument, should produce either continuous E atoms or alternating E/N atoms. In main() I spawn two threads with affinity attributes:
- the first thread is set up as the E atom generator, pinned to CPU1
- the other as the E/N atom generator, pinned to CPU2
The main thread is pinned to CPU0.
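For illustration only, a generator of this kind could be sketched as below. This is not the original test program: generate_atoms() and sink are hypothetical names, and the atoms actually traced depend on the branches the compiler emits, so the disassembly should be checked before relying on it. The assumption is that a taken conditional branch is traced as an E atom and a not-taken one as an N atom.

```c
#include <stdint.h>

/* Volatile sink: stores to it cannot be optimised away, so the loops
 * below are kept and the guarded store should need a real branch. */
static volatile uint64_t sink;

/*
 * Hypothetical atom generator (assumption, not the original code).
 *
 * alternate == 0: the only conditional branch in the loop is the
 *                 backward branch, taken on every iteration but the last.
 * alternate != 0: an extra conditional guards the store and is true on
 *                 odd iterations only, so its outcome flips each time.
 *
 * Returns how many times the guarded store executed.
 */
uint64_t generate_atoms(int alternate, uint64_t iterations)
{
    uint64_t guarded_stores = 0;

    if (!alternate) {
        for (uint64_t i = 0; i < iterations; i++)
            sink = i;                 /* unconditional work per iteration */
    } else {
        for (uint64_t i = 0; i < iterations; i++) {
            if (i & 1) {              /* outcome alternates per iteration */
                sink = i;
                guarded_stores++;
            }
        }
    }
    return guarded_stores;
}
```

Note that in the alternating case the loop's own backward branch is still taken on each iteration, so the trace would likely interleave those E atoms with the alternating outcome of the inner branch rather than show a strict E, N, E, N sequence.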
How do you pin threads to a CPU within a user space program?
Before calling pthread_create(), I specify attributes for each pthread:
    pthread_attr_t attr;
    cpu_set_t cpus;
    pthread_attr_init(&attr);
    CPU_ZERO(&cpus);
    CPU_SET(core_id, &cpus);
    pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpus);
Then a thread is spawned with &attr passed as the second argument to pthread_create(). I do CPU_ZERO(), CPU_SET() and pthread_attr_setaffinity_np() separately for each thread. I print the return value of sched_getcpu() in the thread worker function to confirm that the threads are scheduled as requested. It seems to work as desired.
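Put together, the steps described above might look like the following minimal sketch. The helper spawn_pinned() and the worker report_cpu() are hypothetical names introduced here for illustration; only the pthread and sched calls come from the description. It assumes Linux with glibc, since the _np affinity call and sched_getcpu() are GNU extensions.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for CPU_SET(), pthread_attr_setaffinity_np(), sched_getcpu() */
#endif
#include <pthread.h>
#include <sched.h>

/* Hypothetical worker: stores the CPU it is running on into *arg. */
void *report_cpu(void *arg)
{
    *(int *)arg = sched_getcpu();
    return NULL;
}

/*
 * Hypothetical helper: start a thread that may only run on core_id.
 * Returns 0 on success, or the error number from the failing pthread call
 * (for example EINVAL if core_id is not available to the process).
 */
int spawn_pinned(pthread_t *thread, int core_id,
                 void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    cpu_set_t cpus;
    int ret;

    ret = pthread_attr_init(&attr);
    if (ret != 0)
        return ret;

    /* Build a CPU set containing only the requested core. */
    CPU_ZERO(&cpus);
    CPU_SET(core_id, &cpus);

    ret = pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpus);
    if (ret == 0)
        ret = pthread_create(thread, &attr, fn, arg);

    /* The attribute object is no longer needed once the thread exists. */
    pthread_attr_destroy(&attr);
    return ret;
}
```

A caller would invoke spawn_pinned() once per generator thread (for example with core_id 1 and 2) and then pthread_join() each thread. Checking the return codes matters because the affinity request fails if the core is offline or excluded from the process's allowed CPU set.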
- -C option works well. I run perf with:
# perf record -e cs_etm/@fe940000.etf1/u -C1 ./atom_gen
Note that this command does not exclusively trace the application atom_gen on CPU1. This command traces everything that is happening on CPU1 (in user space) for as long as application atom_gen is alive.
Thank you for pointing that out, that explains the inconsistent noise I sometimes observe. If we had filters working with -C, would setting a filter to some code range in ./atom_gen result in tracing this application exclusively?
Right, but as I pointed out in my previous email, filtering isn't supported in CPU-wide trace scenarios.
If you want to trace atom_gen use either --per-thread or the "taskset" utility.
Actually, I tried it at the very beginning, before you published the CPU-wide tracing feature. My observation was that with --per-thread I am not able to get any trace (besides some initialization packets) from multithreaded applications. Regardless of how long my two atom generator threads were spinning, the AUX sections of perf.data were nearly empty, with zero atom packets. Only single-threaded application tracing works well with --per-thread for me.
Do you have any suggestions on how I should trace multithreaded applications?
It is currently not possible to trace multi-threaded applications. In fact it would be quite complex to implement, and with topologies where CPUs use the same sink it would be even harder. On top of that, it would yield unpredictable results since traces could be overwritten because sinks don't generate an interrupt when full.
- -a option works unreliably. I run perf with:
# perf record -e cs_etm/@fe940000.etf1/u -a ./atom_gen
What I expect is perf.data containing output similar to what I got with -C1 plus what I got with -C2, i.e. ID:12 atom E packets and ID:14 atom E/N packets. What actually happens is inconsistent each time I try this command. Sometimes I have no atom packets associated with IDs 12 and 14 but I have some with ID:16. Sometimes I get ID:14 atoms but no ID:12. Sometimes I get the expected trace but still some noise in ID:16 packets, which I would not expect at all, since the program schedules nothing on CPU3. I wonder if I'm missing something here in my understanding of CoreSight. Is this behaviour expected?
On top of the explanation I have given above, note that CPU-wide scenarios currently use the same sink. Because the memory buffer of the sink is limited, it is easy to clobber traces from one CPU with traces from another. This is a problem inherent to topologies where sinks are shared between CPUs and unfortunately there is nothing to be done about it.
Can this issue be solved with TMC-ETR? I was under the impression that, contrary to the ETB, which uses dedicated SRAM as a trace buffer, TMC-ETR routes trace to system RAM. Is it possible to configure TMC-ETR to target some specific RAM region of an arbitrary size?
You are correct, but the ETR will still use a single area to aggregate traces from several CPUs. The size of the AUX mmap'ed area can be changed using the "-m,x" option, where x is the number of pages to use, but the end result is the same: it is highly possible that traces will get overwritten (same reason as above, i.e. no interrupt when the buffer is full).
Mathieu
Best regards, Wojciech