Hi Mathieu,
Apologies if mailman does not see this as a reply. I'm not sure if Outlook handles In-Reply-To properly.
I'm testing CPU-wide tracing on Zynq Ultrascale+ MPSoC and I have some comments I'd like to share. Some introduction at first:
- I'm using mainline Linux from a couple of days ago (12ad143e1b80 Merge branch 'perf-urgent-for-linus'...)
- on top of it I have a couple of my changes introducing CoreSight support on US+
- on top of this I cherry-picked your two patch sets with CPU-wide tracing
I prepared a test program that's supposed to generate deterministic trace. I created a function that should, depending on the argument, create either continuous E atoms or E/N atoms alternately. In main() I spawn two threads with affinity attributes:
- the first thread is set up as atom E generator, pinned to CPU1
- the other as E/N generator, pinned to CPU2
The main thread is pinned to CPU0.
The atom generator function's body looks like below. If *atom == 'n', the branch is not taken, so an N atom should be generated; if *atom == 'e', the branch is taken and an E atom should be generated. After that, another E atom is expected, since the while loop branches back to the start. It's counter-intuitive when you look at the C code, but the if-condition actually compiles to a b.ne instruction, so the body of the if is reached when the branch is not taken.
volatile int sum = 0;

while (1) {
        // Reference by pointer, so it's not optimized out.
        if (*atom == 'n')       // compiler creates b.ne here
                sum += 0xdeadbeef * (*atom + 1);
}
Here are my observations:
1. -C option works well. I run perf with:
# perf record -e cs_etm/@fe940000.etf1/u -C1 ./atom_gen
In perf report I can see lots of E atoms in packets marked with ID:12. If I collect trace with -C2 instead, I see E/N atoms in packets with ID:14. Everything works as expected each time I trace this application.
2. -a option works unreliably. I run perf with:
# perf record -e cs_etm/@fe940000.etf1/u -a ./atom_gen
What I expect is perf.data containing similar output to what I got with -C1 plus what I got with -C2, i.e. ID:12 Atom E packets and ID:14 atom E/N packets. What actually happens is inconsistent each time I try this command. Sometimes I have no atom packets associated with IDs 12 and 14 but I have some with ID:16. Sometimes I get ID:14 atoms but no ID:12. Sometimes I get expected trace but still some noise in ID:16 packets, which I would not expect at all, since the program schedules nothing on CPU3. I wonder if I'm missing something here in my understanding of CoreSight. Is this behaviour expected?
3. I'm not able to use filters. I'd like to narrow down tracing to the while(1) loop in the trace generator, to filter out noise from other instructions. However, I find it impossible to use the --filter flag along with -C or -a:
# perf record -e cs_etm/@fe940000.etf1/u --filter 'filter atom_n_generator @./atom_gen' -a ./atom_gen
failed to set filter "filter 0x90c/0x8c@/root/atom_gen" on event cs_etm/@fe940000.etf1/u with 95 (Operation not supported)
It works fine with --per-thread. Is the behaviour expected, or is this a bug?
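For reference, the same filter applied with --per-thread, i.e. something like:

# perf record -e cs_etm/@fe940000.etf1/u --filter 'filter atom_n_generator @./atom_gen' --per-thread ./atom_gen

sets the filter without complaints.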
4. Kernel crashes if used with no -a, -C or --per-thread. If I call perf with:
# perf record -e cs_etm/@fe940000.etf1/u ./atom_gen
I can see some printfs from the program, but the kernel immediately hits a NULL pointer dereference. Please find a log below. My serial connection drops characters sometimes, sorry for that. The crash happens in tmc_enable_etf_sink+0x90, which is:
/* Get a handle on the pid of the process to monitor */
if (handle->event->owner)
        pid = task_pid_nr(handle->event->owner);
The handle->event->owner seems to be NULL.
[ 1313.650726] Unable to handle kernel NULL pointer dereference at virtual address 00000000000003b8
[ 1313.659501] Mem abort info:
[ 1313.662281] ESR = 0x96000006
[ 1313.665320] Exception class = DABT (current EL), IL = 32 bits
[ 1313.671232] SET = 0, FnV = 0
[ 1313.674277] EA = 0, S1PTW = 0
[ 1313.677401] Data abort info:
[ 1313.680266] ISV = 0, ISS = 0x00000006
[ 1313.684085] CM = 0, WnR = 0
[ 1313.687039] user pgtable: 4k pages, 39-bit VAs, pgdp = 000000003b61a770
[ 1313.693644] [00000000000003b8] pgd=000000006c6da003, pud=0000006c6da003, pmd=0000000000000000
[ 1313.702336] Internal error: Oops: 96000006 [#1] SMP
[ 1313.707201] Modules linked in:
[ 1313.710250] CPU: 1 PID: 3255 Comm: multithread-two Not tainted 5.0.0-10411-g66431e6376c4-dirty #26
[ 1313.719200] Hardware name: ZynqMP ZCU104 RevA (DT)
[ 1313.723981] pstate: 20000085 (nzCv daIf -PAN -UAO)
[ 1313.728770] pc : tmc_enable_etf_sink+0x90/0x3b0
[ 1313.733286] lr : tmc_enable_etf_sink+0x64/0x3b0
[ 1313.737806] sp : ffffff8011263b40
[ 1313.741104] x29: ffffff8011263b40 x28: 0000000000000000
[ 1313.6409] x27: 0000000000000000 x26: ffffffc06d4ce180
[ 1313.7512] x25: 0000000000000001 x24: ffffffc06faa4ce0
[ 1313.757015] x23: 0000000000000002 x22: 0000000000000080
[ 1313.7319] x21: ffffffc06faa4ce0 x20: ffffffc06cf07c00
[ 1313.7676] x19: ffffffc06d560e80 x18: 0000000000000000
[ 1313.772926] x17: 0000000000000000 x16: 0000000000000000
[ 1313.7729] x15: 0000000000000000 x14: ffffff8010879388
[ 1313.78353 x13: 0000000000000000 x12: 0000000002e8fc00
[ 1313.788836] x11: 0000000000000000 x10: 00000000000007f0
[ 1313.7940] x9 : 0000000000000000 x8 : 0000000000000000
[ 1313.799443] x7 : 0000000000000030 x6 : ffffffc06c279030
[ 1313.804747] x5 : 0000000000000030 x4 : 0000000000000002
[ 1313.8100] x3 : ffffffc06d560ee8 x2 : 0000000000000001
[ 1313.815354] x1 : 0000000000000000 x0 : 0000000000000000
[ 1313.820659] Process multithread-two (pid: 3255, stack limit = 0x00000073629f1e)
[ 1313.828133] Call trace:
[ 1313.830571] tmc_enable_etf_sink+0x90/0x3b0
[ 1313.834748] coresight_enable_path+0xe4/0x1f8
[ 1313.839096] etm_event_start+0x8c/0x120
[ 1313.842923] etm_event_add+0x38/0x58
[ 1313.846492] event_sched_in.isra.61.part.62+0x94/0x1b0
[ 1313.851620] group_sched_in+0xa0/0x1c8
[ 1313.855360] flexible_sched_in+0xac/0x1
[ 1313.859364] visit_groups_merge+0x144/0x1f8
[ 1313.86353 ctx_sched_in.isra.39+0x128/0x138
[ 1313.867887] perf_event_sched_in.isra.41+0x54/0x80
[ 1313.872669] __perf_event_task_sched_in+0x16c/0x180
[ 1313.877540] finish_task_switch+104/0x1d8
[ 1313.881715] schedule_tail+0xc/0x98
[ 1313.885195] ret_from_fork+0x4/0x18
[ 1313.888677] Code: 540016 f9001bb7 f94002a0 f9414400 (b943b817)
[ 1313.894761] ---[ end trace 99bb09dc83a83a1a ]---
Best regards, Wojciech
Good day Wojciech,
On Thu, 14 Mar 2019 at 10:00, Wojciech Żmuda wzmuda@n7space.com wrote:
Hi Mathieu,
Apologies if mailman does not see this as a reply. I'm not sure if Outlook handles In-Reply-To properly.
I'm testing CPU-wide tracing on Zynq Ultrascale+ MPSoC and I have some comments I'd like to share. Some introduction at first:
- I'm using mainline Linux from a couple of days ago (12ad143e1b80 Merge branch 'perf-urgent-for-linus'...)
- on top of it I have a couple of my changes introducing CoreSight support on US+
- on top of this I cherry-picked your two patch sets with CPU-wide tracing
I prepared a test program that's supposed to generate deterministic trace. I created a function that should, depending on the argument, create either continuous E atoms or E/N atoms alternately. In main() I spawn two threads with affinity attributes:
- the first thread is set up as atom E generator, pinned to CPU1
- the other as E/N generator, pinned to CPU2
The main thread is pinned to CPU0.
How do you pin threads to a CPU within a user space program?
The atom generator function's body looks like below. If *atom == 'n', the branch is not taken, so an N atom should be generated; if *atom == 'e', the branch is taken and an E atom should be generated. After that, another E atom is expected, since the while loop branches back to the start. It's counter-intuitive when you look at the C code, but the if-condition actually compiles to a b.ne instruction, so the body of the if is reached when the branch is not taken.
volatile int sum = 0;

while (1) {
        // Reference by pointer, so it's not optimized out.
        if (*atom == 'n')       // compiler creates b.ne here
                sum += 0xdeadbeef * (*atom + 1);
}
Here are my observations:
- -C option works well. I run perf with:
# perf record -e cs_etm/@fe940000.etf1/u -C1 ./atom_gen
Note that this command does not exclusively trace the application atom_gen on CPU1. This command traces everything that is happening on CPU1 (in user space) for as long as application atom_gen is alive. If you want to trace atom_gen use either --per-thread or the "taskset" utility.
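For example (illustrative invocations only, adjust to your setup), either of:

# perf record -e cs_etm/@fe940000.etf1/u --per-thread ./atom_gen
# perf record -e cs_etm/@fe940000.etf1/u -C1 taskset -c 1 ./atom_gen

The first traces the atom_gen task itself; the second keeps atom_gen on CPU1 so that most of what -C1 captures comes from it.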
In perf report I can see lots of E atoms in packets marked with ID:12. If I collect trace with -C2 instead, I see E/N atoms in packets with ID:14. Everything works as expected each time I trace this application.
- -a option works unreliably. I run perf with:
# perf record -e cs_etm/@fe940000.etf1/u -a ./atom_gen
What I expect is perf.data containing similar output to what I got with -C1 plus what I got with -C2, i.e. ID:12 Atom E packets and ID:14 atom E/N packets. What actually happens is inconsistent each time I try this command. Sometimes I have no atom packets associated with IDs 12 and 14 but I have some with ID:16. Sometimes I get ID:14 atoms but no ID:12. Sometimes I get expected trace but still some noise in ID:16 packets, which I would not expect at all, since the program schedules nothing on CPU3. I wonder if I'm missing something here in my understanding of CoreSight. Is this behaviour expected?
On top of the explanation I have given above note that CPU-wide scenarios currently use the same sink. Because the memory buffer of the sink is limited it is easy to clobber traces from one CPU with traces from another. This is a problem inherent to topologies where sinks are shared between CPUs and unfortunately there is nothing to be done about it.
- I'm not able to use filters.
I'd like to narrow down tracing to the while(1) loop in trace generator, to filter out noise from other instructions. However, I find it impossible to use --filter flag along with -C or -a:
# perf record -e cs_etm/@fe940000.etf1/u --filter 'filter atom_n_generator @./atom_gen' -a ./atom_gen
failed to set filter "filter 0x90c/0x8c@/root/atom_gen" on event cs_etm/@fe940000.etf1/u with 95 (Operation not supported)
I haven't tried that before - thanks for reporting this. I will try on my side.
It works fine with --per-thread. Is the behaviour expected, or is this a bug?
- Kernel crashes if used with no -a, -C or --per-thread.
If I call perf with: # perf record -e cs_etm/@fe940000.etf1/u ./atom_gen
This should be the same thing as:
# perf record -e cs_etm/@fe940000.etf1/u -a ./atom_gen
I can see some printfs from the program, but the kernel immediately hits a NULL pointer dereference. Please find a log below. My serial connection drops characters sometimes, sorry for that. The crash happens in tmc_enable_etf_sink+0x90, which is:
/* Get a handle on the pid of the process to monitor */
if (handle->event->owner)
        pid = task_pid_nr(handle->event->owner);
The handle->event->owner seems to be NULL.
Wonderful...
I will have to get back to you once I have tried this on my side.
Thanks for the input, Mathieu
Best regards, Wojciech
Hello Mathieu,
I prepared a test program that's supposed to generate deterministic trace. I created a function that should, depending on the argument, create either continuous E atoms or E/N atoms alternately. In main() I spawn two threads with affinity attributes:
- the first thread is set up as atom E generator, pinned to CPU1
- the other as E/N generator, pinned to CPU2
The main thread is pinned to CPU0.
How do you pin threads to a CPU within a user space program?
Before calling pthread_create(), I specify attributes for each pthread:
pthread_attr_t attr;
cpu_set_t cpus;

pthread_attr_init(&attr);
CPU_ZERO(&cpus);
CPU_SET(core_id, &cpus);
pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpus);
Then, a thread is spawned with &attr passed as the second argument to pthread_create(). I do CPU_ZERO(), CPU_SET() and pthread_attr_setaffinity_np() separately for each thread. I print the return value of sched_getcpu() in the thread worker function to confirm the threads are scheduled according to this code. It seems to work as desired.
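Roughly, the whole setup looks like the sketch below (simplified; spawn_pinned() and atom_generator() are just illustrative names, not the exact code I run):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *atom_generator(void *arg)
{
        /* Report which CPU the scheduler actually placed us on. */
        printf("worker running on CPU%d\n", sched_getcpu());
        /* ... spin here generating E or E/N atoms ... */
        return NULL;
}

static int spawn_pinned(pthread_t *thread, int core_id, void *arg)
{
        pthread_attr_t attr;
        cpu_set_t cpus;

        pthread_attr_init(&attr);
        CPU_ZERO(&cpus);
        CPU_SET(core_id, &cpus);
        pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpus);

        /* &attr carries the affinity mask into pthread_create(). */
        return pthread_create(thread, &attr, atom_generator, arg);
}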
- -C option works well. I run perf with:
# perf record -e cs_etm/@fe940000.etf1/u -C1 ./atom_gen
Note that this command does not exclusively trace the application atom_gen on CPU1. This command traces everything that is happening on CPU1 (in user space) for as long as application atom_gen is alive.
Thank you for pointing that out, that explains the inconsistent noise I sometimes observe. If we had filters working with -C, setting filter to some code range in ./atom_gen would result in tracing this application exclusively, correct?
If you want to trace atom_gen use either --per-thread or the "taskset" utility.
Actually I tried it at the very beginning, before you published the CPU-wide tracing feature. My observation was that with --per-thread I am not able to get any trace (besides some initialization packets) from multithreaded applications. Regardless of how long my two atom generator threads were spinning, the AUX sections of perf.data were nearly empty, with zero atom packets. Only single-threaded application tracing works well with --per-thread for me.
Do you have any suggestions on how I should trace multithreaded applications?
- -a option works unreliably. I run perf with:
# perf record -e cs_etm/@fe940000.etf1/u -a ./atom_gen
What I expect is perf.data containing similar output to what I got with -C1 plus what I got with -C2, i.e. ID:12 Atom E packets and ID:14 atom E/N packets. What actually happens is inconsistent each time I try this command. Sometimes I have no atom packets associated with IDs 12 and 14 but I have some with ID:16. Sometimes I get ID:14 atoms but no ID:12. Sometimes I get expected trace but still some noise in ID:16 packets, which I would not expect at all, since the program schedules nothing on CPU3. I wonder if I'm missing something here in my understanding of CoreSight. Is this behaviour expected?
On top of the explanation I have given above note that CPU-wide scenarios currently use the same sink. Because the memory buffer of the sink is limited it is easy to clobber traces from one CPU with traces from another. This is a problem inherent to topologies where sinks are shared between CPUs and unfortunately there is nothing to be done about it.
Can this issue be solved with TMC-ETR? I was under the impression that, unlike the ETB, which uses dedicated SRAM as a trace buffer, TMC-ETR routes trace to system RAM. Is it possible to configure TMC-ETR to target some specific RAM region of an arbitrary size?
Best regards, Wojciech
On Fri, 15 Mar 2019 at 12:19, Wojciech Żmuda wzmuda@n7space.com wrote:
Hello Mathieu,
I prepared a test program that's supposed to generate deterministic trace. I created a function that should, depending on the argument, create either continuous E atoms or E/N atoms alternately. In main() I spawn two threads with affinity attributes:
- the first thread is set up as atom E generator, pinned to CPU1
- the other as E/N generator, pinned to CPU2
The main thread is pinned to CPU0.
How do you pin threads to a CPU within a user space program?
Before calling pthread_create(), I specify attributes for each pthread:
pthread_attr_t attr;
cpu_set_t cpus;

pthread_attr_init(&attr);
CPU_ZERO(&cpus);
CPU_SET(core_id, &cpus);
pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpus);
Then, a thread is spawned with &attr passed as the second argument to pthread_create(). I do CPU_ZERO(), CPU_SET() and pthread_attr_setaffinity_np() separately for each thread. I print the return value of sched_getcpu() in the thread worker function to confirm the threads are scheduled according to this code. It seems to work as desired.
- -C option works well. I run perf with:
# perf record -e cs_etm/@fe940000.etf1/u -C1 ./atom_gen
Note that this command does not exclusively trace the application atom_gen on CPU1. This command traces everything that is happening on CPU1 (in user space) for as long as application atom_gen is alive.
Thank you for pointing that out, that explains the inconsistent noise I sometimes observe. If we had filters working with -C, setting filter to some code range in ./atom_gen would result in tracing this application exclusively, correct?
Right, but as I pointed out in my previous email, filtering isn't supported in CPU-wide trace scenarios.
If you want to trace atom_gen use either --per-thread or the "taskset" utility.
Actually I tried it at the very beginning, before you published the CPU-wide tracing feature. My observation was that with --per-thread I am not able to get any trace (besides some initialization packets) from multithreaded applications. Regardless of how long my two atom generator threads were spinning, the AUX sections of perf.data were nearly empty, with zero atom packets. Only single-threaded application tracing works well with --per-thread for me.
Do you have any suggestions on how I should trace multithreaded applications?
It is currently not possible to trace multi-threaded applications. In fact it would be quite complex to implement, and with topologies where CPUs use the same sink it would be even harder. On top of that, it would yield unpredictable results, since traces could be overwritten because sinks don't generate an interrupt when full.
- -a option works unreliably. I run perf with:
# perf record -e cs_etm/@fe940000.etf1/u -a ./atom_gen
What I expect is perf.data containing similar output to what I got with -C1 plus what I got with -C2, i.e. ID:12 Atom E packets and ID:14 atom E/N packets. What actually happens is inconsistent each time I try this command. Sometimes I have no atom packets associated with IDs 12 and 14 but I have some with ID:16. Sometimes I get ID:14 atoms but no ID:12. Sometimes I get expected trace but still some noise in ID:16 packets, which I would not expect at all, since the program schedules nothing on CPU3. I wonder if I'm missing something here in my understanding of CoreSight. Is this behaviour expected?
On top of the explanation I have given above note that CPU-wide scenarios currently use the same sink. Because the memory buffer of the sink is limited it is easy to clobber traces from one CPU with traces from another. This is a problem inherent to topologies where sinks are shared between CPUs and unfortunately there is nothing to be done about it.
Can this issue be solved with TMC-ETR? I was under the impression that, unlike the ETB, which uses dedicated SRAM as a trace buffer, TMC-ETR routes trace to system RAM. Is it possible to configure TMC-ETR to target some specific RAM region of an arbitrary size?
You are correct, but the ETR will still use a single area to aggregate traces from several CPUs. The size of the AUX mmap'ed area can be changed using the "-m,x" option, where x is the number of pages to use, but the end result is the same: it is highly possible that traces will get overwritten (same reason as above, i.e. no interrupt when the buffer is full).
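For instance (illustrative only; the page count is up to you and, if I recall correctly, must be a power of two), something like:

# perf record -e cs_etm/@fe940000.etf1/u -a -m,128 ./atom_gen

asks for 128 pages of AUX buffer instead of the default, which delays the problem rather than solving it.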
Mathieu
Best regards, Wojciech
On Thu, 14 Mar 2019 at 10:00, Wojciech Żmuda wzmuda@n7space.com wrote:
Hi Mathieu,
Apologies if mailman does not see this as a reply. I'm not sure if Outlook handles In-Reply-To properly.
I'm testing CPU-wide tracing on Zynq Ultrascale+ MPSoC and I have some comments I'd like to share. Some introduction at first:
- I'm using mainline Linux from a couple of days ago (12ad143e1b80 Merge branch 'perf-urgent-for-linus'...)
- on top of it I have a couple of my changes introducing CoreSight support on US+
- on top of this I cherry-picked your two patch sets with CPU-wide tracing
I prepared a test program that's supposed to generate deterministic trace. I created a function that should, depending on the argument, create either continuous E atoms or E/N atoms alternately. In main() I spawn two threads with affinity attributes:
- the first thread is set up as atom E generator, pinned to CPU1
- the other as E/N generator, pinned to CPU2
The main thread is pinned to CPU0.
The atom generator function's body looks like below. If *atom == 'n', the branch is not taken, so an N atom should be generated; if *atom == 'e', the branch is taken and an E atom should be generated. After that, another E atom is expected, since the while loop branches back to the start. It's counter-intuitive when you look at the C code, but the if-condition actually compiles to a b.ne instruction, so the body of the if is reached when the branch is not taken.
volatile int sum = 0;

while (1) {
        // Reference by pointer, so it's not optimized out.
        if (*atom == 'n')       // compiler creates b.ne here
                sum += 0xdeadbeef * (*atom + 1);
}
Here are my observations:
- -C option works well. I run perf with:
# perf record -e cs_etm/@fe940000.etf1/u -C1 ./atom_gen
In perf report I can see lots of E atoms in packets marked with ID:12. If I collect trace with -C2 instead, I see E/N atoms in packets with ID:14. Everything works as expected each time I trace this application.
- -a option works unreliably. I run perf with:
# perf record -e cs_etm/@fe940000.etf1/u -a ./atom_gen
What I expect is perf.data containing similar output to what I got with -C1 plus what I got with -C2, i.e. ID:12 Atom E packets and ID:14 atom E/N packets. What actually happens is inconsistent each time I try this command. Sometimes I have no atom packets associated with IDs 12 and 14 but I have some with ID:16. Sometimes I get ID:14 atoms but no ID:12. Sometimes I get expected trace but still some noise in ID:16 packets, which I would not expect at all, since the program schedules nothing on CPU3. I wonder if I'm missing something here in my understanding of CoreSight. Is this behaviour expected?
- I'm not able to use filters.
I'd like to narrow down tracing to the while(1) loop in trace generator, to filter out noise from other instructions. However, I find it impossible to use --filter flag along with -C or -a:
# perf record -e cs_etm/@fe940000.etf1/u --filter 'filter atom_n_generator @./atom_gen' -a ./atom_gen
failed to set filter "filter 0x90c/0x8c@/root/atom_gen" on event cs_etm/@fe940000.etf1/u with 95 (Operation not supported)
I was able to reproduce - let me investigate.
It works fine with --per-thread. Is the behaviour expected, or is this a bug?
- Kernel crashes if used with no -a, -C or --per-thread.
If I call perf with: # perf record -e cs_etm/@fe940000.etf1/u ./atom_gen
As I suspected, the perf tools assume a "-a" option when a mode isn't specified. I tried the above on both Juno and the dragonboard 410c without a problem. What kind of perf tools are you using? Is it something coming from a busybox or another kind of trimmed-down user space?
I can see some printfs from the program, but the kernel immediately hits a NULL pointer dereference. Please find a log below. My serial connection drops characters sometimes, sorry for that. The crash happens in tmc_enable_etf_sink+0x90, which is:
/* Get a handle on the pid of the process to monitor */
if (handle->event->owner)
        pid = task_pid_nr(handle->event->owner);
The handle->event->owner seems to be NULL.
Best regards, Wojciech
- Kernel crashes if used with no -a, -C or --per-thread.
If I call perf with: # perf record -e cs_etm/@fe940000.etf1/u ./atom_gen
As I suspected, the perf tools assume a "-a" option when a mode isn't specified. I tried the above on both Juno and the dragonboard 410c without a problem. What kind of perf tools are you using? Is it something coming from a busybox or another kind of trimmed-down user space?
My user space is not trimmed down. It's Debian built with multistrap. I'm building my own perf tools from the sources of the kernel I'm running on my platform. This is the mainline kernel plus the CPU-wide feature commits on top. I double-checked and it looks like I cherry-picked everything from c9795326a43a ("coresight: pmu: Adding ITRACE property to cs_etm PMU") to 66431e6376c4 ("perf tools: Add support for CPU-wide trace scenarios") from your mathieu.poirier/coresight.git/next-20190305-cpu-wide branch.
Anyway, thanks for checking this on your side. If it works for you, perhaps I introduced the bug on my side. I'll investigate it further. Good to know that I'm not missing anything - for now I can just specify "-a" explicitly.
Thank you for help and best regards, Wojciech
From: Mathieu Poirier mathieu.poirier@linaro.org
Sent: Thursday, March 14, 2019 9:46:54 PM
To: Wojciech Żmuda
Cc: coresight@lists.linaro.org; Michał Kurowski; Krzysztof Pilch; Michal Mosdorf
Subject: Re: [RFC 00/20] coresight: Add support for CPU-wide trace scenarios
On Thu, 14 Mar 2019 at 10:00, Wojciech Żmuda wzmuda@n7space.com wrote:
Hi Mathieu,
Apologies if mailman does not see this as a reply. I'm not sure if Outlook handles In-Reply-To properly.
I'm testing CPU-wide tracing on Zynq Ultrascale+ MPSoC and I have some comments I'd like to share. Some introduction at first:
- I'm using mainline Linux from a couple of days ago (12ad143e1b80 Merge branch 'perf-urgent-for-linus'...)
- on top of it I have a couple of my changes introducing CoreSight support on US+
- on top of this I cherry-picked your two patch sets with CPU-wide tracing
I prepared a test program that's supposed to generate deterministic trace. I created a function that should, depending on the argument, create either continuous E atoms or E/N atoms alternately. In main() I spawn two threads with affinity attributes:
- the first thread is set up as atom E generator, pinned to CPU1
- the other as E/N generator, pinned to CPU2
The main thread is pinned to CPU0.
The atom generator function's body looks like below. If *atom == 'n', the branch is not taken, so an N atom should be generated; if *atom == 'e', the branch is taken and an E atom should be generated. After that, another E atom is expected, since the while loop branches back to the start. It's counter-intuitive when you look at the C code, but the if-condition actually compiles to a b.ne instruction, so the body of the if is reached when the branch is not taken.
volatile int sum = 0;

while (1) {
        // Reference by pointer, so it's not optimized out.
        if (*atom == 'n')       // compiler creates b.ne here
                sum += 0xdeadbeef * (*atom + 1);
}
Here are my observations:
- -C option works well. I run perf with:
# perf record -e cs_etm/@fe940000.etf1/u -C1 ./atom_gen
In perf report I can see lots of E atoms in packets marked with ID:12. If I collect trace with -C2 instead, I see E/N atoms in packets with ID:14. Everything works as expected each time I trace this application.
- -a option works unreliably. I run perf with:
# perf record -e cs_etm/@fe940000.etf1/u -a ./atom_gen
What I expect is perf.data containing similar output to what I got with -C1 plus what I got with -C2, i.e. ID:12 Atom E packets and ID:14 atom E/N packets. What actually happens is inconsistent each time I try this command. Sometimes I have no atom packets associated with IDs 12 and 14 but I have some with ID:16. Sometimes I get ID:14 atoms but no ID:12. Sometimes I get expected trace but still some noise in ID:16 packets, which I would not expect at all, since the program schedules nothing on CPU3. I wonder if I'm missing something here in my understanding of CoreSight. Is this behaviour expected?
- I'm not able to use filters.
I'd like to narrow down tracing to the while(1) loop in trace generator, to filter out noise from other instructions. However, I find it impossible to use --filter flag along with -C or -a:
# perf record -e cs_etm/@fe940000.etf1/u --filter 'filter atom_n_generator @./atom_gen' -a ./atom_gen
failed to set filter "filter 0x90c/0x8c@/root/atom_gen" on event cs_etm/@fe940000.etf1/u with 95 (Operation not supported)
I was able to reproduce - let me investigate.
It works fine with --per-thread. Is the behaviour expected, or is this a bug?
- Kernel crashes if used with no -a, -C or --per-thread.
If I call perf with: # perf record -e cs_etm/@fe940000.etf1/u ./atom_gen
As I suspected, the perf tools assume a "-a" option when a mode isn't specified. I tried the above on both Juno and the dragonboard 410c without a problem. What kind of perf tools are you using? Is it something coming from a busybox or another kind of trimmed-down user space?
I can see some printfs from the program, but the kernel immediately hits a NULL pointer dereference. Please find a log below. My serial connection drops characters sometimes, sorry for that. The crash happens in tmc_enable_etf_sink+0x90, which is:
/* Get a handle on the pid of the process to monitor */
if (handle->event->owner)
        pid = task_pid_nr(handle->event->owner);
The handle->event->owner seems to be NULL.
Best regards, Wojciech
Good morning,
On Thu, 14 Mar 2019 at 10:00, Wojciech Żmuda wzmuda@n7space.com wrote:
Hi Mathieu,
Apologies if mailman does not see this as a reply. I'm not sure if Outlook handles In-Reply-To properly.
I'm testing CPU-wide tracing on Zynq Ultrascale+ MPSoC and I have some comments I'd like to share. Some introduction at first:
- I'm using mainline Linux from a couple of days ago (12ad143e1b80 Merge branch 'perf-urgent-for-linus'...)
- on top of it I have a couple of my changes introducing CoreSight support on US+
- on top of this I cherry-picked your two patch sets with CPU-wide tracing
I prepared a test program that's supposed to generate deterministic trace. I created a function that should, depending on the argument, create either continuous E atoms or E/N atoms alternately. In main() I spawn two threads with affinity attributes:
- the first thread is set up as atom E generator, pinned to CPU1
- the other as E/N generator, pinned to CPU2
The main thread is pinned to CPU0.
The atom generator function's body looks like below. If *atom == 'n', the branch is not taken, so an N atom should be generated; if *atom == 'e', the branch is taken and an E atom should be generated. After that, another E atom is expected, since the while loop branches back to the start. It's counter-intuitive when you look at the C code, but the if-condition actually compiles to a b.ne instruction, so the body of the if is reached when the branch is not taken.
volatile int sum = 0;

while (1) {
        // Reference by pointer, so it's not optimized out.
        if (*atom == 'n')       // compiler creates b.ne here
                sum += 0xdeadbeef * (*atom + 1);
}
Here are my observations:
- -C option works well. I run perf with:
# perf record -e cs_etm/@fe940000.etf1/u -C1 ./atom_gen
In perf report I can see lots of E atoms in packets marked with ID:12. If I collect trace with -C2 instead, I see E/N atoms in packets with ID:14. Everything works as expected each time I trace this application.
- -a option works unreliably. I run perf with:
# perf record -e cs_etm/@fe940000.etf1/u -a ./atom_gen
What I expect is perf.data containing similar output to what I got with -C1 plus what I got with -C2, i.e. ID:12 Atom E packets and ID:14 atom E/N packets. What actually happens is inconsistent each time I try this command. Sometimes I have no atom packets associated with IDs 12 and 14 but I have some with ID:16. Sometimes I get ID:14 atoms but no ID:12. Sometimes I get expected trace but still some noise in ID:16 packets, which I would not expect at all, since the program schedules nothing on CPU3. I wonder if I'm missing something here in my understanding of CoreSight. Is this behaviour expected?
- I'm not able to use filters.
I'd like to narrow down tracing to the while(1) loop in trace generator, to filter out noise from other instructions. However, I find it impossible to use --filter flag along with -C or -a:
# perf record -e cs_etm/@fe940000.etf1/u --filter 'filter atom_n_generator @./atom_gen' -a ./atom_gen
failed to set filter "filter 0x90c/0x8c@/root/atom_gen" on event cs_etm/@fe940000.etf1/u with 95 (Operation not supported)
It works fine with --per-thread. Is the behaviour expected, or is this a bug?
I had time to look into this problem and found the answer in a comment in the code:
https://elixir.bootlin.com/linux/latest/source/kernel/events/core.c#L9012
Mathieu
- Kernel crashes if used with no -a, -C or --per-thread.
If I call perf with: # perf record -e cs_etm/@fe940000.etf1/u ./atom_gen
I can see some printfs from the program, but the kernel immediately hits a NULL pointer dereference. Please find a log below. My serial connection drops characters sometimes, sorry for that. The crash happens in tmc_enable_etf_sink+0x90, which is:
/* Get a handle on the pid of the process to monitor */
if (handle->event->owner)
        pid = task_pid_nr(handle->event->owner);
The handle->event->owner seems to be NULL.
Best regards, Wojciech