Good morning,
On Thu, 14 Mar 2019 at 10:00, Wojciech Żmuda wzmuda@n7space.com wrote:
Hi Mathieu,
Apologies if mailman does not see this as a reply. I'm not sure if Outlook handles In-Reply-To properly.
I'm testing CPU-wide tracing on Zynq Ultrascale+ MPSoC and I have some comments I'd like to share. Some introduction at first:
- I'm using mainline Linux from a couple of days ago (12ad143e1b80 Merge branch 'perf-urgent-for-linus'...)
- on top of it I have a couple of my changes introducing CoreSight support on US+
- on top of this I cherry-picked your two patch sets with CPU-wide tracing
I prepared a test program that's supposed to generate deterministic trace. I created a function that should, depending on the argument, create either continuous E atoms or E/N atoms alternately. In main() I spawn two threads with affinity attributes:
- the first thread is set up as atom E generator, pinned to CPU1
- the other as E/N generator, pinned to CPU2
The main thread is pinned to CPU0.
The atom generator function's body looks like below. If *atom == 'n', branch is not taken, thus atom N should be generated, and if *atom == 'e', branch is taken and atom E should be generated. After that, another atom E is expected, since the while loop branches back to the start. It's counter-intuitive when you look at the C code, but the if-condition is actually evaluated to b.ne instruction, so getting inside the condition happens when the branch is not taken.
volatile int sum = 0; while (1) { // Reference by pointer, so it's not optimized out. if (*atom == 'n') // compiler creates b.ne here sum += 0xdeadbeef * (*atom + 1); }
Here are my observations:
- -C option works well. I run perf with:
# perf record -e cs_etm/@fe940000.etf1/u -C1 ./atom_gen
In perf report I can see lots of E atoms in packets marked with ID:12. If I collect trace with -C2 instead, I see E/N atoms in packets with ID:14. Everything works as expected each time I trace this application.
- -a option works unreliable. I run perf with:
# perf record -e cs_etm/@fe940000.etf1/u -a ./atom_gen
What I expect is perf.data containing similar output to what I got with -C1 plus what I got with -C2, i.e. ID:12 Atom E packets and ID:14 atom E/N packets. What actually happens is inconsistent each time I try this command. Sometimes I have no atom packets associated with IDs 12 and 14 but I have some with ID:16. Sometimes I get ID:14 atoms but no ID:12. Sometimes I get expected trace but still some noise in ID:16 packets, which I would not expect at all, since the program schedules nothing on CPU3. I wonder if I'm missing something here in my understanding of CoreSight. Is this behaviour expected?
- I'm not able to use filters.
I'd like to narrow down tracing to the while(1) loop in trace generator, to filter out noise from other instructions. However, I find it impossible to use --filter flag along with -C or -a:
# perf record -e cs_etm/@fe940000.etf1/u --filter 'filter atom_n_generator @./atom_gen' -a ./atom_gen failed to set filter "filter 0x90c/0x8c@/root/atom_gen" on event cs_etm/@fe940000.etf1/u with 95 (Operation not supported)
It works fine with --per-thread. Is the behaviour expected, or is this a bug?
I had time to look into this problem and found the answer in a comment in the code:
https://elixir.bootlin.com/linux/latest/source/kernel/events/core.c#L9012
Mathieu
- Kernel crashes if used with no -a, -C or --per-thread.
If I call perf with: # perf record -e cs_etm/@fe940000.etf1/u ./atom_gen
I can see some printfs from the program, but immediately kernel gets NULL pointer dereference. Please find a log below. My serial connection drops characters sometimes, sorry for that. The crash happens in tmc_enable_etf_sink+0x90, which is:
/* Get a handle on the pid of the process to monitor */ if (handle->event->owner) pid = task_pid_nr(handle->event->owner);
The handle->event->owner seems to be NULL.
[ 1313.650726Unable to handle kernel NULL pointer dereference at virtual adess 00000000000003b8 [ 1313.659501] Mem abort info: [ 1313.662281] ESR = 0x96000006 [ 1313.665320] Exption class = DABT (current EL), IL = 32 bits [ 1313.671232] SET = 0, FnV = 0 [ 1313.674277] EA = 0, S1PTW = 0 [ 1313.677401] Data abort info: [ 1313.680266] ISV = 0, ISS =x00000006 [ 1313.684085] CM = 0, WnR = 0 [ 1313.687039] us pgtable: 4k pages, 39-bit VAs, pgdp = 000000003b61a770 [ 1313.693644] [00000000000003b8] pgd=000000006c6da003, pud=0000006c6da003, pmd=0000000000000000 [ 1313.702336] Internal err: Oops: 96000006 [#1] SMP [ 1313.707201] Modules linked in: [ 1313.710250] CPU: 1 PID: 3255 Comm: multithread-two N tainted 5.0.0-10411-g66431e6376c4-dirty #26 [ 1313.719200] Hdware name: ZynqMP ZCU104 RevA (DT) [ 1313.723981] pstate: 20000085 (nzCv daIf -PAN -UAO) [ 1313.728770] pc : tmc_enle_etf_sink+0x90/0x3b0 [ 1313.733286] lr : tmc_enable_etf_sin0x64/0x3b0 [ 1313.737806] sp : ffffff8011263b40 [ 1313.741104] x29: ffffff8011263b40 x28: 0000000000000000 [ 1313.6409] x27: 0000000000000000 x26: ffffffc06d4ce180 [ 1313.7512] x25: 0000000000000001 x24: ffffffc06faa4ce0 [ 1313.757015] x23: 0000000000000002 x22: 0000000000000080 [ 1313.7319] x21: ffffffc06faa4ce0 x20: ffffffc06cf07c00 [ 1313.7676] x19: ffffffc06d560e80 x18: 0000000000000000 [ 1313.772926] x17: 0000000000000000 x16: 0000000000000000 [ 1313.7729] x15: 0000000000000000 x14: ffffff8010879388 [ 1313.78353 x13: 0000000000000000 x12: 0000000002e8fc00 [ 1313.788836] x11: 0000000000000000 x10: 00000000000007f0 [ 1313.7940] x9 : 0000000000000000 x8 : 0000000000000000 [ 1313.799443x7 : 0000000000000030 x6 : ffffffc06c279030 [ 1313.804747] x5 : 0000000000000030 x4 : 0000000000000002 [ 1313.8100] x3 : ffffffc06d560ee8 x2 : 0000000000000001 [ 1313.815354]1 : 0000000000000000 x0 : 0000000000000000 [ 1313.820659] Process multithread-two (pid: 3255, stack limit = 0x00000073629f1e) [ 1313.828133] Call trace: [ 1313.830571] tmc_enae_etf_sink+0x90/0x3b0 [ 1313.834748] coresight_enable_path+0xe4/0x1f8 [ 1313.839096] etm_event_start+0x8c/0x120 [313.842923] etm_event_add+0x38/0x58 [ 1313.846492] event_scd_in.isra.61.part.62+0x94/0x1b0 [ 1313.851620] group_sched_in+0xa0/0x1c8 [ 1313.855360] flexible_sched_in+0xac/0x1 [ 1313.859364] visit_groups_merge+0x144/0x1f8 [ 1313.86353 ctx_sched_in.isra.39+0x128/0x138 [ 1313.867887] perf_event_sched_in.isra.41+0x54/0x80 [ 1313.872669] __perf_eventask_sched_in+0x16c/0x180 [ 1313.877540] finish_task_switch+104/0x1d8 [ 1313.881715] schedule_tail+0xc/0x98 [ 1313.885195] ret_from_fork+0x4/0x18 [ 1313.888677] Code: 540016 f9001bb7 f94002a0 f9414400 (b943b817) [ 1313.894761] ---[ e trace 99bb09dc83a83a1a ]---
Best regards, Wojciech