Hi,
The OCSD_INSTR_WFI_WFE instruction sub-type is added to the library
headers from version 0.11.0 of openCSD to support later ETMv4
versions.
This does require an update to the perf code in cs-etm-decoder.c to
add this value into the handling code in the default part of the case
statement - e.g.:-
case OCSD_INSTR_ISB:
case OCSD_INSTR_DSB_DMB:
+ case OCSD_INSTR_WFI_WFE:
case OCSD_INSTR_OTHER:
default:
The perf-opencsd master branch has not had an update to cover this yet.
The present perf decode does not use this value. If you do not
specifically need ETMv4.3 support for authenticated pointer trace then
it is safe to use the latest v0.10.x decoder with the perf-opencsd.
Otherwise you will need to patch locally till a patch is made
available in the repository, or the upstream perf supports the later
OpenCSD.
Regards
Mike
On Tue, 19 Mar 2019 at 07:46, Solomon <notifications(a)github.com> wrote:
>
> After installing the OpenCSD library, I tried to compile perf from perf-opencsd. I used the command make VF=1 -C tools/perf. However, I got the following error:
>
> CC util/intel-pt-decoder/intel-pt-log.o
>
> CC util/cs-etm-decoder/cs-etm-decoder.o
>
> util/cs-etm-decoder/cs-etm-decoder.c: In function ‘cs_etm_decoder__buffer_range’:
>
> util/cs-etm-decoder/cs-etm-decoder.c:370:2: error: enumeration value ‘OCSD_INSTR_WFI_WFE’ not handled in switch [-Werror=switch-enum]
>
> switch (elem->last_i_type) {
>
> ^~~~~~
>
> CC util/intel-pt-decoder/intel-pt-decoder.o
>
> cc1: all warnings being treated as errors
>
> Has anyone had the same issue before?
>
> —
> You are receiving this because you are subscribed to this thread.
> Reply to this email directly, view it on GitHub, or mute the thread.
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
Minor update to library released - build fixes.
Fixes issue with Debian build on Sparc. See README for details.
Regards
Mike
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
Hi Mathieu,
Apologies if mailman does not see this as a reply. I'm not sure if Outlook handles In-Reply-To properly.
I'm testing CPU-wide tracing on Zynq Ultrascale+ MPSoC and I have some comments I'd like to share.
Some introduction at first:
- I'm using mainline Linux from a couple of days ago (12ad143e1b80 Merge branch 'perf-urgent-for-linus'...)
- on top of it I have a couple of my changes introducing CoreSight support on US+
- on top of this I cherry-picked your two patch sets with CPU-wide tracing
I prepared a test program that's supposed to generate deterministic trace. I created a function that should,
depending on the argument, create either continuous E atoms or E/N atoms alternately. In main() I spawn
two threads with affinity attributes:
- the first thread is set up as atom E generator, pinned to CPU1
- the other as E/N generator, pinned to CPU2
The main thread is pinned to CPU0.
The atom generator function's body looks like below. If *atom == 'n', branch is not taken, thus atom N should
be generated, and if *atom == 'e', branch is taken and atom E should be generated. After that, another atom
E is expected, since the while loop branches back to the start. It's counter-intuitive when you look at the C code,
but the if-condition is actually evaluated to b.ne instruction, so getting inside the condition happens when the branch
is not taken.
volatile int sum = 0;
while (1) {
// Reference by pointer, so it's not optimized out.
if (*atom == 'n') // compiler creates b.ne here
sum += 0xdeadbeef * (*atom + 1);
}
Here are my observations:
1. -C option works well. I run perf with:
# perf record -e cs_etm/(a)fe940000.etf1/u -C1 ./atom_gen
In perf report I can see lots of E atoms in packets marked with ID:12. If I collect trace with -C2 instead,
I see E/N atoms in packets with ID:14. Everything works as expected each time I trace this application.
2. -a option works unreliable. I run perf with:
# perf record -e cs_etm/(a)fe940000.etf1/u -a ./atom_gen
What I expect is perf.data containing similar output to what I got with -C1 plus what I got with -C2, i.e. ID:12
Atom E packets and ID:14 atom E/N packets. What actually happens is inconsistent each time I try this command.
Sometimes I have no atom packets associated with IDs 12 and 14 but I have some with ID:16. Sometimes I get
ID:14 atoms but no ID:12. Sometimes I get expected trace but still some noise in ID:16 packets, which I would
not expect at all, since the program schedules nothing on CPU3. I wonder if I'm missing something here in my
understanding of CoreSight. Is this behaviour expected?
3. I'm not able to use filters.
I'd like to narrow down tracing to the while(1) loop in trace generator, to filter out noise from other instructions.
However, I find it impossible to use --filter flag along with -C or -a:
# perf record -e cs_etm/(a)fe940000.etf1/u --filter 'filter atom_n_generator @./atom_gen' -a ./atom_gen
failed to set filter "filter 0x90c/0x8c@/root/atom_gen" on event cs_etm/(a)fe940000.etf1/u with 95 (Operation not supported)
It works fine with --per-thread. Is the behaviour expected, or is this a bug?
4. Kernel crashes if used with no -a, -C or --per-thread.
If I call perf with:
# perf record -e cs_etm/(a)fe940000.etf1/u ./atom_gen
I can see some printfs from the program, but immediately kernel gets NULL pointer dereference.
Please find a log below. My serial connection drops characters sometimes, sorry for that.
The crash happens in tmc_enable_etf_sink+0x90, which is:
/* Get a handle on the pid of the process to monitor */
if (handle->event->owner)
pid = task_pid_nr(handle->event->owner);
The handle->event->owner seems to be NULL.
[ 1313.650726Unable to handle kernel NULL pointer dereference at virtual adess 00000000000003b8
[ 1313.659501] Mem abort info:
[ 1313.662281] ESR = 0x96000006
[ 1313.665320] Exption class = DABT (current EL), IL = 32 bits
[ 1313.671232] SET = 0, FnV = 0
[ 1313.674277] EA = 0, S1PTW = 0
[ 1313.677401] Data abort info:
[ 1313.680266] ISV = 0, ISS =x00000006
[ 1313.684085] CM = 0, WnR = 0
[ 1313.687039] us pgtable: 4k pages, 39-bit VAs, pgdp = 000000003b61a770
[ 1313.693644] [00000000000003b8] pgd=000000006c6da003, pud=0000006c6da003, pmd=0000000000000000
[ 1313.702336] Internal err: Oops: 96000006 [#1] SMP
[ 1313.707201] Modules linked in:
[ 1313.710250] CPU: 1 PID: 3255 Comm: multithread-two N tainted 5.0.0-10411-g66431e6376c4-dirty #26
[ 1313.719200] Hdware name: ZynqMP ZCU104 RevA (DT)
[ 1313.723981] pstate: 20000085 (nzCv daIf -PAN -UAO)
[ 1313.728770] pc : tmc_enle_etf_sink+0x90/0x3b0
[ 1313.733286] lr : tmc_enable_etf_sin0x64/0x3b0
[ 1313.737806] sp : ffffff8011263b40
[ 1313.741104] x29: ffffff8011263b40 x28: 0000000000000000
[ 1313.6409] x27: 0000000000000000 x26: ffffffc06d4ce180
[ 1313.7512] x25: 0000000000000001 x24: ffffffc06faa4ce0
[ 1313.757015] x23: 0000000000000002 x22: 0000000000000080
[ 1313.7319] x21: ffffffc06faa4ce0 x20: ffffffc06cf07c00
[ 1313.7676] x19: ffffffc06d560e80 x18: 0000000000000000
[ 1313.772926] x17: 0000000000000000 x16: 0000000000000000
[ 1313.7729] x15: 0000000000000000 x14: ffffff8010879388
[ 1313.78353 x13: 0000000000000000 x12: 0000000002e8fc00
[ 1313.788836] x11: 0000000000000000 x10: 00000000000007f0
[ 1313.7940] x9 : 0000000000000000 x8 : 0000000000000000
[ 1313.799443x7 : 0000000000000030 x6 : ffffffc06c279030
[ 1313.804747] x5 : 0000000000000030 x4 : 0000000000000002
[ 1313.8100] x3 : ffffffc06d560ee8 x2 : 0000000000000001
[ 1313.815354]1 : 0000000000000000 x0 : 0000000000000000
[ 1313.820659] Process multithread-two (pid: 3255, stack limit = 0x00000073629f1e)
[ 1313.828133] Call trace:
[ 1313.830571] tmc_enae_etf_sink+0x90/0x3b0
[ 1313.834748] coresight_enable_path+0xe4/0x1f8
[ 1313.839096] etm_event_start+0x8c/0x120
[313.842923] etm_event_add+0x38/0x58
[ 1313.846492] event_scd_in.isra.61.part.62+0x94/0x1b0
[ 1313.851620] group_sched_in+0xa0/0x1c8
[ 1313.855360] flexible_sched_in+0xac/0x1
[ 1313.859364] visit_groups_merge+0x144/0x1f8
[ 1313.86353 ctx_sched_in.isra.39+0x128/0x138
[ 1313.867887] perf_event_sched_in.isra.41+0x54/0x80
[ 1313.872669] __perf_eventask_sched_in+0x16c/0x180
[ 1313.877540] finish_task_switch+104/0x1d8
[ 1313.881715] schedule_tail+0xc/0x98
[ 1313.885195] ret_from_fork+0x4/0x18
[ 1313.888677] Code: 540016 f9001bb7 f94002a0 f9414400 (b943b817)
[ 1313.894761] ---[ e trace 99bb09dc83a83a1a ]---
Best regards,
Wojciech
Hi,
(+coresight mailing lists.)
Looked at this - -fpic is supposed to generate smaller code then -fPIC.
That said, I've tried both variants for x86_64 and aarch64 builds:
x86_64 showed no change, (gcc 5.4)
cross compiled aarch64 code was 0.45% smaller using -fpic rather than
-fPIC. (gcc 6.2)
native compiled aarch64 code showed no change (gcc 4.9)
While we could add some code to the makefile to dynamically change the
-fPIC/pic option when building on sparc architectures, unless there are
objections on the mailing list, I propose to change to -fPIC across the
board at this point.
This will be released as a 0.11.1 patch (along with another minor build
fix.)
Regards
Mike
On Wed, 13 Mar 2019 at 08:50, John Paul Adrian Glaubitz <
notifications(a)github.com> wrote:
> I have just tested this on sparc64 and can confirm that replacing -fpic
> with -fPIC fixes the issue for me.
>
> —
> You are receiving this because you are subscribed to this thread.
> Reply to this email directly, view it on GitHub
> <https://github.com/Linaro/OpenCSD/issues/16#issuecomment-472332457>, or mute
> the thread
> <https://github.com/notifications/unsubscribe-auth/AMvwsxbzERGcBbzJECyGHDUxn…>
> .
>
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
Version v0.11.0 release of OpenCSD library.
* ETMv4 support updated to cover all ETM versions up to v4.4 (from
v4.1 previously). This covers the latest v8.4 arch cores.
* Updated memory callback function to pass trace ID to client when
requesting memory data.
Allows client to determine source CPU for the request and return
memory images accordingly.
(required for upcoming work on perf support.).
* Other minor fixes and updates.
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK