A set of changes that came out of the issues reported here [1].
* First 2 patches fix a decode bug in Perf and add support for new
consistency checks in OpenCSD
* The remaining ones make the disassembly script easier to test
and use. This also involves adding a new Python binding to
Perf to get a config value (perf_config_get())
[1]: https://lore.kernel.org/linux-arm-kernel/20240719092619.274730-1-gankulkarn…
Changes since V1:
* Keep the flush function for discontinuities
* Still remove the flush when the buffer fills, but now add
cs_etm__end_block() for the end trace. That way we won't drop
the last branch stack if the instruction sample period wasn't
hit at the very end.
James Clark (7):
perf cs-etm: Don't flush when packet_queue fills up
perf cs-etm: Use new OpenCSD consistency checks
perf scripting python: Add function to get a config value
perf scripts python cs-etm: Update to use argparse
perf scripts python cs-etm: Improve arguments
perf scripts python cs-etm: Add start and stop arguments
perf test: cs-etm: Test Coresight disassembly script
.../perf/Documentation/perf-script-python.txt | 2 +-
.../scripts/python/Perf-Trace-Util/Context.c | 11 ++
.../scripts/python/arm-cs-trace-disasm.py | 109 +++++++++++++-----
.../tests/shell/test_arm_coresight_disasm.sh | 63 ++++++++++
tools/perf/util/config.c | 22 ++++
tools/perf/util/config.h | 1 +
.../perf/util/cs-etm-decoder/cs-etm-decoder.c | 7 +-
tools/perf/util/cs-etm.c | 25 ++--
8 files changed, 205 insertions(+), 35 deletions(-)
create mode 100755 tools/perf/tests/shell/test_arm_coresight_disasm.sh
--
2.34.1
A set of changes that came out of the issues reported here [1].
* First 2 patches fix a decode bug in Perf and add support for new
consistency checks in OpenCSD
* The remaining ones make the disassembly script easier to test
and use. This also involves adding a new Python binding to
Perf to get a config value (perf_config_get())
[1]: https://lore.kernel.org/linux-arm-kernel/20240719092619.274730-1-gankulkarn…
Changes since V2:
* Check validity of start stop arguments
* Make test work if Perf was installed
* Document that start and stop time are monotonic clock values
Changes since V1:
* Keep the flush function for discontinuities
* Still remove the flush when the buffer fills, but now add
cs_etm__end_block() for the end trace. That way we won't drop
the last branch stack if the instruction sample period wasn't
hit at the very end.
James Clark (7):
perf cs-etm: Don't flush when packet_queue fills up
perf cs-etm: Use new OpenCSD consistency checks
perf scripting python: Add function to get a config value
perf scripts python cs-etm: Update to use argparse
perf scripts python cs-etm: Improve arguments
perf scripts python cs-etm: Add start and stop arguments
perf test: cs-etm: Test Coresight disassembly script
.../perf/Documentation/perf-script-python.txt | 2 +-
.../scripts/python/Perf-Trace-Util/Context.c | 11 ++
.../scripts/python/arm-cs-trace-disasm.py | 127 ++++++++++++++----
.../tests/shell/test_arm_coresight_disasm.sh | 65 +++++++++
tools/perf/util/config.c | 22 +++
tools/perf/util/config.h | 1 +
.../perf/util/cs-etm-decoder/cs-etm-decoder.c | 7 +-
tools/perf/util/cs-etm.c | 25 +++-
8 files changed, 225 insertions(+), 35 deletions(-)
create mode 100755 tools/perf/tests/shell/test_arm_coresight_disasm.sh
--
2.34.1
On 13/09/2024 14:35, Leo Yan wrote:
>
>
> On 9/12/24 16:11, James Clark wrote:
>>
>> Run a few samples through the disassembly script and check to see that
>> at least one branch instruction is printed.
>>
>> Signed-off-by: James Clark <james.clark(a)linaro.org>
>> ---
>> .../tests/shell/test_arm_coresight_disasm.sh | 63 +++++++++++++++++++
>> 1 file changed, 63 insertions(+)
>> create mode 100755 tools/perf/tests/shell/test_arm_coresight_disasm.sh
>>
>> diff --git a/tools/perf/tests/shell/test_arm_coresight_disasm.sh
>> b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
>> new file mode 100755
>> index 000000000000..6d004bf29f80
>> --- /dev/null
>> +++ b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
>> @@ -0,0 +1,63 @@
>> +#!/bin/sh
>> +# Check Arm CoreSight disassembly script completes without errors
>> +# SPDX-License-Identifier: GPL-2.0
>> +
>> +# The disassembly script reconstructs ranges of instructions and
>> gives these to objdump to
>> +# decode. objdump doesn't like ranges that go backwards, but these
>> are a good indication
>> +# that decoding has gone wrong either in OpenCSD, Perf or in the
>> range reconstruction in
>> +# the script. Test all 3 parts are working correctly by running the
>> script.
>> +
>> +skip_if_no_cs_etm_event() {
>> + perf list | grep -q 'cs_etm//' && return 0
>> +
>> + # cs_etm event doesn't exist
>> + return 2
>> +}
>> +
>> +skip_if_no_cs_etm_event || exit 2
>> +
>> +# Assume an error unless we reach the very end
>> +set -e
>> +glb_err=1
>> +
>> +perfdata_dir=$(mktemp -d /tmp/__perf_test.perf.data.XXXXX)
>> +perfdata=${perfdata_dir}/perf.data
>> +file=$(mktemp /tmp/temporary_file.XXXXX)
>> +
>> +cleanup_files()
>> +{
>> + set +e
>> + rm -rf ${perfdata_dir}
>> + rm -f ${file}
>> + trap - EXIT TERM INT
>> + exit $glb_err
>> +}
>> +
>> +trap cleanup_files EXIT TERM INT
>> +
>> +# Ranges start and end on branches, so check for some likely branch
>> instructions
>> +sep="\s\|\s"
>> +branch_search="\sbl${sep}b${sep}b.ne${sep}b.eq${sep}cbz\s"
>> +
>> +## Test kernel ##
>> +if [ -e /proc/kcore ]; then
>> + echo "Testing kernel disassembly"
>> + perf record -o ${perfdata} -e cs_etm//k --kcore -- touch $file
>> > /dev/null 2>&1
>> + perf script -i ${perfdata} -s
>> python:tools/perf/scripts/python/arm-cs-trace-disasm.py -- \
>> + -d --stop-sample=30 2> /dev/null > ${file}
>
> This is fine for self test. But for a CI test in a distro, will it fail to
> find script with prefix 'tools/perf/...'?
>
> Thanks,
> Leo
>
Nice catch, it should be this:
# Relative path works whether it's installed or running from repo
script_path=$(dirname "$0")/../../scripts/python/arm-cs-trace\
-disasm.py
On 13/09/2024 12:54, Leo Yan wrote:
> On 9/12/24 16:11, James Clark wrote:>
>>
>> Previously when the incorrect binary was used for decode, Perf would
>> silently continue to generate incorrect samples. With OpenCSD 1.5.4 we
>> can enable consistency checks that do a best effort to detect a mismatch
>> in the image. When one is detected a warning is printed and sample
>> generation stops until the trace resynchronizes with a good part of the
>> image.
>>
>> Reported-by: Ganapatrao Kulkarni <gankulkarni(a)os.amperecomputing.com>
>> Closes:
>> https://lore.kernel.org/all/20240719092619.274730-1-gankulkarni@os.ampereco…
>> Signed-off-by: James Clark <james.clark(a)linaro.org>
>> ---
>> tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
>> b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
>> index b78ef0262135..b85a8837bddc 100644
>> --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
>> +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
>> @@ -685,9 +685,14 @@ cs_etm_decoder__create_etm_decoder(struct
>> cs_etm_decoder_params *d_params,
>> }
>>
>> if (d_params->operation == CS_ETM_OPERATION_DECODE) {
>> + int decode_flags = OCSD_CREATE_FLG_FULL_DECODER;
>> +#ifdef OCSD_OPFLG_N_UNCOND_DIR_BR_CHK
>> + decode_flags |= OCSD_OPFLG_N_UNCOND_DIR_BR_CHK |
>> OCSD_OPFLG_CHK_RANGE_CONTINUE |
>> + ETM4_OPFLG_PKTDEC_AA64_OPCODE_CHK;
>> +#endif
>
> Looks good to me.
>
> Just one question: should the flag ETM4_OPFLG_PKTDEC_AA64_OPCODE_CHK be set
> according to ETM version? E.g. it should be only set for ETMv4 or this is
> fine for ETE as well.
>
> Thanks,
> Leo
>
I asked Mike the same question about ETMv3 and he said none of the flags
overlap and it was ok to always pass them. So I assume the same applies
to ETE as well.
Change since V3:
1. Use ^ete-[0-9]+$ for the pattern of node name -- comments from Rob Herring
Change since V2:
1. Change the name in binding as 'ete'.
Change since V1:
1. Remove the pattern match of ETE node name.
2. Update the tmc-etr node name in DT.
Mao Jinlong (2):
dt-bindings: arm: coresight: Update the pattern of ete node name
arm64: dts: qcom: sm8450: Add coresight nodes
.../arm/arm,embedded-trace-extension.yaml | 6 +-
arch/arm64/boot/dts/qcom/sm8450.dtsi | 726 ++++++++++++++++++
2 files changed, 729 insertions(+), 3 deletions(-)
--
2.46.0
On 11/09/2024 09:14, Leo Yan wrote:
>
>
> On 9/10/2024 9:28 PM, Leo Yan wrote:
>> On 9/5/2024 11:50 AM, James Clark wrote:
>>
>> [...]
>>
>>> cs_etm__flush(), like cs_etm__sample() is an operation that generates a
>>> sample and then swaps the current with the previous packet. Calling
>>> flush after processing the queues results in two swaps which corrupts
>>> the next sample. Therefore it wasn't appropriate to call flush here so
>>> remove it.
>>
>> In the cs_etm__sample(), if the period is not overflow, it is not necessarily
>> to generate instruction samples and copy back stack entries. This is why we
>> want to call cs_etm__flush() to make sure the last packet can be recorded
>> properly for instruction sample with back stacks.
>>
>> We also need to take account into the case for the end of the session - in
>> this case we need to generate samples for the last packet for complete info.
>>
>> I am wandering should we remove the cs_etm__packet_swap() from cs_etm__sample()?
>
> Sorry for typo. I meant to remove the cs_etm__packet_swap() from cs_etm__flush().
>
> Thanks,
> Leo
Turns out there was already cs_etm__end_block() for the end of the
session, but it was only called for the timeless modes. I added it for
timestamped mode too in V2.
I also kept the existing flush() function for discontinuities. I changed
my mind that the differences to cs_etm__sample() weren't relevant.
So I think we still need to keep the swap in flush() because otherwise
the next sample won't start from the right place.
Changes in V8:
- in arm-cs-trace-disasm.py, ensure map_pgoff is not converted to
string.
- Remove map_pgoff integer conversion in dso not found print
message.
Changes in V7:
- In arm-cs-trace-disasm.py, fix print message core dump resulting
from mixed type arithmetic.
- Modify CS_ETM_TRACE_ON filter to filter zero start_addr. The
CS_ETM_TRACE_ON message is changed to print only in verbose mode.
- Removed verbose mode only notification for start_addr/stop_addr
outside of dso address range.
Changes in V6:
- In arm-cs-trace-disasm.py, zero map_pgoff for kernel files. Add
map_pgoff to start/end address for dso not found message.
- Added "Reviewed-by" trailer for patches 1-3 previously reviewed
by Leo Yan in V4 and V5.
Changes in V5:
- In symbol-elf.c, branch to exit_close label if open file.
- In trace_event_python.c, correct indentation. set_sym_in_dict
call parameter "map_pgoff" renamed as "addr_map_pgoff" to
match local naming.
Changes in V4:
- In trace-event-python.c, fixed perf-tools-next merge problem.
Changes in V3:
- Rebased to linux-perf-tools branch.
- Squash symbol-elf.c and symbol.h into same commit.
- In map.c, merge dso__is_pie() call into existing if statement.
- In arm-cs-trace-disasm.py, remove debug artifacts.
Changes in V2:
- In dso__is_pie() (symbol-elf.c), Decrease indentation, add null pointer
checks per Leo Yan review.
- Updated mailing list distribution.
Steve Clevenger (4):
Add dso__is_pie call to identify ELF PIE
Force MAPPING_TYPE__IDENTIY for PIE
Add map pgoff to python dictionary based on MAPPING_TYPE
Adjust objdump start/end range per map pgoff parameter
.../scripts/python/arm-cs-trace-disasm.py | 17 ++++--
tools/perf/util/map.c | 4 +-
.../scripting-engines/trace-event-python.c | 13 +++-
tools/perf/util/symbol-elf.c | 61 +++++++++++++++++++
tools/perf/util/symbol.h | 1 +
5 files changed, 86 insertions(+), 10 deletions(-)
--
2.44.0
On 9/9/2024 1:33 PM, Leo Yan wrote:
> On 9/7/2024 12:20 AM, Steve Clevenger wrote:
>
> [...]
>
>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/
>>>>> perf/scripts/python/arm-cs-trace-disasm.py
>>>>> index 7aff02d84ffb..a867e0db02b8 100755
>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>> @@ -187,6 +187,7 @@ def process_event(param_dict):
>>>>> dso_start = get_optional(param_dict, "dso_map_start")
>>>>> dso_end = get_optional(param_dict, "dso_map_end")
>>>>> symbol = get_optional(param_dict, "symbol")
>>>>> + map_pgoff = get_optional(param_dict, "map_pgoff")
>>>>
>>>>
>>>> I am concerned the two sentences below are inconsistence: one uses
>>>> 'start_addr + map_pgoff' and the other uses 'start_addr + int(map_pgoff)'.
>>>>
>>>
>>> Valid point. It's working fine as is, but how is it even working? I look
>>> at print_disam/read_disasm, and no type conversion is done in either
>>> call. The dso_start parameter for these calls is actually dso_vm_start
>>> which is the dso_start integer conversion.
>
> I agreed with you. After reading code, I have same conclusion that we don't
> need to type conversion to int type.
>
>> Python thinks the map_pgoff object is already an integer. For example,
>> map_pgoff = get_optional(param_dict, "map_pgoff")
>> print("%d" % map_pgoff.isdigit())
>>
>> AttributeError: 'int' object has no attribute 'isdigit'
>> Fatal Python error: handler_call die: problem in Python trace event handler
>>
>> Converting map_pgoff to a string works in the print statement.
>>
>> print("%d" % str(map_pgoff).isdigit())
>> 1
>>
>> I'm not sure, but it's possible get_optional() during some call had
>> converted it to '[unknown]' which would cause a problem.
>>
>> I can explicitly force to integer.
>
> For backward compatibility, we can use below code:
>
> map_pgoff = get_optional(param_dict, "map_pgoff")
> if (map_pgoff == '[unknown]'):
> map_pgoff = 0
>
> Then in the later flow, we should can always use "map_pgoff" as an int type.
>
> I struggled for James reported issue.
>
> The variables “map_pgoff,” “dso_start,” and “dso_end” are set together in the
> Python engine. All of them should be of type int if the DSO is found, or all
> should be ‘[unknown]’ if the DSO is missing. We have checked the types for
> “dso_start” and “dso_end”, and if either is ‘[unknown]’ the flow will directly
> bail out. Thus, in theory, “map_pgoff” will not cause trouble if it is an
> ‘[unknown]’ string.
>
> One possibility is that James has applied your patches but has not built perf.
> So, the field “map_pgoff” is not passed from the Python engine, which will
> cause the reported error. I think the proposed above change can effectively
> avoid the error.
>
> Thanks,
> Leo
I added that exact'[unknown]' check in last week. There's no need to
explicitly convert to an integer although Python doesn't complain when
about integer to integer conversions. It's probably just a no op. I'll
commit the change later today.
Steve
On 9/6/2024 4:27 AM, Leo Yan wrote:
>
>
> On 9/5/24 23:28, Steve Clevenger wrote:
>>
>> Extract map_pgoff parameter from the dictionary, and adjust start/end
>> range passed to objdump based on the value.
>>
>> A zero start_addr is filtered to prevent output of dso address range
>> check failures. This script repeatedly sees a zero value passed
>> in for
>> start_addr = cpu_data[str(cpu) + 'addr']
>>
>> These zero values are not a new problem. The start_addr/stop_addr warning
>> clutters the instruction trace output, hence this change.
>>
>> Signed-off-by: Steve Clevenger <scclevenger(a)os.amperecomputing.com>
>> ---
>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 14 +++++++++-----
>> 1 file changed, 9 insertions(+), 5 deletions(-)
>>
>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/
>> perf/scripts/python/arm-cs-trace-disasm.py
>> index 7aff02d84ffb..a867e0db02b8 100755
>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>> @@ -187,6 +187,7 @@ def process_event(param_dict):
>> dso_start = get_optional(param_dict, "dso_map_start")
>> dso_end = get_optional(param_dict, "dso_map_end")
>> symbol = get_optional(param_dict, "symbol")
>> + map_pgoff = get_optional(param_dict, "map_pgoff")
>
>
> I am concerned the two sentences below are inconsistence: one uses
> 'start_addr + map_pgoff' and the other uses 'start_addr + int(map_pgoff)'.
>
Valid point. It's working fine as is, but how is it even working? I look
at print_disam/read_disasm, and no type conversion is done in either
call. The dso_start parameter for these calls is actually dso_vm_start
which is the dso_start integer conversion.
> Here about below code?
>
> map_pgoff_str = get_optional(param_dict, "map_pgoff")
> if map_pgoff_str == "":
> map_pgoff = 0
> else:
> map_pgoff = int(map_pgoff_str)
>
> With above change, 'map_pgoff' is an int type. As a result, the changes
> below can simply add 'map_pgoff'.
I propose:
map_pgoff_str = get_optional(param_dict, "map_pgoff"
if (map_pgoff_str.isdigit()):
map_pgoff = int(map_pgoff)
else:
map_pgoff = 0
The int() conversions in the print() statement can then be removed.
Steve
> With these changes, LGTM:
>
> Reviewed-by: Leo Yan <leo.yan(a)arm.com>
>
>
>> cpu = sample["cpu"]
>> ip = sample["ip"]
>> @@ -243,9 +244,11 @@ def process_event(param_dict):
>> # Record for previous sample packet
>> cpu_data[str(cpu) + 'addr'] = addr
>>
>> - # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
>> - if (start_addr == 0 and stop_addr == 4):
>> - print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
>> + # Filter out zero start_address. Optionally identify
>> CS_ETM_TRACE_ON packet
>> + # if start_addr=0 and stop_addr=4.
>> + if (start_addr == 0):
>> + if ((stop_addr == 4) and (options.verbose == True)):
>> + print("CPU%d: CS_ETM_TRACE_ON packet is
>> inserted" % cpu)
>> return
>>
>> if (start_addr < int(dso_start) or start_addr > int(dso_end)):
>> @@ -262,13 +265,14 @@ def process_event(param_dict):
>> # vm_start to zero.
>> if (dso == "[kernel.kallsyms]" or dso_start ==
>> 0x400000):
>> dso_vm_start = 0
>> + map_pgoff = 0
>> else:
>> dso_vm_start = int(dso_start)
>>
>> dso_fname = get_dso_file_path(dso, dso_bid)
>> if path.exists(dso_fname):
>> - print_disam(dso_fname, dso_vm_start,
>> start_addr, stop_addr)
>> + print_disam(dso_fname, dso_vm_start,
>> start_addr + map_pgoff, stop_addr + map_pgoff)
>> else:
>> - print("Failed to find dso %s for address range
>> [ 0x%x .. 0x%x ]" % (dso, start_addr, stop_addr))
>> + print("Failed to find dso %s for address range
>> [ 0x%x .. 0x%x ]" % (dso, start_addr + int(map_pgoff), stop_addr +
>> int(map_pgoff)))
>
>>
>> print_srccode(comm, param_dict, sample, symbol, dso)
>> --
>> 2.44.0
>>