perf script for trace disassembly


Al Grant

28 May 2020

1:40 p.m.

Hi,

two years ago Leo posted a patch set including a "perf script" script for trace disassembly (arm-cs-trace-disasm.py):

https://lists.linaro.org/pipermail/coresight/2018-May/001325.html

I can't find this upstream - did it change name, or was there a reason it didn't make it?

This kind of thing is becoming more interesting as we are looking to give people better ways of processing large volumes of trace without using "perf inject" as an intermediate step.

Al

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.


Leo Yan

28 May

4 p.m.

Hi Al,

On Thu, May 28, 2020 at 01:40:58PM +0000, Al Grant wrote:

> Hi,
>
> two years ago Leo posted a patch set including a "perf script" script for trace disassembly (arm-cs-trace-disasm.py):
>
> https://lists.linaro.org/pipermail/coresight/2018-May/001325.html
>
> I can't find this upstream - did it change name, or was there a reason it didn't make it?

No, the Python script has not been merged into the mainline kernel. I searched the mailing list: last year there was some discussion, and there were concerns about whether this patch works with CPU-wide trace mode:

https://lists.linaro.org/pipermail/coresight/2019-May/002592.html

> This kind of thing is becoming more interesting as we are looking to give people better ways of processing large volumes of trace without using "perf inject" as an intermediate step.

Agreed, 'perf script' is an alternative to 'perf inject' for post-analysis.

Thanks, Leo

Al Grant

29 May

11 a.m.


Ok thanks. I made a few changes to the script (see below) - perf gives us the dso name in the event, so we don't have to look up by address on every event, but we still need the mmap records to give us the base address for disassembly. There are still some issues with this - it would be much better if we could get perf to pass in the mmap events directly, instead of having to run a perf command from within the script (which gets it wrong if the "-i" option was used, or if the perf binary wasn't on the path).

# arm-cs-trace-disasm.py: ARM CoreSight Trace Dump With Disassembler
# SPDX-License-Identifier: GPL-2.0
#
# Tor Jeremiassen <tor at ti.com> is original author who wrote script
# skeleton, Mathieu Poirier <mathieu.poirier at linaro.org> contributed
# fixes for build-id and memory map; Leo Yan <leo.yan at linaro.org>
# updated the packet parsing with new samples format.

from __future__ import print_function
import os
import sys
import re
from subprocess import check_output
from optparse import OptionParser, make_option

# Command line parsing

option_list = [
    # formatting options for the bottom entry of the stack
    make_option("-k", "--vmlinux", dest="vmlinux_name",
                help="Set path to vmlinux file"),
    make_option("-d", "--objdump", dest="objdump_name",
                help="Set path to objdump executable file"),
    make_option("-v", "--verbose", dest="verbose",
                action="store_true", default=False,
                help="Enable debugging log")
]

parser = OptionParser(option_list=option_list)
(options, args) = parser.parse_args()

if options.objdump_name is None:
    sys.exit("No objdump executable file specified - use -d or --objdump option")

# Initialize global dicts and regular expressions

build_ids = dict()
mmaps = dict()
disasm_cache = dict()
cpu_data = dict()
disasm_re = re.compile(r"^\s*([0-9a-fA-F]+):")
disasm_func_re = re.compile(r"^\s*([0-9a-fA-F]+)\s<.*>:")
cache_size = 32*1024

def parse_buildid():
    global build_ids

    buildid_regex = r"([a-fA-F0-9]+)[ \t]([^ \n]+)"
    buildid_re = re.compile(buildid_regex)

    # universal_newlines=True returns text rather than bytes on Python 3
    results = check_output(["perf", "buildid-list"],
                           universal_newlines=True).split('\n')
    for line in results:
        m = buildid_re.search(line)
        if m is None:
            continue

        id_name = m.group(2)
        id_num = m.group(1)

        if id_name == "[kernel.kallsyms]":
            append = "/kallsyms"
        elif id_name == "[vdso]":
            append = "/vdso"
        else:
            append = "/elf"

        build_ids[id_name] = os.environ['PERF_BUILDID_DIR'] + \
                             "/" + id_name + "/" + id_num + append
        # Replace a duplicate slash with a single slash
        build_ids[id_name] = build_ids[id_name].replace('//', '/', 1)

    if options.vmlinux_name is None and "[kernel.kallsyms]" in build_ids:
        print("kallsyms cannot be used to dump assembler")

    # Set vmlinux path to replace kallsyms file; without a build-id we
    # can still use vmlinux to parse kernel symbols
    if options.vmlinux_name is not None:
        build_ids['[kernel.kallsyms]'] = options.vmlinux_name

class DSO:
    """ Details about an executable or dynamic shared object """
    def __init__(self, name, start, end):
        self.name = name
        self.start = start
        self.end = end
        if name in build_ids:
            self.image = build_ids[name]
        else:
            self.image = None

    def contains(self, addr):
        return self.start <= addr and addr < self.end

    def __str__(self):
        return "%x..%x %s" % (self.start, self.end, self.name)

def parse_mmap():
    global mmaps

    # Check mmap for PERF_RECORD_MMAP and PERF_RECORD_MMAP2
    mmap_regex = r"PERF_RECORD_MMAP.* (-?[0-9]+)/[0-9]+: \[(0x[0-9a-fA-F]+)\((0x[0-9a-fA-F]+)\).*:\s.*\s(\S*)"
    mmap_re = re.compile(mmap_regex)

    results = check_output("perf script --show-mmap-events | fgrep PERF_RECORD_MMAP",
                           shell=True, universal_newlines=True).split('\n')
    for line in results:
        m = mmap_re.search(line)
        if m is not None:
            pid = int(m.group(1))
            if m.group(4) == '[kernel.kallsyms]_text':
                dso = '[kernel.kallsyms]'
            else:
                dso = m.group(4)
            start = int(m.group(2), 0)
            end = int(m.group(2), 0) + int(m.group(3), 0)
            if pid not in mmaps:
                mmaps[pid] = {}
            mmaps[pid][dso] = DSO(dso, start, end)

def find_dso_by_addr(pid, addr):
    global mmaps
    # Use an empty map for unknown pids instead of raising KeyError
    for name, dso in mmaps.get(pid, {}).items():
        if dso.contains(addr):
            return dso
    if pid != -1:
        return find_dso_by_addr(-1, addr)
    return None

def find_dso_by_name(pid, name):
    global mmaps
    if pid in mmaps and name in mmaps[pid]:
        return mmaps[pid][name]
    if pid != -1:
        return find_dso_by_name(-1, name)
    return None

def read_disam(pid, dso, start_addr, stop_addr):
    global mmaps
    global build_ids

    if dso.name == "[kernel.kallsyms]":
        return "<in kernel>\n"

    addr_range = str(start_addr) + ":" + str(stop_addr) + ":" + dso.name

    # Don't let the cache get too big, clear it when it hits max size
    if len(disasm_cache) > cache_size:
        disasm_cache.clear()

    try:
        disasm_output = disasm_cache[addr_range]
    except KeyError:
        if dso.image is not None:
            fname = dso.image
        else:
            sys.exit("cannot find symbol file for " + dso.name)

        disasm = [ options.objdump_name, "-d", "-z",
                   "--start-address="+format(start_addr,"#x"),
                   "--stop-address="+format(stop_addr,"#x") ]
        if dso.start != 0:
            disasm += [ "--adjust-vma="+format(dso.start,"#x") ]
        disasm += [ fname ]
        disasm_output = check_output(disasm, universal_newlines=True).split('\n')
        disasm_cache[addr_range] = disasm_output

    return disasm_output

def dump_disam(pid, dso, start_addr, stop_addr):
    for line in read_disam(pid, dso, start_addr, stop_addr):
        m = disasm_func_re.search(line)
        if m is None:
            m = disasm_re.search(line)
            if m is None:
                continue
        print("\t" + line)

def dump_packet(sample):
    print("Packet = { cpu: 0x%d addr: 0x%x phys_addr: 0x%x ip: 0x%x "
          "pid: %d tid: %d period: %d time: %d }" %
          (sample['cpu'], sample['addr'], sample['phys_addr'],
           sample['ip'], sample['pid'], sample['tid'],
           sample['period'], sample['time']))

def trace_begin():
    print('ARM CoreSight Trace Data Assembler Dump')
    parse_buildid()
    parse_mmap()

def trace_end():
    print('End')

def trace_unhandled(event_name, context, event_fields_dict):
    print(' '.join(['%s=%s'%(k,str(v))for k,v in sorted(event_fields_dict.items())]))

def process_event(param_dict):
    global cache_size
    global options

    # Fetch the sample first so it is defined before any dump_packet() call
    sample = param_dict["sample"]
    if param_dict["ev_name"] != "branches":
        dump_packet(sample)
    #print(param_dict)
    #sys.exit()
    if options.verbose:
        dump_packet(sample)

    cpu = sample["cpu"]
    pid = sample["pid"]
    this_dso = None
    if "dso" in param_dict:
        dso_name = param_dict["dso"]
        this_dso = find_dso_by_name(pid, dso_name)
    else:
        dso_name = None

    # If the period is not 1, this is a synthetic instruction sample
    # packet, which we need to drop.
    if sample['period'] != 1:
        print("Skip synthetic instruction sample")
        return

    ip = sample["ip"]
    addr = sample["addr"]

    # Initialize CPU data if it's empty, and return directly
    # if this is the first tracing event for this CPU.
    if cpu_data.get(str(cpu) + 'addr') is None:
        cpu_data[str(cpu) + 'addr'] = addr
        return

    # The format for packet is:
    #
    #                 +------------+------------+------------+
    #  sample_prev:   |    addr    |    ip      |    cpu     |
    #                 +------------+------------+------------+
    #  sample_next:   |    addr    |    ip      |    cpu     |
    #                 +------------+------------+------------+
    #
    # We need to combine the two consecutive packets to get the instruction
    # range for sample_prev::cpu:
    #
    #     [ sample_prev::addr .. sample_next::ip ]
    #
    # For this purpose, sample_prev::addr is stored in the cpu_data structure
    # and read back as 'start_addr' when the new packet comes, and we
    # use sample_next::ip to calculate 'stop_addr'. The extra 4 added to
    # 'stop_addr' is for the sake of objdump, so the final assembler dump
    # includes the last instruction at sample_next::ip.

    start_addr = cpu_data[str(cpu) + 'addr']
    stop_addr = ip + 4

    # Record for previous sample packet
    cpu_data[str(cpu) + 'addr'] = addr

    # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
    if start_addr == 0 and stop_addr == 4:
        print("CPU%s: CS_ETM_TRACE_ON packet is inserted" % cpu)
        return

    # If we cannot find the dso, we cannot dump assembler, so bail out
    if this_dso is None:
        print("Address range [ 0x%x .. 0x%x ]: failed to find dso" % (start_addr, stop_addr))
        return
    if not this_dso.contains(start_addr):
        print("Address range [ 0x%x .. 0x%x ]: isn't in same dso" % (start_addr, stop_addr))
        return

    dump_disam(pid, this_dso, start_addr, stop_addr)
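The range-pairing step described in the script's comment can be tried without perf or a trace file. What follows is only a minimal sketch under stated assumptions: the function name `pair_ranges`, the `(cpu, ip, addr)` tuple layout, and the sample values are hypothetical and are not part of the script. It also assumes a fixed 4-byte instruction size, as the script does, and leaves out the period and CS_ETM_TRACE_ON checks.

```python
def pair_ranges(samples, insn_size=4):
    """Yield (cpu, start_addr, stop_addr) for consecutive branch samples.

    Each sample is a (cpu, ip, addr) tuple: 'ip' is the address of the
    branch instruction and 'addr' is the branch target, matching the
    sample_prev/sample_next layout in the script's comment.
    """
    prev_addr = {}  # per-CPU branch target of the previous sample
    for cpu, ip, addr in samples:
        start = prev_addr.get(cpu)
        prev_addr[cpu] = addr
        if start is None:
            # First sample seen on this CPU: nothing to pair with yet
            continue
        # Extend by one instruction so the branch at 'ip' is included
        yield cpu, start, ip + insn_size


# Synthetic samples for illustration only (not real trace data)
samples = [
    (0, 0x1000, 0x2000),
    (1, 0x5000, 0x6000),
    (0, 0x2010, 0x3000),
    (0, 0x3008, 0x1000),
]
ranges = list(pair_ranges(samples))
for cpu, start, stop in ranges:
    print("CPU%d: [ 0x%x .. 0x%x ]" % (cpu, start, stop))
```

Keeping the previous target per CPU is what lets samples from different CPUs interleave without mixing up their ranges; in the sketch, the single sample on CPU 1 produces no range because it has no predecessor.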

Leo Yan

2 Jun

9:49 a.m.

On Fri, May 29, 2020 at 11:00:23AM +0000, Al Grant wrote:

> Ok thanks. I made a few changes to the script (see below) - perf gives us the dso name in the event, so we don't have to look up by address on every event, but we still need the mmap records to give us the base address for disassembly. There are still some issues with this - it would be much better if we could get perf to pass in the mmap events directly, instead of having to run a perf command from within the script (which gets it wrong if the "-i" option was used, or if the perf binary wasn't on the path).

Thanks a lot for sharing the script. I will find time to look into your suggestion of passing mmap events from the perf tool to the script.

Thanks, Leo

P.S. Please expect me to move slowly over the next month; I have added this to my to-do list and will look at it when I find spare time.
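Until perf can hand mmap events to the script directly, one possible stopgap for the two problems Al mentions (the embedded perf command ignoring "-i", and perf not being on the path) is to let the caller name the perf binary and the input file. This is a minimal sketch, not part of the posted script: `mmap_events_cmd`, `filter_mmap_lines`, `perf_path` and `input_file` are hypothetical names, while `-i` and `--show-mmap-events` are existing `perf script` options.

```python
def mmap_events_cmd(perf_path="perf", input_file=None):
    """Build the argument list for listing mmap events.

    perf_path and input_file are hypothetical parameters; the script in
    this thread hard-codes "perf" and the default input file.
    """
    cmd = [perf_path, "script", "--show-mmap-events"]
    if input_file is not None:
        cmd += ["-i", input_file]
    return cmd


def filter_mmap_lines(output):
    """Keep only PERF_RECORD_MMAP/MMAP2 lines, replacing the fgrep pipe."""
    return [line for line in output.split("\n") if "PERF_RECORD_MMAP" in line]


print(mmap_events_cmd())
print(mmap_events_cmd("/usr/local/bin/perf", "trace.data"))
```

The returned list could be passed to `subprocess.check_output` without `shell=True`, with the output filtered in Python rather than through a shell pipeline.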

