Hi,
two years ago Leo posted a patch set including a "perf script" script for trace disassembly (arm-cs-trace-disasm.py):
https://lists.linaro.org/pipermail/coresight/2018-May/001325.html
I can't find this upstream - did it change name, or was there a reason it didn't make it?
This kind of thing is becoming more interesting as we are looking to give people better ways of processing large volumes of trace without using "perf inject" as an intermediate step.
Al IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Hi Al,
On Thu, May 28, 2020 at 01:40:58PM +0000, Al Grant wrote:
Hi,
two years ago Leo posted a patch set including a "perf script" script for trace disassembly (arm-cs-trace-disasm.py):
https://lists.linaro.org/pipermail/coresight/2018-May/001325.html
I can't find this upstream - did it change name, or was there a reason it didn't make it?
No, the python script has not been merged into mainline kernel. Searched mailing list, at last year there have some discussion and had concern for this patch to work with CPU wide trace mode:
https://lists.linaro.org/pipermail/coresight/2019-May/002592.html
This kind of thing is becoming more interesting as we are looking to give people better ways of processing large volumes of trace without using "perf inject" as an intermediate step.
Agree, 'perf script' is alternative way rather than 'perf inject' for post analysis.
Thanks, Leo
I can't find this upstream - did it change name, or was there a reason it didn't make it?
No, the python script has not been merged into mainline kernel. Searched mailing list, at last year there have some discussion and had concern for this patch to work with CPU wide trace mode:
https://lists.linaro.org/pipermail/coresight/2019-May/002592.html
This kind of thing is becoming more interesting as we are looking to give people better ways of processing large volumes of trace without using "perf inject" as an intermediate step.
Agree, 'perf script' is alternative way rather than 'perf inject' for post analysis.
Ok thanks. I made a few changes to the script (see below) - perf gives us the dso name in the event, so we don't have to look up by address on every event, but we still need the mmap records to give us the base address for disassembly. There are still some issues with this - it would be much better if we could get perf to pass in the mmap events directly, instead of having to run a perf command from within the script (which gets it wrong if the "-i" option was used, or if the perf binary wasn't on the path).
Al
# arm-cs-trace-disasm.py: ARM CoreSight Trace Dump With Disassember # SPDX-License-Identifier: GPL-2.0 # # Tor Jeremiassen <tor at ti.com> is original author who wrote script # skeleton, Mathieu Poirier <mathieu.poirier at linaro.org> contributed # fixes for build-id and memory map; Leo Yan <leo.yan at linaro.org> # updated the packet parsing with new samples format.
from __future__ import print_function import os import sys import re from subprocess import * from optparse import OptionParser, make_option
# Command line parsing
option_list = [ # formatting options for the bottom entry of the stack make_option("-k", "--vmlinux", dest="vmlinux_name", help="Set path to vmlinux file"), make_option("-d", "--objdump", dest="objdump_name", help="Set path to objdump executable file"), make_option("-v", "--verbose", dest="verbose", action="store_true", default=False, help="Enable debugging log") ]
parser = OptionParser(option_list=option_list) (options, args) = parser.parse_args()
if (options.objdump_name == None): sys.exit("No objdump executable file specified - use -d or --objdump option")
# Initialize global dicts and regular expression
build_ids = dict() mmaps = dict() disasm_cache = dict() cpu_data = dict() disasm_re = re.compile("^\s*([0-9a-fA-F]+):") disasm_func_re = re.compile("^\s*([0-9a-fA-F]+)\s<.*>:") cache_size = 32*1024
def parse_buildid(): global build_ids
buildid_regex = "([a-fA-f0-9]+)[ \t]([^ \n]+)" buildid_re = re.compile(buildid_regex)
results = check_output(["perf", "buildid-list"]).split('\n'); for line in results: m = buildid_re.search(line) if (m == None): continue;
id_name = m.group(2) id_num = m.group(1)
if (id_name == "[kernel.kallsyms]") : append = "/kallsyms" elif (id_name == "[vdso]") : append = "/vdso" else: append = "/elf"
build_ids[id_name] = os.environ['PERF_BUILDID_DIR'] + \ "/" + id_name + "/" + id_num + append; # Replace duplicate slash chars to single slash char build_ids[id_name] = build_ids[id_name].replace('//', '/', 1)
if ((options.vmlinux_name == None) and ("[kernel.kallsyms]" in build_ids)): print("kallsyms cannot be used to dump assembler")
# Set vmlinux path to replace kallsyms file, if without buildid we still # can use vmlinux to prase kernel symbols if ((options.vmlinux_name != None)): build_ids['[kernel.kallsyms]'] = options.vmlinux_name;
class DSO: """ Details about an executable or dynamic shared object """ def __init__(self, name, start, end): self.name = name self.start = start self.end = end if name in build_ids: self.image = build_ids[name] else: self.image = None
def contains(self, addr): return self.start <= addr and addr < self.end
def __str__(self): return "%x..%x %s" % (self.start, self.end, self.name)
def parse_mmap(): global mmaps
# Check mmap for PERF_RECORD_MMAP and PERF_RECORD_MMAP2 mmap_regex = "PERF_RECORD_MMAP.* (-?[0-9]+)/[0-9]+: [(0x[0-9a-fA-F]+)((0x[0-9a-fA-F]+)).*:\s.*\s(\S*)" mmap_re = re.compile(mmap_regex)
results = check_output("perf script --show-mmap-events | fgrep PERF_RECORD_MMAP", shell=True).split('\n') for line in results: m = mmap_re.search(line) if (m != None): pid = int(m.group(1)) if (m.group(4) == '[kernel.kallsyms]_text'): dso = '[kernel.kallsyms]' else: dso = m.group(4) start = int(m.group(2),0) end = int(m.group(2),0) + int(m.group(3),0) if pid not in mmaps: mmaps[pid] = {} mmaps[pid][dso] = DSO(dso, start, end)
def find_dso_by_addr(pid, addr): global mmaps for name, dso in mmaps[pid].items(): if dso.contains(addr): return dso if pid != -1: return find_dso_by_addr(-1, addr) return None
def find_dso_by_name(pid, name): global mmaps if pid in mmaps and name in mmaps[pid]: return mmaps[pid][name] if pid != -1: return find_dso_by_name(-1, name)
def read_disam(pid, dso, start_addr, stop_addr): global mmaps global build_ids
if dso.name == "[kernel.kallsyms]": return "<in kernel>\n"
addr_range = str(start_addr) + ":" + str(stop_addr) + ":" + dso.name
# Don't let the cache get too big, clear it when it hits max size if (len(disasm_cache) > cache_size): disasm_cache.clear();
try: disasm_output = disasm_cache[addr_range]; except: if dso.image is not None: fname = dso.image else: sys.exit("cannot find symbol file for " + dso.name)
disasm = [ options.objdump_name, "-d", "-z", "--start-address="+format(start_addr,"#x"), "--stop-address="+format(stop_addr,"#x") ] if dso.start != 0: disasm += [ "--adjust-vma="+format(dso.start,"#x") ] disasm += [ fname ] disasm_output = check_output(disasm).split('\n') disasm_cache[addr_range] = disasm_output
return disasm_output
def dump_disam(pid, dso, start_addr, stop_addr): for line in read_disam(pid, dso, start_addr, stop_addr): m = disasm_func_re.search(line) if m is None: m = disasm_re.search(line) if m is None: continue print("\t" + line)
def dump_packet(sample): print("Packet = { cpu: 0x%d addr: 0x%x phys_addr: 0x%x ip: 0x%x " \ "pid: %d tid: %d period: %d time: %d }" % \ (sample['cpu'], sample['addr'], sample['phys_addr'], \ sample['ip'], sample['pid'], sample['tid'], \ sample['period'], sample['time']))
def trace_begin(): print('ARM CoreSight Trace Data Assembler Dump') parse_buildid() parse_mmap()
def trace_end(): print('End')
def trace_unhandled(event_name, context, event_fields_dict): print(' '.join(['%s=%s'%(k,str(v))for k,v in sorted(event_fields_dict.items())]))
def process_event(param_dict): global cache_size global options
if param_dict["ev_name"] != "branches": dump_packet(sample) #print(param_dict) #sys.exit() sample = param_dict["sample"] if (options.verbose == True): dump_packet(sample)
cpu = sample["cpu"] pid = sample["pid"] this_dso = None if "dso" in param_dict: dso_name = param_dict["dso"] this_dso = find_dso_by_name(pid, dso_name) else: dso_name = None
# If period doesn't equal to 1, this packet is for instruction sample # packet, we need drop this synthetic packet. if (sample['period'] != 1): print("Skip synthetic instruction sample") return
ip = sample["ip"] addr = sample["addr"]
# Initialize CPU data if it's empty, and directly return back # if this is the first tracing event for this CPU. if (cpu_data.get(str(cpu) + 'addr') == None): cpu_data[str(cpu) + 'addr'] = addr return
# The format for packet is: # # +------------+------------+------------+ # sample_prev: | addr | ip | cpu | # +------------+------------+------------+ # sample_next: | addr | ip | cpu | # +------------+------------+------------+ # # We need to combine the two continuous packets to get the instruction # range for sample_prev::cpu: # # [ sample_prev::addr .. sample_next::ip ] # # For this purose, sample_prev::addr is stored into cpu_data structure # and read back for 'start_addr' when the new packet comes, and we need # to use sample_next::ip to calculate 'stop_addr', plusing extra 4 for # 'stop_addr' is for the sake of objdump so the final assembler dump can # include last instruction for sample_next::ip.
start_addr = cpu_data[str(cpu) + 'addr'] stop_addr = ip + 4
# Record for previous sample packet cpu_data[str(cpu) + 'addr'] = addr
# Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4 if (start_addr == 0 and stop_addr == 4): print("CPU%s: CS_ETM_TRACE_ON packet is inserted" % cpu) return
# If cannot find dso so cannot dump assembler, bail out if this_dso is None: print("Address range [ 0x%x .. 0x%x ]: failed to find dso" % (start_addr, stop_addr)) return if not this_dso.contains(start_addr): print("Address range [ 0x%x .. 0x%x ]: isn't in same dso" % (start_addr, stop_addr)) return
dump_disam(pid, this_dso, start_addr, stop_addr) IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
On Fri, May 29, 2020 at 11:00:23AM +0000, Al Grant wrote:
I can't find this upstream - did it change name, or was there a reason it didn't make it?
No, the python script has not been merged into mainline kernel. Searched mailing list, at last year there have some discussion and had concern for this patch to work with CPU wide trace mode:
https://lists.linaro.org/pipermail/coresight/2019-May/002592.html
This kind of thing is becoming more interesting as we are looking to give people better ways of processing large volumes of trace without using "perf inject" as an intermediate step.
Agree, 'perf script' is alternative way rather than 'perf inject' for post analysis.
Ok thanks. I made a few changes to the script (see below) - perf gives us the dso name in the event, so we don't have to look up by address on every event, but we still need the mmap records to give us the base address for disassembly. There are still some issues with this - it would be much better if we could get perf to pass in the mmap events directly, instead of having to run a perf command from within the script (which gets it wrong if the "-i" option was used, or if the perf binary wasn't on the path).
Thanks a lot for sharing back the script. Will find time to look into the requirement you suggested to pass mmap events from perf tool to script.
Thanks, Leo
P.s. Please expect I will move slowly in next month, so have added this in todo list and find spare time for this.
# arm-cs-trace-disasm.py: ARM CoreSight Trace Dump With Disassember # SPDX-License-Identifier: GPL-2.0 # # Tor Jeremiassen <tor at ti.com> is original author who wrote script # skeleton, Mathieu Poirier <mathieu.poirier at linaro.org> contributed # fixes for build-id and memory map; Leo Yan <leo.yan at linaro.org> # updated the packet parsing with new samples format.
from __future__ import print_function import os import sys import re from subprocess import * from optparse import OptionParser, make_option
# Command line parsing
option_list = [ # formatting options for the bottom entry of the stack make_option("-k", "--vmlinux", dest="vmlinux_name", help="Set path to vmlinux file"), make_option("-d", "--objdump", dest="objdump_name", help="Set path to objdump executable file"), make_option("-v", "--verbose", dest="verbose", action="store_true", default=False, help="Enable debugging log") ]
parser = OptionParser(option_list=option_list) (options, args) = parser.parse_args()
if (options.objdump_name == None): sys.exit("No objdump executable file specified - use -d or --objdump option")
# Initialize global dicts and regular expression
build_ids = dict() mmaps = dict() disasm_cache = dict() cpu_data = dict() disasm_re = re.compile("^\s*([0-9a-fA-F]+):") disasm_func_re = re.compile("^\s*([0-9a-fA-F]+)\s<.*>:") cache_size = 32*1024
def parse_buildid(): global build_ids
buildid_regex = "([a-fA-f0-9]+)[ \t]([^ \n]+)" buildid_re = re.compile(buildid_regex) results = check_output(["perf", "buildid-list"]).split('\n'); for line in results: m = buildid_re.search(line) if (m == None): continue; id_name = m.group(2) id_num = m.group(1) if (id_name == "[kernel.kallsyms]") : append = "/kallsyms" elif (id_name == "[vdso]") : append = "/vdso" else: append = "/elf" build_ids[id_name] = os.environ['PERF_BUILDID_DIR'] + \ "/" + id_name + "/" + id_num + append; # Replace duplicate slash chars to single slash char build_ids[id_name] = build_ids[id_name].replace('//', '/', 1) if ((options.vmlinux_name == None) and ("[kernel.kallsyms]" in build_ids)): print("kallsyms cannot be used to dump assembler") # Set vmlinux path to replace kallsyms file, if without buildid we still # can use vmlinux to prase kernel symbols if ((options.vmlinux_name != None)): build_ids['[kernel.kallsyms]'] = options.vmlinux_name;
class DSO: """ Details about an executable or dynamic shared object """ def __init__(self, name, start, end): self.name = name self.start = start self.end = end if name in build_ids: self.image = build_ids[name] else: self.image = None
def contains(self, addr): return self.start <= addr and addr < self.end def __str__(self): return "%x..%x %s" % (self.start, self.end, self.name)
def parse_mmap(): global mmaps
# Check mmap for PERF_RECORD_MMAP and PERF_RECORD_MMAP2 mmap_regex = "PERF_RECORD_MMAP.* (-?[0-9]+)/[0-9]+: \[(0x[0-9a-fA-F]+)\((0x[0-9a-fA-F]+)\).*:\s.*\s(\S*)" mmap_re = re.compile(mmap_regex) results = check_output("perf script --show-mmap-events | fgrep PERF_RECORD_MMAP", shell=True).split('\n') for line in results: m = mmap_re.search(line) if (m != None): pid = int(m.group(1)) if (m.group(4) == '[kernel.kallsyms]_text'): dso = '[kernel.kallsyms]' else: dso = m.group(4) start = int(m.group(2),0) end = int(m.group(2),0) + int(m.group(3),0) if pid not in mmaps: mmaps[pid] = {} mmaps[pid][dso] = DSO(dso, start, end)
def find_dso_by_addr(pid, addr): global mmaps for name, dso in mmaps[pid].items(): if dso.contains(addr): return dso if pid != -1: return find_dso_by_addr(-1, addr) return None
def find_dso_by_name(pid, name): global mmaps if pid in mmaps and name in mmaps[pid]: return mmaps[pid][name] if pid != -1: return find_dso_by_name(-1, name)
def read_disam(pid, dso, start_addr, stop_addr): global mmaps global build_ids
if dso.name == "[kernel.kallsyms]": return "<in kernel>\n" addr_range = str(start_addr) + ":" + str(stop_addr) + ":" + dso.name # Don't let the cache get too big, clear it when it hits max size if (len(disasm_cache) > cache_size): disasm_cache.clear(); try: disasm_output = disasm_cache[addr_range]; except: if dso.image is not None: fname = dso.image else: sys.exit("cannot find symbol file for " + dso.name) disasm = [ options.objdump_name, "-d", "-z", "--start-address="+format(start_addr,"#x"), "--stop-address="+format(stop_addr,"#x") ] if dso.start != 0: disasm += [ "--adjust-vma="+format(dso.start,"#x") ] disasm += [ fname ] disasm_output = check_output(disasm).split('\n') disasm_cache[addr_range] = disasm_output return disasm_output
def dump_disam(pid, dso, start_addr, stop_addr): for line in read_disam(pid, dso, start_addr, stop_addr): m = disasm_func_re.search(line) if m is None: m = disasm_re.search(line) if m is None: continue print("\t" + line)
def dump_packet(sample): print("Packet = { cpu: 0x%d addr: 0x%x phys_addr: 0x%x ip: 0x%x " \ "pid: %d tid: %d period: %d time: %d }" % \ (sample['cpu'], sample['addr'], sample['phys_addr'], \ sample['ip'], sample['pid'], sample['tid'], \ sample['period'], sample['time']))
def trace_begin(): print('ARM CoreSight Trace Data Assembler Dump') parse_buildid() parse_mmap()
def trace_end(): print('End')
def trace_unhandled(event_name, context, event_fields_dict): print(' '.join(['%s=%s'%(k,str(v))for k,v in sorted(event_fields_dict.items())]))
def process_event(param_dict): global cache_size global options
if param_dict["ev_name"] != "branches": dump_packet(sample) #print(param_dict) #sys.exit() sample = param_dict["sample"] if (options.verbose == True): dump_packet(sample) cpu = sample["cpu"] pid = sample["pid"] this_dso = None if "dso" in param_dict: dso_name = param_dict["dso"] this_dso = find_dso_by_name(pid, dso_name) else: dso_name = None # If period doesn't equal to 1, this packet is for instruction sample # packet, we need drop this synthetic packet. if (sample['period'] != 1): print("Skip synthetic instruction sample") return ip = sample["ip"] addr = sample["addr"] # Initialize CPU data if it's empty, and directly return back # if this is the first tracing event for this CPU. if (cpu_data.get(str(cpu) + 'addr') == None): cpu_data[str(cpu) + 'addr'] = addr return # The format for packet is: # # +------------+------------+------------+ # sample_prev: | addr | ip | cpu | # +------------+------------+------------+ # sample_next: | addr | ip | cpu | # +------------+------------+------------+ # # We need to combine the two continuous packets to get the instruction # range for sample_prev::cpu: # # [ sample_prev::addr .. sample_next::ip ] # # For this purose, sample_prev::addr is stored into cpu_data structure # and read back for 'start_addr' when the new packet comes, and we need # to use sample_next::ip to calculate 'stop_addr', plusing extra 4 for # 'stop_addr' is for the sake of objdump so the final assembler dump can # include last instruction for sample_next::ip. start_addr = cpu_data[str(cpu) + 'addr'] stop_addr = ip + 4 # Record for previous sample packet cpu_data[str(cpu) + 'addr'] = addr # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4 if (start_addr == 0 and stop_addr == 4): print("CPU%s: CS_ETM_TRACE_ON packet is inserted" % cpu) return # If cannot find dso so cannot dump assembler, bail out if this_dso is None: print("Address range [ 0x%x .. 0x%x ]: failed to find dso" % (start_addr, stop_addr)) return if not this_dso.contains(start_addr): print("Address range [ 0x%x .. 0x%x ]: isn't in same dso" % (start_addr, stop_addr)) return dump_disam(pid, this_dso, start_addr, stop_addr)
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.