Hi all,
I've hit some problems in trying to get perf record/report to work correctly on systems with separated debug images (Ubuntu, in particular). I worked on some patches to ensure that separate debug images do actually get loaded, when present -- these commits have now been merged in linux-2.6-tip/master. (See below for a list of the commits.)
Now that these are in place, I'm observing a new problem which can mess up symbol locations -- though I think there might be a practical workaround, I can't see a trivial solution, so I'm keen to hear if anyone has any ideas.
The problem:
perf makes some incorrect assumptions, which mean that the symbol locations seen by perf report and friends can be incorrect.
My analysis:
a) perf represents symbols as offsets from the start of the mmap'd code region which contains each symbol.
b) For shared libs (ET_DYN objects), the desired offset is (usually) equal to the VMA of each given symbol in the image. Perf assumes that these offsets are always equal, and this normally works.
Atypical link configurations could cause this to fail, but this would probably be unusual in the real world.
c) For executables (ET_EXEC objects), the desired offset is:
st_addr - (mmap_start + sh_offset - mmap_offset)
Where st_addr is the symbol VMA, sh_offset is the offset of the parent *section* (not segment) from the start of the image, mmap_start is the base VMA of the corresponding mmap'd *segment* (or part of a segment) in memory, and mmap_offset is the file offset at which the mapping starts (as in mmap(2))
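To keep the quantities in (c) straight, here is a tiny illustrative helper in C which just restates the formula above -- the function and variable names are invented for this example and don't correspond to anything in the perf sources:

#include <stdint.h>

/*
 * Illustration only: the map-relative offset described in (c).
 *   st_addr     - symbol VMA
 *   sh_offset   - file offset of the symbol's parent *section*
 *   mmap_start  - base VMA of the mmap'd *segment* (or part of a segment)
 *   mmap_offset - file offset at which the mapping starts (as in mmap(2))
 */
static uint64_t wanted_offset(uint64_t st_addr, uint64_t sh_offset,
                              uint64_t mmap_start, uint64_t mmap_offset)
{
        return st_addr - (mmap_start + sh_offset - mmap_offset);
}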
(As a minor note, the pgoff field in the perf mmap event data (from which mmap_offset should be deducible) is probably also incorrect at present -- see snippet from util/event.c below.)
d) perf assumes that the above is equivalent to:
st_addr - (sh_addr - sh_offset)
Where sh_addr is the VMA of the start of the symbol's parent section.
This is true only given certain assumptions, namely that each symbol's parent section is mapped in such a way that mmap_offset is zero. In practice, this is true for the main executable segment of ET_EXEC images in typical link configurations (i.e., using the default GNU ld scripts).
e) The assumptions in (d) break down when using separated debug images, because although separated debug images contain faithful VMA information, the image is stripped of loadable sections' contents, resulting in radically different sh_offset values from the "real" image.
Because sh_offset bears no relationship to the layout of the data mmap'd at runtime, this produces bogus symbol adjustments, which result in the wrong symbol offsets from (a) being calculated.
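(To illustrate with purely made-up numbers: suppose that in the real executable .text has sh_addr = 0x08048340 and sh_offset = 0x340, so the adjustment sh_addr - sh_offset comes out as 0x08048000. In the corresponding debug-only image, .text is an empty SHT_NOBITS section whose sh_offset might be, say, 0x2f4, giving an adjustment of 0x0804804c instead -- so every .text symbol ends up displaced by 0x4c bytes.)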
Solutions:
(e) needs to be solved in order for perf report etc. to produce sensible output when using separated debug images.
I think that a comprehensive solution should also fix the assumptions in (b) and (d) so that any valid ELF images will be processed correctly.
Solving (e) doesn't appear trivial, since it requires section layout information from the image that was mmap'd at _runtime_ to be correlated with the symbol information, which may be collected from other files at _report_ time. Currently, runtime and report time are treated rather separately, so this isn't straightforward to achieve.
A simple workaround in the meantime would be to assume that .text is mmap'd with mmap_offset=0 for ET_EXEC images, and fix the adjustment calculation appropriately. This is not a full fix but is probably worth doing if (as I guess) it gets the tools working correctly in the common case: see
[PATCH 2/2] perf symbols: work around incorrect ET_EXEC symbol adjustment
-------------------------------------------------------------------------------
My current list of patches in linux-2.6-tip/master:
commit 6da80ce8c43ddda153208cbb46b75290cf566fac
  perf symbols: Improve debug image search when loading symbols
commit 8b1389ef93b36621c6acdeb623bd85aee3c405c9
  perf tools: remove extra build-id check factored into dso__load
commit 21916c380d93ab59d6d07ee198fb31c8f1338e26
  perf tools: Factor out buildid reading and make it implicit in dso__load
commit 88ca895dd4e0e64ebd942adb7925fa60ca5b2a98
  perf tools: Remove unneeded code for tracking the cwd in perf sessions
commit 361d13462585474267a0c41e956f1a1c19a93f17
  perf report: Don't abbreviate file paths relative to the cwd
Illustrative code snippets (from v2.6.35):
util/symbol.c:
 968:static int dso__load_sym(struct dso *self, struct map *map, const char *name,
 969:                         int fd, symbol_filter_t filter, int kmodule,
 970:                         int want_symtab)
 971:{
[...]
1055:   elf_symtab__for_each_symbol(syms, nr_syms, idx, sym) {
[...]
1133:           if (curr_dso->adjust_symbols) {
1134:                   pr_debug4("%s: adjusting symbol: st_value: %#Lx "
1135:                             "sh_addr: %#Lx sh_offset: %#Lx\n", __func__,
1136:                             (u64)sym.st_value, (u64)shdr.sh_addr,
1137:                             (u64)shdr.sh_offset);
1138:                   sym.st_value -= shdr.sh_addr - shdr.sh_offset;
1139:           }
Probable incorrect symbol adjustment which currently gets used for ET_EXEC images. Explanation above.
util/event.c:
109:static int event__synthesize_mmap_events(pid_t pid, pid_t tgid,
110:                                         event__handler_t process,
111:                                         struct perf_session *session)
112:{
[...]
153:            if (*pbf == 'x') { /* vm_exec */
[...]
166:                    /* pgoff is in bytes, not pages */
167:                    if (n >= 0)
168:                            ev.mmap.pgoff = vm_pgoff << getpagesize();
169:                    else
170:                            ev.mmap.pgoff = 0;
Unless I've misunderstood, getpagesize(2) returns the size in bytes of a page; in this case using it as a shift count is wrong.
Also, the code appears to be trying to convert a page count into an address for storage in the profile data. This would mean that when interpreting the data, knowledge of the page size is not needed. However, since I can't see any code which actually uses the pgoff information from the profile data, I'm not sure whether this was the intention.
In any case, the offset field from /proc/*/maps is already a byte count, not a page offset, so it does not look necessary to convert it at all.
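For reference, an executable line from /proc/*/maps looks like this (the values here are made up):

    00400000-004de000 r-xp 00000000 08:01 123456    /bin/some-binary

where the third field (here 00000000) is the file offset of the mapping, already expressed in bytes, in hex.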
See [PATCH 1/2] perf events: Fix mmap offset determination.
Dave Martin (2):
  perf events: Fix mmap offset determination
  perf symbols: work around incorrect ET_EXEC symbol adjustment

 tools/perf/util/event.c  |    8 +-------
 tools/perf/util/symbol.c |   10 +++++++++-
 2 files changed, 10 insertions(+), 8 deletions(-)
Fix buggy-looking code which unnecessarily adjusts the file offset fields read from /proc/*/maps.
This may have gone unnoticed since the offset is usually 0 (and the logic in util/symbol.c may work incorrectly for other offset values).
I make assumptions about the intended design here. The cover note accompanying this patch contains a more detailed explanation.
Signed-off-by: Dave Martin dave.martin@linaro.org
---
 tools/perf/util/event.c |    8 +-------
 1 files changed, 1 insertions(+), 7 deletions(-)
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 6b0db55..db8a1d4 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -151,7 +151,6 @@ static int event__synthesize_mmap_events(pid_t pid, pid_t tgid,
                        continue;
                pbf += n + 3;
                if (*pbf == 'x') { /* vm_exec */
-                       u64 vm_pgoff;
                        char *execname = strchr(bf, '/');

                        /* Catch VDSO */
@@ -162,12 +161,7 @@ static int event__synthesize_mmap_events(pid_t pid, pid_t tgid,
                                continue;

                        pbf += 3;
-                       n = hex2u64(pbf, &vm_pgoff);
-                       /* pgoff is in bytes, not pages */
-                       if (n >= 0)
-                               ev.mmap.pgoff = vm_pgoff << getpagesize();
-                       else
-                               ev.mmap.pgoff = 0;
+                       n = hex2u64(pbf, &ev.mmap.pgoff);

                        size = strlen(execname);
                        execname[size - 1] = '\0'; /* Remove \n */
On Tue, Aug 03, 2010 at 12:48:35PM +0100, Dave Martin wrote:
Fix buggy-looking code which unnecessarily adjusts the file offset fields read from /proc/*/maps.
This may have gone unnoticed since the offset is usually 0 (and the logic in util/symbol.c may work incorrectly for other offset values).
I make assumptions about the intended design here. The cover note accompanying this patch contains a more detailed explanation.
Signed-off-by: Dave Martin dave.martin@linaro.org
Doing some investigation here...
4af8b35d (Anton Blanchard 2010-04-03 164)                      pbf += 3;
4af8b35d (Anton Blanchard 2010-04-03 165)                      n = hex2u64(pbf, &vm_pgoff);
4af8b35d (Anton Blanchard 2010-04-03 166)                      /* pgoff is in bytes, not pages */
4af8b35d (Anton Blanchard 2010-04-03 167)                      if (n >= 0)
4af8b35d (Anton Blanchard 2010-04-03 168)                              ev.mmap.pgoff = vm_pgoff << getpagesize();
4af8b35d (Anton Blanchard 2010-04-03 169)                      else
4af8b35d (Anton Blanchard 2010-04-03 170)                              ev.mmap.pgoff = 0;
commit 4af8b35db6634dd1e0d616de689582b6c93550af
Author: Anton Blanchard anton@samba.org
Date:   Sat Apr 3 22:53:31 2010 +1100
perf symbols: Fill in pgoff in mmap synthesized events
When we synthesize mmap events we need to fill in the pgoff field.
I wasn't able to test this completely since I couldn't find an executable region with a non 0 offset. We will see it when we start doing data profiling.
------------------------
Yeah, not a reassuring comment. Looking at fs/proc/task_mmu.c we see:
static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
{
        struct mm_struct *mm = vma->vm_mm;
        struct file *file = vma->vm_file;
        int flags = vma->vm_flags;
        unsigned long ino = 0;
        unsigned long long pgoff = 0;
        dev_t dev = 0;
        int len;

        if (file) {
                struct inode *inode = vma->vm_file->f_path.dentry->d_inode;
                dev = inode->i_sb->s_dev;
                ino = inode->i_ino;
                pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT;
        }

        seq_printf(m, "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu %n",
                        vma->vm_start,
                        vma->vm_end,
                        flags & VM_READ ? 'r' : '-',
                        flags & VM_WRITE ? 'w' : '-',
                        flags & VM_EXEC ? 'x' : '-',
                        flags & VM_MAYSHARE ? 's' : 'p',
                        pgoff,
                        MAJOR(dev), MINOR(dev), ino, &len);
----------------------------------------------------------------------------
So yes, we're double shifting that, your patch is correct, applying it.
 tools/perf/util/event.c |    8 +-------
 1 files changed, 1 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 6b0db55..db8a1d4 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -151,7 +151,6 @@ static int event__synthesize_mmap_events(pid_t pid, pid_t tgid,
                        continue;
                pbf += n + 3;
                if (*pbf == 'x') { /* vm_exec */
-                       u64 vm_pgoff;
                        char *execname = strchr(bf, '/');

                        /* Catch VDSO */
@@ -162,12 +161,7 @@ static int event__synthesize_mmap_events(pid_t pid, pid_t tgid,
                                continue;

                        pbf += 3;
-                       n = hex2u64(pbf, &vm_pgoff);
-                       /* pgoff is in bytes, not pages */
-                       if (n >= 0)
-                               ev.mmap.pgoff = vm_pgoff << getpagesize();
-                       else
-                               ev.mmap.pgoff = 0;
+                       n = hex2u64(pbf, &ev.mmap.pgoff);

                        size = strlen(execname);
                        execname[size - 1] = '\0'; /* Remove \n */
--
1.7.0.4
Workaround to adjust .text symbols from ET_EXEC images correctly.
This is not a complete fix, but addresses the common case and does not create additional assumptions beyond those which perf already makes. ELF images with a weird link map may still be processed incorrectly (as at present) -- that will require a separate fix.
The workaround allows correct symbol offsets to be generated for ET_EXEC executables on systems with separated debug info (where section sh_offset fields from the debug image may bear no relation to the layout of the executable image).
The cover note accompanying this patch contains a more detailed explanation.
Signed-off-by: Dave Martin dave.martin@linaro.org
---
 tools/perf/util/symbol.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 3b8c005..3930398 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1135,7 +1135,15 @@ static int dso__load_sym(struct dso *self, struct map *map, const char *name,
                                  "sh_addr: %#Lx sh_offset: %#Lx\n", __func__,
                                  (u64)sym.st_value, (u64)shdr.sh_addr,
                                  (u64)shdr.sh_offset);
-                       sym.st_value -= shdr.sh_addr - shdr.sh_offset;
+                       /* Assumptions:
+                        * a) shdr.sh_addr - shdr.sh_offset ==
+                        *    map->start - map->pgoff
+                        * b) map->pgoff == 0
+                        * These are true iff we are looking at a function
+                        * symbol in the main executable segment _and_
+                        * the main executable segment starts at the start of
+                        * the ELF image (normally true). */
+                       sym.st_value -= map->start;
                }
                /*
                 * We need to figure out if the object was created from C++ sources
Applies to 2.6.35 (git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6.git perf/core branch)
The new patch loads the ELF section headers from a separate file if necessary, to avoid getting confused by the different section file offsets seen in debug images. Invalid section headers are detected by checking for the presence of non-writable SHT_NOBITS sections, which don't make sense under normal circumstances.
In particular, this allows symbols in ET_EXEC images to get fixed up correctly in the presence of separated debug images.
The image search loop is also tidied up to fix a minor bug which would perform the same image load attempt more than once in some situations.
For non-separated images, the headers should get loaded from the same image as the symbols, in the usual way.
Signed-off-by: Dave Martin dave.martin@linaro.org
---
 tools/perf/util/symbol.c |  185 +++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 173 insertions(+), 12 deletions(-)
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 1a36773..21ae9df 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1,4 +1,5 @@
 #define _GNU_SOURCE
+#include <assert.h>
 #include <ctype.h>
 #include <dirent.h>
 #include <errno.h>
@@ -978,8 +979,106 @@ static size_t elf_addr_to_index(Elf *elf, GElf_Addr addr)
        return -1;
 }

+/**
+ * Read all section headers, copying them into a separate array so they survive
+ * elf_end.
+ *
+ * @elf: the libelf instance to operate on.
+ * @ehdr: the elf header: this must already have been read with gelf_getehdr().
+ * @count: the number of headers read is assigned to *count on successful
+ *      return. count must not be NULL.
+ *
+ * Returns a pointer to the allocated headers, which should be deallocated with
+ * free() when no longer needed.
+ */
+static GElf_Shdr *elf_get_all_shdrs(Elf *elf, GElf_Ehdr const *ehdr,
+                                    unsigned *count)
+{
+       GElf_Shdr *shdrs;
+       Elf_Scn *scn;
+       unsigned max_index = 0;
+       unsigned i;
+
+       shdrs = malloc(ehdr->e_shnum * sizeof *shdrs);
+       if (!shdrs)
+               return NULL;
+
+       for (i = 0; i < ehdr->e_shnum; i++)
+               shdrs[i].sh_type = SHT_NULL;
+
+       for (scn = NULL; (scn = elf_nextscn(elf, scn)); ) {
+               size_t j;
+
+               /*
+                * Just assuming we get section 0, 1, ... in sequence may lead
+                * to wrong section indices. Check the index explicitly:
+                */
+               j = elf_ndxscn(scn);
+               assert(j < ehdr->e_shnum);
+
+               if (j > max_index)
+                       max_index = j;
+
+               if (!gelf_getshdr(scn, &shdrs[j]))
+                       goto error;
+       }
+
+       *count = max_index + 1;
+       return shdrs;
+
+error:
+       free(shdrs);
+       return NULL;
+}
+
+/**
+ * Check that the section headers @shdrs reflect accurately the file data
+ * layout of the image that was loaded during perf record. This is generally
+ * not true for separated debug images generated with e.g.,
+ * objcopy --only-keep-debug.
+ *
+ * We identify invalid files by checking for non-empty sections which are
+ * declared as having no file data (SHT_NOBITS) but are not writable.
+ *
+ * @shdrs: the full set of section headers, as loaded by elf_get_all_shdrs().
+ * @count: the number of headers present in @shdrs.
+ *
+ * Returns 1 for valid headers, 0 otherwise.
+ */
+static int elf_check_shdrs_valid(GElf_Shdr const *shdrs, unsigned count)
+{
+       unsigned i;
+
+       for (i = 0; i < count; i++) {
+               if (shdrs[i].sh_type == SHT_NOBITS &&
+                   !(shdrs[i].sh_flags & SHF_WRITE) &&
+                   shdrs[i].sh_size != 0)
+                       return 0;
+       }
+
+       return 1;
+}
+
+/*
+ * Notes:
+ *
+ * If saved_shdrs is non-NULL, the section headers will be read if found, and
+ * will be used for address fixups. saved_shdrs_count must also be non-NULL in
+ * this case. This may be needed for separated debug images, since the section
+ * headers and symbols may need to come from separate images in that case.
+ *
+ * Note: irrespective of whether this function returns successfully,
+ * *saved_shdrs may get initialised if saved_shdrs is non-NULL. It is the
+ * caller's responsibility to free() it when non longer needed.
+ *
+ * If want_symtab == 1, this function will only load symbols from .symtab
+ * sections. Otherwise (want_symtab == 0), .dynsym or .symtab symbols are
+ * loaded. This feature is used by dso__load() to search for the best image
+ * to load.
+ */
 static int dso__load_sym(struct dso *self, struct map *map, const char *name,
                         int fd, symbol_filter_t filter, int kmodule,
+                        GElf_Shdr **saved_shdrs, unsigned *saved_shdrs_count,
                         int want_symtab)
 {
        struct kmap *kmap = self->kernel ? map__kmap(map) : NULL;
@@ -998,6 +1097,9 @@ static int dso__load_sym(struct dso *self, struct map *map, const char *name,
        int nr = 0;
        size_t opdidx = 0;

+       if (saved_shdrs != NULL)
+               assert(saved_shdrs_count != NULL);
+
        elf = elf_begin(fd, PERF_ELF_C_READ_MMAP, NULL);
        if (elf == NULL) {
                pr_debug("%s: cannot read %s ELF file.\n", __func__, name);
@@ -1021,6 +1123,34 @@ static int dso__load_sym(struct dso *self, struct map *map, const char *name,
                goto out_elf_end;
        }

+       /*
+        * Copy all section headers from the image if requested and if not
+        * already loaded.
+        */
+       if (saved_shdrs != NULL && *saved_shdrs == NULL) {
+               GElf_Shdr *shdrs;
+               unsigned count;
+
+               shdrs = elf_get_all_shdrs(elf, &ehdr, &count);
+               if (shdrs == NULL)
+                       goto out_elf_end;
+
+               /*
+                * Only keep the headers if they reflect the actual run-time
+                * image's file layout:
+                */
+               if (elf_check_shdrs_valid(shdrs, count)) {
+                       *saved_shdrs = shdrs;
+                       *saved_shdrs_count = count;
+               } else
+                       free(shdrs);
+       }
+
+       /* If no genuine ELF headers are available yet, give up: we can't
+        * adjust symbols correctly in that case: */
+       if (saved_shdrs != NULL && *saved_shdrs == NULL)
+               goto out_elf_end;
+
        sec = elf_section_by_name(elf, &ehdr, &shdr, ".symtab", NULL);
        if (sec == NULL) {
                if (want_symtab)
@@ -1031,6 +1161,9 @@ static int dso__load_sym(struct dso *self, struct map *map, const char *name,
                        goto out_elf_end;
        }

+       if (saved_shdrs != NULL && *saved_shdrs == NULL)
+               goto out_elf_end;
+
        opdsec = elf_section_by_name(elf, &ehdr, &opdshdr, ".opd", &opdidx);
        if (opdsec)
                opddata = elf_rawdata(opdsec, NULL);
@@ -1153,12 +1286,25 @@ static int dso__load_sym(struct dso *self, struct map *map, const char *name,
                        goto new_symbol;
                }

+               /*
+                * Currently, symbols for shared objects and PIE executables
+                * (i.e., ET_DYN) do not seem to get adjusted. This might need
+                * to change if file offset == virtual address is not actually
+                * guaranteed for these images. ELF doesn't provide this
+                * guarantee natively.
+                */
                if (curr_dso->adjust_symbols) {
                        pr_debug4("%s: adjusting symbol: st_value: %#Lx "
                                  "sh_addr: %#Lx sh_offset: %#Lx\n", __func__,
                                  (u64)sym.st_value, (u64)shdr.sh_addr,
                                  (u64)shdr.sh_offset);
-                       sym.st_value -= shdr.sh_addr - shdr.sh_offset;
+                       if (saved_shdrs && *saved_shdrs &&
+                           sym.st_shndx < *saved_shdrs_count)
+                               sym.st_value -=
+                                       (*saved_shdrs)[sym.st_shndx].sh_addr -
+                                       (*saved_shdrs)[sym.st_shndx].sh_offset;
+                       else
+                               sym.st_value -= shdr.sh_addr - shdr.sh_offset;
                }
                /*
                 * We need to figure out if the object was created from C++ sources
@@ -1395,6 +1541,8 @@ int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
        struct machine *machine;
        const char *root_dir;
        int want_symtab;
+       GElf_Shdr *saved_shdrs = NULL;
+       unsigned saved_shdrs_count;

        dso__set_loaded(self, map->type);

@@ -1425,13 +1573,13 @@ int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
         * On the first pass, only load images if they have a full symtab.
         * Failing that, do a second pass where we accept .dynsym also
         */
-       for (self->origin = DSO__ORIG_BUILD_ID_CACHE, want_symtab = 1;
-            self->origin != DSO__ORIG_NOT_FOUND;
-            self->origin++) {
+       self->origin = DSO__ORIG_BUILD_ID_CACHE;
+       want_symtab = 1;
+       while (1) {
                switch (self->origin) {
                case DSO__ORIG_BUILD_ID_CACHE:
                        if (dso__build_id_filename(self, name, size) == NULL)
-                               continue;
+                               goto continue_next;
                        break;
                case DSO__ORIG_FEDORA:
                        snprintf(name, size, "/usr/lib/debug%s.debug",
@@ -1445,7 +1593,7 @@ int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
                        char build_id_hex[BUILD_ID_SIZE * 2 + 1];

                        if (!self->has_build_id)
-                               continue;
+                               goto continue_next;

                        build_id__sprintf(self->build_id,
                                          sizeof(self->build_id),
@@ -1469,21 +1617,24 @@ int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
                default:
                        /*
                         * If we wanted a full symtab but no image had one,
-                        * relax our requirements and repeat the search.
+                        * relax our requirements and repeat the search,
+                        * providing we saw some valid section headers:
                         */
-                       if (want_symtab) {
+                       if (want_symtab && saved_shdrs != NULL) {
                                want_symtab = 0;
                                self->origin = DSO__ORIG_BUILD_ID_CACHE;
-                       } else continue;
+                       } else
+                               goto done;
                }

                /* Name is now the name of the next image to try */
                fd = open(name, O_RDONLY);
                if (fd < 0)
-                       continue;
+                       goto continue_next;

                ret = dso__load_sym(self, map, name, fd, filter, 0,
+                                   &saved_shdrs, &saved_shdrs_count,
                                    want_symtab);
                close(fd);

@@ -1492,7 +1643,7 @@ int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
                 * info!?!?
                 */
                if (!ret)
-                       continue;
+                       goto continue_next;

                if (ret > 0) {
                        int nr_plt = dso__synthesize_plt_symbols(self, map, filter);
@@ -1500,9 +1651,18 @@ int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
                        ret += nr_plt;
                        break;
                }
+
+continue_next:
+               self->origin++;
+               continue;
        }

+done:
        free(name);
+
+       if (saved_shdrs)
+               free(saved_shdrs);
+
        if (ret < 0 && strstr(self->name, " (deleted)") != NULL)
                return 0;
        return ret;
@@ -1768,7 +1928,8 @@ static int dso__load_vmlinux(struct dso *self, struct map *map,
                return -1;

        dso__set_loaded(self, map->type);
-       err = dso__load_sym(self, map, vmlinux, fd, filter, 0, 0);
+       err = dso__load_sym(self, map, vmlinux, fd, filter, 0,
+                           NULL, NULL, 0);
        close(fd);

        if (err > 0)
Fixed a minor error in v2 where a check was erroneously done twice.
Original commit message follows:
Applies to 2.6.35 (git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6.git perf/core branch)
The new patch loads the ELF section headers from a separate file if necessary, to avoid getting confused by the different section file offsets seen in debug images. Invalid section headers are detected by checking for the presence of non-writable SHT_NOBITS sections, which don't make sense under normal circumstances.
In particular, this allows symbols in ET_EXEC images to get fixed up correctly in the presence of separated debug images.
The image search loop is also tidied up to fix a minor bug which would perform the same image load attempt more than once in some situations.
For non-separated images, the headers should get loaded from the same image as the symbols, in the usual way.
Signed-off-by: Dave Martin dave.martin@linaro.org
---
 tools/perf/util/symbol.c |  182 +++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 170 insertions(+), 12 deletions(-)
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 1a36773..8328b60 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1,4 +1,5 @@
 #define _GNU_SOURCE
+#include <assert.h>
 #include <ctype.h>
 #include <dirent.h>
 #include <errno.h>
@@ -978,8 +979,106 @@ static size_t elf_addr_to_index(Elf *elf, GElf_Addr addr)
        return -1;
 }

+/**
+ * Read all section headers, copying them into a separate array so they survive
+ * elf_end.
+ *
+ * @elf: the libelf instance to operate on.
+ * @ehdr: the elf header: this must already have been read with gelf_getehdr().
+ * @count: the number of headers read is assigned to *count on successful
+ *      return. count must not be NULL.
+ *
+ * Returns a pointer to the allocated headers, which should be deallocated with
+ * free() when no longer needed.
+ */
+static GElf_Shdr *elf_get_all_shdrs(Elf *elf, GElf_Ehdr const *ehdr,
+                                    unsigned *count)
+{
+       GElf_Shdr *shdrs;
+       Elf_Scn *scn;
+       unsigned max_index = 0;
+       unsigned i;
+
+       shdrs = malloc(ehdr->e_shnum * sizeof *shdrs);
+       if (!shdrs)
+               return NULL;
+
+       for (i = 0; i < ehdr->e_shnum; i++)
+               shdrs[i].sh_type = SHT_NULL;
+
+       for (scn = NULL; (scn = elf_nextscn(elf, scn)); ) {
+               size_t j;
+
+               /*
+                * Just assuming we get section 0, 1, ... in sequence may lead
+                * to wrong section indices. Check the index explicitly:
+                */
+               j = elf_ndxscn(scn);
+               assert(j < ehdr->e_shnum);
+
+               if (j > max_index)
+                       max_index = j;
+
+               if (!gelf_getshdr(scn, &shdrs[j]))
+                       goto error;
+       }
+
+       *count = max_index + 1;
+       return shdrs;
+
+error:
+       free(shdrs);
+       return NULL;
+}
+
+/**
+ * Check that the section headers @shdrs reflect accurately the file data
+ * layout of the image that was loaded during perf record. This is generally
+ * not true for separated debug images generated with e.g.,
+ * objcopy --only-keep-debug.
+ *
+ * We identify invalid files by checking for non-empty sections which are
+ * declared as having no file data (SHT_NOBITS) but are not writable.
+ *
+ * @shdrs: the full set of section headers, as loaded by elf_get_all_shdrs().
+ * @count: the number of headers present in @shdrs.
+ *
+ * Returns 1 for valid headers, 0 otherwise.
+ */
+static int elf_check_shdrs_valid(GElf_Shdr const *shdrs, unsigned count)
+{
+       unsigned i;
+
+       for (i = 0; i < count; i++) {
+               if (shdrs[i].sh_type == SHT_NOBITS &&
+                   !(shdrs[i].sh_flags & SHF_WRITE) &&
+                   shdrs[i].sh_size != 0)
+                       return 0;
+       }
+
+       return 1;
+}
+
+/*
+ * Notes:
+ *
+ * If saved_shdrs is non-NULL, the section headers will be read if found, and
+ * will be used for address fixups. saved_shdrs_count must also be non-NULL in
+ * this case. This may be needed for separated debug images, since the section
+ * headers and symbols may need to come from separate images in that case.
+ *
+ * Note: irrespective of whether this function returns successfully,
+ * *saved_shdrs may get initialised if saved_shdrs is non-NULL. It is the
+ * caller's responsibility to free() it when non longer needed.
+ *
+ * If want_symtab == 1, this function will only load symbols from .symtab
+ * sections. Otherwise (want_symtab == 0), .dynsym or .symtab symbols are
+ * loaded. This feature is used by dso__load() to search for the best image
+ * to load.
+ */
 static int dso__load_sym(struct dso *self, struct map *map, const char *name,
                         int fd, symbol_filter_t filter, int kmodule,
+                        GElf_Shdr **saved_shdrs, unsigned *saved_shdrs_count,
                         int want_symtab)
 {
        struct kmap *kmap = self->kernel ? map__kmap(map) : NULL;
@@ -998,6 +1097,9 @@ static int dso__load_sym(struct dso *self, struct map *map, const char *name,
        int nr = 0;
        size_t opdidx = 0;

+       if (saved_shdrs != NULL)
+               assert(saved_shdrs_count != NULL);
+
        elf = elf_begin(fd, PERF_ELF_C_READ_MMAP, NULL);
        if (elf == NULL) {
                pr_debug("%s: cannot read %s ELF file.\n", __func__, name);
@@ -1021,6 +1123,34 @@ static int dso__load_sym(struct dso *self, struct map *map, const char *name,
                goto out_elf_end;
        }

+       /*
+        * Copy all section headers from the image if requested and if not
+        * already loaded.
+        */
+       if (saved_shdrs != NULL && *saved_shdrs == NULL) {
+               GElf_Shdr *shdrs;
+               unsigned count;
+
+               shdrs = elf_get_all_shdrs(elf, &ehdr, &count);
+               if (shdrs == NULL)
+                       goto out_elf_end;
+
+               /*
+                * Only keep the headers if they reflect the actual run-time
+                * image's file layout:
+                */
+               if (elf_check_shdrs_valid(shdrs, count)) {
+                       *saved_shdrs = shdrs;
+                       *saved_shdrs_count = count;
+               } else
+                       free(shdrs);
+       }
+
+       /* If no genuine ELF headers are available yet, give up: we can't
+        * adjust symbols correctly in that case: */
+       if (saved_shdrs != NULL && *saved_shdrs == NULL)
+               goto out_elf_end;
+
        sec = elf_section_by_name(elf, &ehdr, &shdr, ".symtab", NULL);
        if (sec == NULL) {
                if (want_symtab)
@@ -1153,12 +1283,25 @@ static int dso__load_sym(struct dso *self, struct map *map, const char *name,
                        goto new_symbol;
                }

+               /*
+                * Currently, symbols for shared objects and PIE executables
+                * (i.e., ET_DYN) do not seem to get adjusted. This might need
+                * to change if file offset == virtual address is not actually
+                * guaranteed for these images. ELF doesn't provide this
+                * guarantee natively.
+                */
                if (curr_dso->adjust_symbols) {
                        pr_debug4("%s: adjusting symbol: st_value: %#Lx "
                                  "sh_addr: %#Lx sh_offset: %#Lx\n", __func__,
                                  (u64)sym.st_value, (u64)shdr.sh_addr,
                                  (u64)shdr.sh_offset);
-                       sym.st_value -= shdr.sh_addr - shdr.sh_offset;
+                       if (saved_shdrs && *saved_shdrs &&
+                           sym.st_shndx < *saved_shdrs_count)
+                               sym.st_value -=
+                                       (*saved_shdrs)[sym.st_shndx].sh_addr -
+                                       (*saved_shdrs)[sym.st_shndx].sh_offset;
+                       else
+                               sym.st_value -= shdr.sh_addr - shdr.sh_offset;
                }
                /*
                 * We need to figure out if the object was created from C++ sources
@@ -1395,6 +1538,8 @@ int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
        struct machine *machine;
        const char *root_dir;
        int want_symtab;
+       GElf_Shdr *saved_shdrs = NULL;
+       unsigned saved_shdrs_count;

        dso__set_loaded(self, map->type);

@@ -1425,13 +1570,13 @@ int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
         * On the first pass, only load images if they have a full symtab.
         * Failing that, do a second pass where we accept .dynsym also
         */
-       for (self->origin = DSO__ORIG_BUILD_ID_CACHE, want_symtab = 1;
-            self->origin != DSO__ORIG_NOT_FOUND;
-            self->origin++) {
+       self->origin = DSO__ORIG_BUILD_ID_CACHE;
+       want_symtab = 1;
+       while (1) {
                switch (self->origin) {
                case DSO__ORIG_BUILD_ID_CACHE:
                        if (dso__build_id_filename(self, name, size) == NULL)
-                               continue;
+                               goto continue_next;
                        break;
                case DSO__ORIG_FEDORA:
                        snprintf(name, size, "/usr/lib/debug%s.debug",
@@ -1445,7 +1590,7 @@ int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
                        char build_id_hex[BUILD_ID_SIZE * 2 + 1];

                        if (!self->has_build_id)
-                               continue;
+                               goto continue_next;

                        build_id__sprintf(self->build_id,
                                          sizeof(self->build_id),
@@ -1469,21 +1614,24 @@ int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
                default:
                        /*
                         * If we wanted a full symtab but no image had one,
-                        * relax our requirements and repeat the search.
+                        * relax our requirements and repeat the search,
+                        * providing we saw some valid section headers:
                         */
-                       if (want_symtab) {
+                       if (want_symtab && saved_shdrs != NULL) {
                                want_symtab = 0;
                                self->origin = DSO__ORIG_BUILD_ID_CACHE;
-                       } else continue;
+                       } else
+                               goto done;
                }

                /* Name is now the name of the next image to try */
                fd = open(name, O_RDONLY);
                if (fd < 0)
-                       continue;
+                       goto continue_next;

                ret = dso__load_sym(self, map, name, fd, filter, 0,
+                                   &saved_shdrs, &saved_shdrs_count,
                                    want_symtab);
                close(fd);

@@ -1492,7 +1640,7 @@ int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
                 * info!?!?
                 */
                if (!ret)
-                       continue;
+                       goto continue_next;

                if (ret > 0) {
                        int nr_plt = dso__synthesize_plt_symbols(self, map, filter);
@@ -1500,9 +1648,18 @@ int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
                        ret += nr_plt;
                        break;
                }
+
+continue_next:
+               self->origin++;
+               continue;
        }

+done:
        free(name);
+
+       if (saved_shdrs)
+               free(saved_shdrs);
+
        if (ret < 0 && strstr(self->name, " (deleted)") != NULL)
                return 0;
        return ret;
@@ -1768,7 +1925,8 @@ static int dso__load_vmlinux(struct dso *self, struct map *map,
                return -1;

        dso__set_loaded(self, map->type);
-       err = dso__load_sym(self, map, vmlinux, fd, filter, 0, 0);
+       err = dso__load_sym(self, map, vmlinux, fd, filter, 0,
+                           NULL, NULL, 0);
        close(fd);

        if (err > 0)
On Tue, Aug 03, 2010 at 12:48:34PM +0100, Dave Martin wrote:
Hi all,
I've hit some problems in trying to get perf record/report to work correctly on systems with separated debug images (Ubuntu, in particular). I worked on some patches to ensure that separate debug images do actually get loaded, when present -- these commits have now been merged in linux-2.6-tip/master. (See below for a list of the commits.)
Now that these are in place, I'm observing a new problem which can mess up symbol locations -- though I think there might be a practical workaround, I can't see a trivial solution, so I'm keen to hear if anyone has any ideas.
The problem:
perf makes some incorrect assumptions, which mean that the symbol locations seen by perf report and friends can be incorrect.
My analysis:
a) perf represents symbols as offsets from the start of the mmap'd code region which contains each symbol.
The symbol library in perf uses separate map_ip and unmap_ip to cope with those, covering things like prelinked binaries, the kernel with its absolute addresses, kexec cases where we capture a reference relocation symbol so that we can adjust later when using the unrelocated vmlinux, etc.
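For reference, the default translation pair in tools/perf/util/map.h is along these lines (a simplified sketch, not a verbatim quote -- the header also wires these up as per-map function pointers, so individual maps can override them):

static inline u64 map__map_ip(struct map *map, u64 ip)
{
        return ip - map->start + map->pgoff;    /* runtime address -> map-relative */
}

static inline u64 map__unmap_ip(struct map *map, u64 ip)
{
        return ip + map->start - map->pgoff;    /* map-relative -> runtime address */
}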
b) For shared libs (ET_DYN objects), the desired offset is (usually) equal to the VMA of each given
VMA here stands for Virtual Memory Address, not the Virtual Memory Area where a map is loaded, right? I guess so, but I'm trying to get us onto common ground.
symbol in the image. Perf assumes that these offsets are always equal, and this normally works. Atypical link configurations could cause this
Which ones? We need to detect those and act accordingly, doing adjustments at load time or by using a separate map_ip/unmap_ip pair of functions in struct map.
I'll continue reading your patches; the first one probably fixes a bug, but I wonder whether you have looked at the struct map->{map_ip,unmap_ip} routines in tools/perf/util/map.h.
- Arnaldo
A follow-up on this:
On Tue, Aug 3, 2010 at 12:48 PM, Dave Martin dave.martin@linaro.org wrote:
[...]
a) perf represents symbols as offsets from the start of the mmap'd code region which contains each symbol.
I've looked at the code some more, and it looks like I've misunderstood here. It looks like symbol addresses are stored and searched as _file offsets_ in relation to each mapping -- the map__map_ip implementation bears this out, now that I understand how it's used to translate from a run-time virtual address to a file offset:
static inline u64 map__map_ip(struct map *map, u64 ip)
{
        return ip - map->start + map->pgoff;
}
Does this conclusion look wrong to anyone?
Assuming I've understood correctly now, this means that the existing code _is_ correct, except that separated debug images aren't processed correctly.
This means that my patch "work around incorrect ET_EXEC symbol adjustment" is a bodge, not a fix -- it will "work" in many cases, for the reasons previously discussed, but it's probably not the right solution.
Looks like I need to think about this a bit more -- basically we need to have the ELF section or program headers from each mmap'd file available in order to translate correctly between file offset and link-time virtual address for each image. The information would need to be available either when loading symbols or when doing symbol lookups. As already discussed, the headers from separated debug images are junk (at least the p_offset and sh_offset values are junk) and will lead to wrong address calculations.
We could:

a) Capture the relevant information during perf record, in addition to capturing the build-ids. This might break the perf data file format, but has the advantage (?) that perf report can display correct image-relative virtual addresses even if the debug symbols can't be loaded. map__map_ip could be converted to output

b) Try to read the loadable image _and_ the debug image during perf report. Currently the code only loads one image per mapping. Loading both and cross-referencing information between them would make the loading process more complicated.
For (a) we know the program headers will be correct, since they can be obtained from the exact file that was mapped during the profiling run. For (b) the "real" file may have gone, and we search other locations instead. So, we would need a way to detect which set of program headers are the correct ones when we search for images to load. I'm not sure what the best approach is for this -- maybe some checks for empty PT_LOAD segments (i.e., for which p_filesz == 0, or is much smaller than p_memsz) would work...
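For illustration, a check along those lines might look something like this, using libelf (the function name is invented and this isn't part of any patch -- just a sketch of the idea; it only catches the p_filesz == 0 case, not the "much smaller than p_memsz" case):

#include <gelf.h>

/*
 * Guess whether an ELF image's program headers describe real file contents.
 * A PT_LOAD segment which occupies memory but claims no file data at all is
 * a strong hint that we are looking at a separated debug image whose file
 * offsets can't be trusted.
 */
static int phdrs_look_genuine(Elf *elf)
{
        GElf_Ehdr ehdr;
        GElf_Phdr phdr;
        int i;

        if (gelf_getehdr(elf, &ehdr) == NULL)
                return 0;

        for (i = 0; i < ehdr.e_phnum; i++) {
                if (gelf_getphdr(elf, i, &phdr) == NULL)
                        return 0;

                if (phdr.p_type == PT_LOAD &&
                    phdr.p_memsz != 0 && phdr.p_filesz == 0)
                        return 0;       /* loadable segment with no file data */
        }

        return 1;
}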
Any views welcome on which approach would be best.
Cheers ---Dave